Opened 18 years ago

Closed 16 years ago

#2939 closed defect (fixed)

.values(*fields).distinct().count() returns the wrong value

Reported by: anonymous Owned by: nobody
Component: Database layer (models, ORM) Version:
Severity: normal Keywords: qs-rf-fixed
Cc: andrew@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Post.objects.values('author')

[{'author': 1}, {'author': 2}, {'author': 1}, {'author': 2}, {'author': 1}]

Post.objects.values('author').distinct()

[{'author': 1}, {'author': 2}]

Post.objects.values('author').distinct().count()

5

len(Post.objects.values('author').distinct())

2

Attachments (6)

django-distinct-values-count.diff (1.6 KB ) - added by andrew@… 17 years ago.
patch to add count method to ValuesQuerySet, better handling distinct case
distinct-values-count.patch (2.8 KB ) - added by Chris Beaven 17 years ago.
distinct-values-count-v2.patch (2.9 KB ) - added by dstn@… 16 years ago.
Small updates to make it apply to current SVN trunk.
distinct-values-count-v3.2.patch (3.2 KB ) - added by dstn@… 16 years ago.
distinct-values-count-v3.patch (3.2 KB ) - added by dstn@… 16 years ago.
distinct-values-count-v4.patch (3.4 KB ) - added by dstn@… 16 years ago.
Add a test and make it pass - extra() + values() + distinct() + count()

Download all attachments as: .zip

Change History (24)

comment:1 by James Bennett, 18 years ago

priority: highnormal

comment:2 by Malcolm Tredinnick, 18 years ago

Owner: changed from Adrian Holovaty to Malcolm Tredinnick

by andrew@…, 17 years ago

patch to add count method to ValuesQuerySet, better handling distinct case

comment:3 by andrew@…, 17 years ago

Cc: andrew@… added
Summary: Post.objects.values('author').distinct().count() return wrong value[patch] Post.objects.values('author').distinct().count() return wrong value

The problem is that the count method on the QuerySet class is not aware of the _fields field of the ValuesQuerySet subclass. As such, it does the same thing as if values() was not used at all.

The attached patch addresses the problem by copying-and-specializing the count method from QuerySet to ValuesQuerySet. If no _fields are present (in the distinct case), the primary key is used as in the original QuerySet count method. The patch is somewhat naive; it may be preferable to refactor the QuerySet count method, but I'm not sure what the preferred style would be for that. Feel free to provide direction.

comment:4 by Simon G. <dev@…>, 17 years ago

Triage Stage: UnreviewedDesign decision needed

comment:5 by Malcolm Tredinnick, 17 years ago

Needs tests: set
Triage Stage: Design decision neededAccepted

I like the approach in this patch. Needs tests, though, because (a) it's easy to do, (b) it'd be nice if it stays working and (c) I'm not 100% sure from eye-balling it that it will work with SQLite.

by Chris Beaven, 17 years ago

Attachment: distinct-values-count.patch added

comment:6 by Chris Beaven, 17 years ago

Needs tests: unset
Summary: [patch] Post.objects.values('author').distinct().count() return wrong value.values(*fields).distinct().count() returns the wrong value
Triage Stage: AcceptedReady for checkin
Version: 0.95

Patch refactored, tests added (and it works in SQLite)

comment:7 by Malcolm Tredinnick, 17 years ago

Chris: It works with which version of SQLite? This isn't a trivial question, but rather something that might affect our supported minimum version. SQLite has changed it's distinct and count handling over the past couple of years, so this kind of thing is (possibly) version specific. I'm not expecting you to test every version, but if you can note which version you did use, I can probably troll changelogs to see if I'm high or not.

comment:8 by Chris Beaven, 17 years ago

Well current count() code uses DISTINCT([pk_field]) already, so I'm not quite sure what you're looking for which won't work (I'm guessing multiple fields in DISTINCT?)

... but I'm using SQLite version 3.3.17

comment:9 by Malcolm Tredinnick, 17 years ago

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

Patch fails with PostgreSQL. It's an PostgreSQL SQL syntax issue, not a Python-DB issue, so either version of psycopg shows it. The "lookup" test is the failing one. Use the postgresql backend and you'll see the SQL it produces (postgresql_psycopg2 doesn't report the bad SQL, it seems).

If possible, you could remove any ordering colummns that are not in the distinct columns set and that would fix it. Not sure if PostgreSQL is really doing the right thing here (it could discard the spurious ordering statements itself), but it's irrelevant for our purposes; the patch causes a crash.

comment:10 by Malcolm Tredinnick, 17 years ago

Oh ... by the way .. you'll need to global replace backend.quote_name with connection.ops.quote-name after applying the patch (and fixing the reject). I should have attached the fixed patch, but I deleted my working branch before i remembered. Sorry 'bout that. :(

comment:11 by Chris Beaven, 17 years ago

So actually my tests uncovered a totally separate bug (since it's failing on code which I didn't touch in this patch) - gah!

in reply to:  7 comment:12 by Chris Beaven, 17 years ago

Replying to mtredinnick:

Chris: It works with which version of SQLite? This isn't a trivial question, but rather something that might affect our supported minimum version.

By the way, #1878 which covers the count+distinct+SQLite bug - so don't let that influence this patch.

comment:13 by Malcolm Tredinnick, 17 years ago

Keywords: qs-rf added

comment:14 by Malcolm Tredinnick, 17 years ago

Keywords: qs-rf-fixed added; qs-rf removed

by dstn@…, 16 years ago

Small updates to make it apply to current SVN trunk.

comment:15 by dstn@…, 16 years ago

I get the following doctest failure with the v2 patch I attached above:

F
======================================================================
FAIL: Doctest: modeltests.lookup.models.__test__.API_TESTS
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gmaps/django-test//lib/python2.4/site-packages/django/test/_doctest.py", line 2180, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for modeltests.lookup.models.__test__.API_TESTS
  File "/home/gmaps/django-svn/tests/modeltests/lookup/models.py", line unknown line number, in API_TESTS

----------------------------------------------------------------------
File "/home/gmaps/django-svn/tests/modeltests/lookup/models.py", line ?, in modeltests.lookup.models.__test__.API_TESTS
Failed example:
    len(Article.objects.values('pub_date').distinct())
Exception raised:
    Traceback (most recent call last):
      File "/home/gmaps/django-test//lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in __run
        compileflags, 1) in test.globs
      File "<doctest modeltests.lookup.models.__test__.API_TESTS[49]>", line 1, in ?
        len(Article.objects.values('pub_date').distinct())
      File "/home/gmaps/django-test//lib/python2.4/site-packages/django/db/models/query.py", line 111, in __len__
        return len(self._get_data())
      File "/home/gmaps/django-test//lib/python2.4/site-packages/django/db/models/query.py", line 490, in _get_data
        self._result_cache = list(self.iterator())
      File "/home/gmaps/django-test//lib/python2.4/site-packages/django/db/models/query.py", line 636, in iterator
        cursor.execute("SELECT " + (self._distinct and "DISTINCT " or "") + ",".join(select) + sql, params)
    ProgrammingError: for SELECT DISTINCT, ORDER BY expressions must appear in select list



----------------------------------------------------------------------
Ran 1 test in 0.073s

FAILED (failures=1)

This seems to occur because the _meta has a default ordering, so an ORDER BY clause gets added to the SQL, but it includes fields that aren't included in the SELECT DISTINCT.

This is when using postgresql_psycopg2.

A small patch fixes this...

by dstn@…, 16 years ago

by dstn@…, 16 years ago

comment:16 by dstn@…, 16 years ago

Sorry for the double-attachment above... v3 fixes the previously mentioned bug.

by dstn@…, 16 years ago

Add a test and make it pass - extra() + values() + distinct() + count()

comment:17 by dstn@…, 16 years ago

The test I added might not be a good idea - it makes assumptions about how Django decides to name the database columns. Also I didn't test with any database backend other than Postgres.

comment:18 by Malcolm Tredinnick, 16 years ago

Resolution: fixed
Status: newclosed

(In [7477]) Merged the queryset-refactor branch into trunk.

This is a big internal change, but mostly backwards compatible with existing
code. Also adds a couple of new features.

Fixed #245, #1050, #1656, #1801, #2076, #2091, #2150, #2253, #2306, #2400, #2430, #2482, #2496, #2676, #2737, #2874, #2902, #2939, #3037, #3141, #3288, #3440, #3592, #3739, #4088, #4260, #4289, #4306, #4358, #4464, #4510, #4858, #5012, #5020, #5261, #5295, #5321, #5324, #5325, #5555, #5707, #5796, #5817, #5987, #6018, #6074, #6088, #6154, #6177, #6180, #6203, #6658

Note: See TracTickets for help on using tickets.
Back to Top