Opened 10 years ago

Closed 8 years ago

#2939 closed defect (fixed)

.values(*fields).distinct().count() returns the wrong value

Reported by: anonymous Owned by: nobody
Component: Database layer (models, ORM) Version:
Severity: normal Keywords: qs-rf-fixed
Cc: andrew@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: UI/UX:

Description

Post.objects.values('author')

[{'author': 1}, {'author': 2}, {'author': 1}, {'author': 2}, {'author': 1}]

Post.objects.values('author').distinct()

[{'author': 1}, {'author': 2}]

Post.objects.values('author').distinct().count()

5

len(Post.objects.values('author').distinct())

2

Attachments (6)

django-distinct-values-count.diff (1.6 KB) - added by andrew@… 10 years ago.
patch to add count method to ValuesQuerySet, better handling distinct case
distinct-values-count.patch (2.8 KB) - added by Chris Beaven 9 years ago.
distinct-values-count-v2.patch (2.9 KB) - added by dstn@… 8 years ago.
Small updates to make it apply to current SVN trunk.
distinct-values-count-v3.2.patch (3.2 KB) - added by dstn@… 8 years ago.
distinct-values-count-v3.patch (3.2 KB) - added by dstn@… 8 years ago.
distinct-values-count-v4.patch (3.4 KB) - added by dstn@… 8 years ago.
Add a test and make it pass - extra() + values() + distinct() + count()

Download all attachments as: .zip

Change History (24)

comment:1 Changed 10 years ago by James Bennett

priority: highnormal

comment:2 Changed 10 years ago by Malcolm Tredinnick

Owner: changed from Adrian Holovaty to Malcolm Tredinnick

Changed 10 years ago by andrew@…

patch to add count method to ValuesQuerySet, better handling distinct case

comment:3 Changed 10 years ago by andrew@…

Cc: andrew@… added
Summary: Post.objects.values('author').distinct().count() return wrong value[patch] Post.objects.values('author').distinct().count() return wrong value

The problem is that the count method on the QuerySet class is not aware of the _fields field of the ValuesQuerySet subclass. As such, it does the same thing as if values() was not used at all.

The attached patch addresses the problem by copying-and-specializing the count method from QuerySet to ValuesQuerySet. If no _fields are present (in the distinct case), the primary key is used as in the original QuerySet count method. The patch is somewhat naive; it may be preferable to refactor the QuerySet count method, but I'm not sure what the preferred style would be for that. Feel free to provide direction.

comment:4 Changed 10 years ago by Simon G. <dev@…>

Triage Stage: UnreviewedDesign decision needed

comment:5 Changed 10 years ago by Malcolm Tredinnick

Needs tests: set
Triage Stage: Design decision neededAccepted

I like the approach in this patch. Needs tests, though, because (a) it's easy to do, (b) it'd be nice if it stays working and (c) I'm not 100% sure from eye-balling it that it will work with SQLite.

Changed 9 years ago by Chris Beaven

Attachment: distinct-values-count.patch added

comment:6 Changed 9 years ago by Chris Beaven

Needs tests: unset
Summary: [patch] Post.objects.values('author').distinct().count() return wrong value.values(*fields).distinct().count() returns the wrong value
Triage Stage: AcceptedReady for checkin
Version: 0.95

Patch refactored, tests added (and it works in SQLite)

comment:7 Changed 9 years ago by Malcolm Tredinnick

Chris: It works with which version of SQLite? This isn't a trivial question, but rather something that might affect our supported minimum version. SQLite has changed it's distinct and count handling over the past couple of years, so this kind of thing is (possibly) version specific. I'm not expecting you to test every version, but if you can note which version you did use, I can probably troll changelogs to see if I'm high or not.

comment:8 Changed 9 years ago by Chris Beaven

Well current count() code uses DISTINCT([pk_field]) already, so I'm not quite sure what you're looking for which won't work (I'm guessing multiple fields in DISTINCT?)

... but I'm using SQLite version 3.3.17

comment:9 Changed 9 years ago by Malcolm Tredinnick

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

Patch fails with PostgreSQL. It's an PostgreSQL SQL syntax issue, not a Python-DB issue, so either version of psycopg shows it. The "lookup" test is the failing one. Use the postgresql backend and you'll see the SQL it produces (postgresql_psycopg2 doesn't report the bad SQL, it seems).

If possible, you could remove any ordering colummns that are not in the distinct columns set and that would fix it. Not sure if PostgreSQL is really doing the right thing here (it could discard the spurious ordering statements itself), but it's irrelevant for our purposes; the patch causes a crash.

comment:10 Changed 9 years ago by Malcolm Tredinnick

Oh ... by the way .. you'll need to global replace backend.quote_name with connection.ops.quote-name after applying the patch (and fixing the reject). I should have attached the fixed patch, but I deleted my working branch before i remembered. Sorry 'bout that. :(

comment:11 Changed 9 years ago by Chris Beaven

So actually my tests uncovered a totally separate bug (since it's failing on code which I didn't touch in this patch) - gah!

comment:12 in reply to:  7 Changed 9 years ago by Chris Beaven

Replying to mtredinnick:

Chris: It works with which version of SQLite? This isn't a trivial question, but rather something that might affect our supported minimum version.

By the way, #1878 which covers the count+distinct+SQLite bug - so don't let that influence this patch.

comment:13 Changed 9 years ago by Malcolm Tredinnick

Keywords: qs-rf added

comment:14 Changed 9 years ago by Malcolm Tredinnick

Keywords: qs-rf-fixed added; qs-rf removed

Changed 8 years ago by dstn@…

Small updates to make it apply to current SVN trunk.

comment:15 Changed 8 years ago by dstn@…

I get the following doctest failure with the v2 patch I attached above:

F
======================================================================
FAIL: Doctest: modeltests.lookup.models.__test__.API_TESTS
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gmaps/django-test//lib/python2.4/site-packages/django/test/_doctest.py", line 2180, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for modeltests.lookup.models.__test__.API_TESTS
  File "/home/gmaps/django-svn/tests/modeltests/lookup/models.py", line unknown line number, in API_TESTS

----------------------------------------------------------------------
File "/home/gmaps/django-svn/tests/modeltests/lookup/models.py", line ?, in modeltests.lookup.models.__test__.API_TESTS
Failed example:
    len(Article.objects.values('pub_date').distinct())
Exception raised:
    Traceback (most recent call last):
      File "/home/gmaps/django-test//lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in __run
        compileflags, 1) in test.globs
      File "<doctest modeltests.lookup.models.__test__.API_TESTS[49]>", line 1, in ?
        len(Article.objects.values('pub_date').distinct())
      File "/home/gmaps/django-test//lib/python2.4/site-packages/django/db/models/query.py", line 111, in __len__
        return len(self._get_data())
      File "/home/gmaps/django-test//lib/python2.4/site-packages/django/db/models/query.py", line 490, in _get_data
        self._result_cache = list(self.iterator())
      File "/home/gmaps/django-test//lib/python2.4/site-packages/django/db/models/query.py", line 636, in iterator
        cursor.execute("SELECT " + (self._distinct and "DISTINCT " or "") + ",".join(select) + sql, params)
    ProgrammingError: for SELECT DISTINCT, ORDER BY expressions must appear in select list



----------------------------------------------------------------------
Ran 1 test in 0.073s

FAILED (failures=1)

This seems to occur because the _meta has a default ordering, so an ORDER BY clause gets added to the SQL, but it includes fields that aren't included in the SELECT DISTINCT.

This is when using postgresql_psycopg2.

A small patch fixes this...

Changed 8 years ago by dstn@…

Changed 8 years ago by dstn@…

comment:16 Changed 8 years ago by dstn@…

Sorry for the double-attachment above... v3 fixes the previously mentioned bug.

Changed 8 years ago by dstn@…

Add a test and make it pass - extra() + values() + distinct() + count()

comment:17 Changed 8 years ago by dstn@…

The test I added might not be a good idea - it makes assumptions about how Django decides to name the database columns. Also I didn't test with any database backend other than Postgres.

comment:18 Changed 8 years ago by Malcolm Tredinnick

Resolution: fixed
Status: newclosed

(In [7477]) Merged the queryset-refactor branch into trunk.

This is a big internal change, but mostly backwards compatible with existing
code. Also adds a couple of new features.

Fixed #245, #1050, #1656, #1801, #2076, #2091, #2150, #2253, #2306, #2400, #2430, #2482, #2496, #2676, #2737, #2874, #2902, #2939, #3037, #3141, #3288, #3440, #3592, #3739, #4088, #4260, #4289, #4306, #4358, #4464, #4510, #4858, #5012, #5020, #5261, #5295, #5321, #5324, #5325, #5555, #5707, #5796, #5817, #5987, #6018, #6074, #6088, #6154, #6177, #6180, #6203, #6658

Note: See TracTickets for help on using tickets.
Back to Top