Opened 8 years ago

Closed 7 years ago

#2939 closed defect (fixed)

.values(*fields).distinct().count() returns the wrong value

Reported by: anonymous Owned by: nobody
Component: Database layer (models, ORM) Version:
Severity: normal Keywords: qs-rf-fixed
Cc: andrew@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: UI/UX:

Description

Post.objects.values('author')

[{'author': 1}, {'author': 2}, {'author': 1}, {'author': 2}, {'author': 1}]

Post.objects.values('author').distinct()

[{'author': 1}, {'author': 2}]

Post.objects.values('author').distinct().count()

5

len(Post.objects.values('author').distinct())

2

Attachments (6)

django-distinct-values-count.diff (1.6 KB) - added by andrew@… 8 years ago.
patch to add count method to ValuesQuerySet, better handling distinct case
distinct-values-count.patch (2.8 KB) - added by SmileyChris 8 years ago.
distinct-values-count-v2.patch (2.9 KB) - added by dstn@… 7 years ago.
Small updates to make it apply to current SVN trunk.
distinct-values-count-v3.2.patch (3.2 KB) - added by dstn@… 7 years ago.
distinct-values-count-v3.patch (3.2 KB) - added by dstn@… 7 years ago.
distinct-values-count-v4.patch (3.4 KB) - added by dstn@… 7 years ago.
Add a test and make it pass - extra() + values() + distinct() + count()

Download all attachments as: .zip

Change History (24)

comment:1 Changed 8 years ago by ubernostrum

  • priority changed from high to normal

comment:2 Changed 8 years ago by mtredinnick

  • Owner changed from adrian to mtredinnick

Changed 8 years ago by andrew@…

patch to add count method to ValuesQuerySet, better handling distinct case

comment:3 Changed 8 years ago by andrew@…

  • Cc andrew@… added
  • Summary changed from Post.objects.values('author').distinct().count() return wrong value to [patch] Post.objects.values('author').distinct().count() return wrong value

The problem is that the count method on the QuerySet class is not aware of the _fields field of the ValuesQuerySet subclass. As such, it does the same thing as if values() was not used at all.

The attached patch addresses the problem by copying-and-specializing the count method from QuerySet to ValuesQuerySet. If no _fields are present (in the distinct case), the primary key is used as in the original QuerySet count method. The patch is somewhat naive; it may be preferable to refactor the QuerySet count method, but I'm not sure what the preferred style would be for that. Feel free to provide direction.

comment:4 Changed 8 years ago by Simon G. <dev@…>

  • Triage Stage changed from Unreviewed to Design decision needed

comment:5 Changed 8 years ago by mtredinnick

  • Needs tests set
  • Triage Stage changed from Design decision needed to Accepted

I like the approach in this patch. Needs tests, though, because (a) it's easy to do, (b) it'd be nice if it stays working and (c) I'm not 100% sure from eye-balling it that it will work with SQLite.

Changed 8 years ago by SmileyChris

comment:6 Changed 8 years ago by SmileyChris

  • Needs tests unset
  • Summary changed from [patch] Post.objects.values('author').distinct().count() return wrong value to .values(*fields).distinct().count() returns the wrong value
  • Triage Stage changed from Accepted to Ready for checkin
  • Version 0.95 deleted

Patch refactored, tests added (and it works in SQLite)

comment:7 follow-up: Changed 8 years ago by mtredinnick

Chris: It works with which version of SQLite? This isn't a trivial question, but rather something that might affect our supported minimum version. SQLite has changed it's distinct and count handling over the past couple of years, so this kind of thing is (possibly) version specific. I'm not expecting you to test every version, but if you can note which version you did use, I can probably troll changelogs to see if I'm high or not.

comment:8 Changed 8 years ago by SmileyChris

Well current count() code uses DISTINCT([pk_field]) already, so I'm not quite sure what you're looking for which won't work (I'm guessing multiple fields in DISTINCT?)

... but I'm using SQLite version 3.3.17

comment:9 Changed 8 years ago by mtredinnick

  • Patch needs improvement set
  • Triage Stage changed from Ready for checkin to Accepted

Patch fails with PostgreSQL. It's an PostgreSQL SQL syntax issue, not a Python-DB issue, so either version of psycopg shows it. The "lookup" test is the failing one. Use the postgresql backend and you'll see the SQL it produces (postgresql_psycopg2 doesn't report the bad SQL, it seems).

If possible, you could remove any ordering colummns that are not in the distinct columns set and that would fix it. Not sure if PostgreSQL is really doing the right thing here (it could discard the spurious ordering statements itself), but it's irrelevant for our purposes; the patch causes a crash.

comment:10 Changed 8 years ago by mtredinnick

Oh ... by the way .. you'll need to global replace backend.quote_name with connection.ops.quote-name after applying the patch (and fixing the reject). I should have attached the fixed patch, but I deleted my working branch before i remembered. Sorry 'bout that. :(

comment:11 Changed 8 years ago by SmileyChris

So actually my tests uncovered a totally separate bug (since it's failing on code which I didn't touch in this patch) - gah!

comment:12 in reply to: ↑ 7 Changed 8 years ago by SmileyChris

Replying to mtredinnick:

Chris: It works with which version of SQLite? This isn't a trivial question, but rather something that might affect our supported minimum version.

By the way, #1878 which covers the count+distinct+SQLite bug - so don't let that influence this patch.

comment:13 Changed 8 years ago by mtredinnick

  • Keywords qs-rf added

comment:14 Changed 7 years ago by mtredinnick

  • Keywords qs-rf-fixed added; qs-rf removed

Changed 7 years ago by dstn@…

Small updates to make it apply to current SVN trunk.

comment:15 Changed 7 years ago by dstn@…

I get the following doctest failure with the v2 patch I attached above:

F
======================================================================
FAIL: Doctest: modeltests.lookup.models.__test__.API_TESTS
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gmaps/django-test//lib/python2.4/site-packages/django/test/_doctest.py", line 2180, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for modeltests.lookup.models.__test__.API_TESTS
  File "/home/gmaps/django-svn/tests/modeltests/lookup/models.py", line unknown line number, in API_TESTS

----------------------------------------------------------------------
File "/home/gmaps/django-svn/tests/modeltests/lookup/models.py", line ?, in modeltests.lookup.models.__test__.API_TESTS
Failed example:
    len(Article.objects.values('pub_date').distinct())
Exception raised:
    Traceback (most recent call last):
      File "/home/gmaps/django-test//lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in __run
        compileflags, 1) in test.globs
      File "<doctest modeltests.lookup.models.__test__.API_TESTS[49]>", line 1, in ?
        len(Article.objects.values('pub_date').distinct())
      File "/home/gmaps/django-test//lib/python2.4/site-packages/django/db/models/query.py", line 111, in __len__
        return len(self._get_data())
      File "/home/gmaps/django-test//lib/python2.4/site-packages/django/db/models/query.py", line 490, in _get_data
        self._result_cache = list(self.iterator())
      File "/home/gmaps/django-test//lib/python2.4/site-packages/django/db/models/query.py", line 636, in iterator
        cursor.execute("SELECT " + (self._distinct and "DISTINCT " or "") + ",".join(select) + sql, params)
    ProgrammingError: for SELECT DISTINCT, ORDER BY expressions must appear in select list



----------------------------------------------------------------------
Ran 1 test in 0.073s

FAILED (failures=1)

This seems to occur because the _meta has a default ordering, so an ORDER BY clause gets added to the SQL, but it includes fields that aren't included in the SELECT DISTINCT.

This is when using postgresql_psycopg2.

A small patch fixes this...

Changed 7 years ago by dstn@…

Changed 7 years ago by dstn@…

comment:16 Changed 7 years ago by dstn@…

Sorry for the double-attachment above... v3 fixes the previously mentioned bug.

Changed 7 years ago by dstn@…

Add a test and make it pass - extra() + values() + distinct() + count()

comment:17 Changed 7 years ago by dstn@…

The test I added might not be a good idea - it makes assumptions about how Django decides to name the database columns. Also I didn't test with any database backend other than Postgres.

comment:18 Changed 7 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from new to closed

(In [7477]) Merged the queryset-refactor branch into trunk.

This is a big internal change, but mostly backwards compatible with existing
code. Also adds a couple of new features.

Fixed #245, #1050, #1656, #1801, #2076, #2091, #2150, #2253, #2306, #2400, #2430, #2482, #2496, #2676, #2737, #2874, #2902, #2939, #3037, #3141, #3288, #3440, #3592, #3739, #4088, #4260, #4289, #4306, #4358, #4464, #4510, #4858, #5012, #5020, #5261, #5295, #5321, #5324, #5325, #5555, #5707, #5796, #5817, #5987, #6018, #6074, #6088, #6154, #6177, #6180, #6203, #6658

Note: See TracTickets for help on using tickets.
Back to Top