Opened 18 years ago
Closed 17 years ago
#2939 closed defect (fixed)
.values(*fields).distinct().count() returns the wrong value
Reported by: | anonymous | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | |
Severity: | normal | Keywords: | qs-rf-fixed |
Cc: | andrew@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
Post.objects.values('author')
[{'author': 1}, {'author': 2}, {'author': 1}, {'author': 2}, {'author': 1}]
Post.objects.values('author').distinct()
[{'author': 1}, {'author': 2}]
Post.objects.values('author').distinct().count()
5
len(Post.objects.values('author').distinct())
2
Attachments (6)
Change History (24)
comment:1 by , 18 years ago
priority: | high → normal |
---|
comment:2 by , 18 years ago
Owner: | changed from | to
---|
by , 18 years ago
Attachment: | django-distinct-values-count.diff added |
---|
comment:3 by , 18 years ago
Cc: | added |
---|---|
Summary: | Post.objects.values('author').distinct().count() return wrong value → [patch] Post.objects.values('author').distinct().count() return wrong value |
The problem is that the count method on the QuerySet class is not aware of the _fields field of the ValuesQuerySet subclass. As such, it does the same thing as if values() was not used at all.
The attached patch addresses the problem by copying-and-specializing the count method from QuerySet to ValuesQuerySet. If no _fields are present (in the distinct case), the primary key is used as in the original QuerySet count method. The patch is somewhat naive; it may be preferable to refactor the QuerySet count method, but I'm not sure what the preferred style would be for that. Feel free to provide direction.
comment:4 by , 18 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
comment:5 by , 18 years ago
Needs tests: | set |
---|---|
Triage Stage: | Design decision needed → Accepted |
I like the approach in this patch. Needs tests, though, because (a) it's easy to do, (b) it'd be nice if it stays working and (c) I'm not 100% sure from eye-balling it that it will work with SQLite.
by , 17 years ago
Attachment: | distinct-values-count.patch added |
---|
comment:6 by , 17 years ago
Needs tests: | unset |
---|---|
Summary: | [patch] Post.objects.values('author').distinct().count() return wrong value → .values(*fields).distinct().count() returns the wrong value |
Triage Stage: | Accepted → Ready for checkin |
Version: | 0.95 |
Patch refactored, tests added (and it works in SQLite)
follow-up: 12 comment:7 by , 17 years ago
Chris: It works with which version of SQLite? This isn't a trivial question, but rather something that might affect our supported minimum version. SQLite has changed it's distinct and count handling over the past couple of years, so this kind of thing is (possibly) version specific. I'm not expecting you to test every version, but if you can note which version you did use, I can probably troll changelogs to see if I'm high or not.
comment:8 by , 17 years ago
Well current count()
code uses DISTINCT([pk_field])
already, so I'm not quite sure what you're looking for which won't work (I'm guessing multiple fields in DISTINCT
?)
... but I'm using SQLite version 3.3.17
comment:9 by , 17 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
Patch fails with PostgreSQL. It's an PostgreSQL SQL syntax issue, not a Python-DB issue, so either version of psycopg shows it. The "lookup" test is the failing one. Use the postgresql backend and you'll see the SQL it produces (postgresql_psycopg2 doesn't report the bad SQL, it seems).
If possible, you could remove any ordering colummns that are not in the distinct columns set and that would fix it. Not sure if PostgreSQL is really doing the right thing here (it could discard the spurious ordering statements itself), but it's irrelevant for our purposes; the patch causes a crash.
comment:10 by , 17 years ago
Oh ... by the way .. you'll need to global replace backend.quote_name with connection.ops.quote-name after applying the patch (and fixing the reject). I should have attached the fixed patch, but I deleted my working branch before i remembered. Sorry 'bout that. :(
comment:11 by , 17 years ago
So actually my tests uncovered a totally separate bug (since it's failing on code which I didn't touch in this patch) - gah!
comment:12 by , 17 years ago
Replying to mtredinnick:
Chris: It works with which version of SQLite? This isn't a trivial question, but rather something that might affect our supported minimum version.
By the way, #1878 which covers the count+distinct+SQLite bug - so don't let that influence this patch.
comment:13 by , 17 years ago
Keywords: | qs-rf added |
---|
comment:14 by , 17 years ago
Keywords: | qs-rf-fixed added; qs-rf removed |
---|
by , 17 years ago
Attachment: | distinct-values-count-v2.patch added |
---|
Small updates to make it apply to current SVN trunk.
comment:15 by , 17 years ago
I get the following doctest failure with the v2 patch I attached above:
F ====================================================================== FAIL: Doctest: modeltests.lookup.models.__test__.API_TESTS ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/gmaps/django-test//lib/python2.4/site-packages/django/test/_doctest.py", line 2180, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for modeltests.lookup.models.__test__.API_TESTS File "/home/gmaps/django-svn/tests/modeltests/lookup/models.py", line unknown line number, in API_TESTS ---------------------------------------------------------------------- File "/home/gmaps/django-svn/tests/modeltests/lookup/models.py", line ?, in modeltests.lookup.models.__test__.API_TESTS Failed example: len(Article.objects.values('pub_date').distinct()) Exception raised: Traceback (most recent call last): File "/home/gmaps/django-test//lib/python2.4/site-packages/django/test/_doctest.py", line 1267, in __run compileflags, 1) in test.globs File "<doctest modeltests.lookup.models.__test__.API_TESTS[49]>", line 1, in ? len(Article.objects.values('pub_date').distinct()) File "/home/gmaps/django-test//lib/python2.4/site-packages/django/db/models/query.py", line 111, in __len__ return len(self._get_data()) File "/home/gmaps/django-test//lib/python2.4/site-packages/django/db/models/query.py", line 490, in _get_data self._result_cache = list(self.iterator()) File "/home/gmaps/django-test//lib/python2.4/site-packages/django/db/models/query.py", line 636, in iterator cursor.execute("SELECT " + (self._distinct and "DISTINCT " or "") + ",".join(select) + sql, params) ProgrammingError: for SELECT DISTINCT, ORDER BY expressions must appear in select list ---------------------------------------------------------------------- Ran 1 test in 0.073s FAILED (failures=1)
This seems to occur because the _meta has a default ordering, so an ORDER BY clause gets added to the SQL, but it includes fields that aren't included in the SELECT DISTINCT.
This is when using postgresql_psycopg2.
A small patch fixes this...
by , 17 years ago
Attachment: | distinct-values-count-v3.2.patch added |
---|
by , 17 years ago
Attachment: | distinct-values-count-v3.patch added |
---|
comment:16 by , 17 years ago
Sorry for the double-attachment above... v3 fixes the previously mentioned bug.
by , 17 years ago
Attachment: | distinct-values-count-v4.patch added |
---|
Add a test and make it pass - extra() + values() + distinct() + count()
comment:17 by , 17 years ago
The test I added might not be a good idea - it makes assumptions about how Django decides to name the database columns. Also I didn't test with any database backend other than Postgres.
comment:18 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
(In [7477]) Merged the queryset-refactor branch into trunk.
This is a big internal change, but mostly backwards compatible with existing
code. Also adds a couple of new features.
Fixed #245, #1050, #1656, #1801, #2076, #2091, #2150, #2253, #2306, #2400, #2430, #2482, #2496, #2676, #2737, #2874, #2902, #2939, #3037, #3141, #3288, #3440, #3592, #3739, #4088, #4260, #4289, #4306, #4358, #4464, #4510, #4858, #5012, #5020, #5261, #5295, #5321, #5324, #5325, #5555, #5707, #5796, #5817, #5987, #6018, #6074, #6088, #6154, #6177, #6180, #6203, #6658
patch to add count method to ValuesQuerySet, better handling distinct case