Opened 11 years ago
Closed 11 years ago
#24385 closed Bug (fixed)
Sum() doesn't seem to respect .distinct() in the queryset filter before .aggregate() when filtering by m2m.
| Reported by: | Mark | Owned by: | nobody |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 1.7 |
| Severity: | Normal | Keywords: | |
| Cc: | Ed Henderson | Triage Stage: | Ready for checkin |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Hello,
I setup a minimal project here: https://github.com/mcagl/minimal_django_sum_test to demonstrate the problem.
I'm working with Django 1.6.x and I noticed something that I don't understand.
Edit: I checked also with the last stable version, 1.7.4 downloaded right from the website, and the problem is still there.
As you can see from the github repository, I have a Tag model and a Row model with a m2m towards Tag and a DecimalField called amount.
If I filter Row objects for more than one Tag, and there is/are Row objects that have more than one Tag among the one filtered by, Sum('amount') counts it/them once per Tag, even if I use distinct().
Please note also that I assert, in the test, that the filtered queryset is composed by three Row objects, as expected, but the next assert fails, with 40 != 30.
I added a test that instead of aggregate(Sum('amount')) does sum([x.amount for x in rows]) which passes.
Is this a bug in Sum() or am I missing something?
Kind regards,
Mark
Change History (10)
comment:1 by , 11 years ago
| Description: | modified (diff) |
|---|---|
| Summary: | Problem with aggregate(Sum()) → Sum() doesn't seem to respect .distinct() in the queryset filter before .aggregate() when filtering by m2m. |
| Version: | 1.6 → 1.7 |
comment:3 by , 11 years ago
It seems similar, or at least somewhat related.
I did a little more inspecting, doing this in ./manage.py shell to obtain the executed raw SQL statements:
import logging
l = logging.getLogger('django.db.backends')
l.setLevel(logging.DEBUG)
l.addHandler(logging.StreamHandler())
import datetime
from decimal import Decimal
import my_app.models as app_models
t0 = app_models.Tag.objects.create(name="Tag0")
t1 = app_models.Tag.objects.create(name="Tag1")
t2 = app_models.Tag.objects.create(name="Tag2")
t3 = app_models.Tag.objects.create(name="Tag3")
app_models.Row.objects.create(day=datetime.date(2014, 1, 1), amount=Decimal('10.00'))
r = app_models.Row.objects.create(day=datetime.date(2014, 1, 2), amount=Decimal('10.00'))
r.tags.add(t0)
r = app_models.Row.objects.create(day=datetime.date(2014, 1, 3), amount=Decimal('10.00'))
r.tags.add(t1)
r = app_models.Row.objects.create(day=datetime.date(2014, 1, 4), amount=Decimal('10.00'))
r.tags.add(t0, t1)
r = app_models.Row.objects.create(day=datetime.date(2014, 1, 5), amount=Decimal('10.00'))
r.tags.add(t2, t3)
so mainly adding a DEBUG level logger for SQL queries, and recreating the objects in the setUp() method of the TestCase in the github repository. Then I tried the queries, obtaining this output, first without distinct():
rows = app_models.Row.objects.filter(tags__in=[t0, t1]) list(rows)
>>> (0.000) QUERY = u'SELECT "my_app_row"."id", "my_app_row"."amount", "my_app_row"."day" FROM "my_app_row" INNER JOIN "my_app_row_tags" ON ( "my_app_row"."id" = "my_app_row_tags"."row_id" ) WHERE "my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1, 2) >>> [<Row: Row object>, <Row: Row object>, <Row: Row object>, <Row: Row object>]
agg = rows.aggregate(Sum('amount'))
print agg['amount__sum']
>>> (0.000) QUERY = u'SELECT SUM("my_app_row"."amount") AS "amount__sum" FROM "my_app_row" INNER JOIN "my_app_row_tags" ON ( "my_app_row"."id" = "my_app_row_tags"."row_id" ) WHERE "my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1, 2)
>>> 40.00
Then I added distinct(), which gets added as expected (and the rows queryset gets 3 objects instead of 4, also expected):
rows = app_models.Row.objects.filter(tags__in=[t0, t1]).distinct() list(rows)
>>> (0.000) QUERY = u'SELECT DISTINCT "my_app_row"."id", "my_app_row"."amount", "my_app_row"."day" FROM "my_app_row" INNER JOIN "my_app_row_tags" ON ( "my_app_row"."id" = "my_app_row_tags"."row_id" ) WHERE "my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1, 2) >>> [<Row: Row object>, <Row: Row object>, <Row: Row object>]
but somewhat fails when doint aggregation in this version of the queryset, because the DISTINCT is added before SUM, whereas I expect that distinct() gets executed before doing the aggregation, as it's the python code I wrote:
agg = rows.aggregate(Sum('amount'))
print agg['amount__sum']
>>> (0.000) QUERY = u'SELECT DISTINCT SUM("my_app_row"."amount") AS "amount__sum" FROM "my_app_row" INNER JOIN "my_app_row_tags" ON ( "my_app_row"."id" = "my_app_row_tags"."row_id" ) WHERE "my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1, 2)
>>> 40.00
And obviously agg['amount__sum'] contains an unexpected value, because at an higher level it's like the Row object with two tags is counted twice, completely disregarding the distinct() method I called before aggregating.
Thanks for the quick response, I hope that this can be useful,
Mark
comment:4 by , 11 years ago
| Triage Stage: | Unreviewed → Accepted |
|---|---|
| Type: | Uncategorized → Bug |
comment:5 by , 11 years ago
http://www.sqlteam.com/article/how-to-use-group-by-with-distinct-aggregates-and-derived-tables describes this issue on SQL-level. Their solution was to use a derived table, something that is not directly possible with django queries, I think. The only solution I came up with was to sum up the values in a python loop, relying on distinct() to properly do its job.
comment:6 by , 11 years ago
I used the test project that mcagl provided.
The test fails under stable/1.7.x but it passes on stable/1.8.x and also on master.
It would appear that this is fixed in 1.8.
comment:7 by , 11 years ago
| Cc: | added |
|---|---|
| Has patch: | set |
Added a pull request with a test to validate this issue and insure it does not get re-introduced.
comment:8 by , 11 years ago
| Triage Stage: | Accepted → Ready for checkin |
|---|
Looks good, pending some comment cleanups and squashing of commits.
comment:10 by , 11 years ago
| Resolution: | → fixed |
|---|---|
| Status: | new → closed |
Is this a symptom of the same problem described by #10060 ?