Opened 10 years ago
Closed 10 years ago
#24385 closed Bug (fixed)
Sum() doesn't seem to respect .distinct() in the queryset filter before .aggregate() when filtering by m2m.
Reported by: | Mark | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.7 |
Severity: | Normal | Keywords: | |
Cc: | Ed Henderson | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Hello,
I setup a minimal project here: https://github.com/mcagl/minimal_django_sum_test to demonstrate the problem.
I'm working with Django 1.6.x and I noticed something that I don't understand.
Edit: I checked also with the last stable version, 1.7.4 downloaded right from the website, and the problem is still there.
As you can see from the github repository, I have a Tag model and a Row model with a m2m towards Tag and a DecimalField called amount.
If I filter Row objects for more than one Tag, and there is/are Row objects that have more than one Tag among the one filtered by, Sum('amount') counts it/them once per Tag, even if I use distinct().
Please note also that I assert, in the test, that the filtered queryset is composed by three Row objects, as expected, but the next assert fails, with 40 != 30.
I added a test that instead of aggregate(Sum('amount')) does sum([x.amount for x in rows]) which passes.
Is this a bug in Sum() or am I missing something?
Kind regards,
Mark
Change History (10)
comment:1 by , 10 years ago
Description: | modified (diff) |
---|---|
Summary: | Problem with aggregate(Sum()) → Sum() doesn't seem to respect .distinct() in the queryset filter before .aggregate() when filtering by m2m. |
Version: | 1.6 → 1.7 |
comment:3 by , 10 years ago
It seems similar, or at least somewhat related.
I did a little more inspecting, doing this in ./manage.py shell
to obtain the executed raw SQL statements:
import logging l = logging.getLogger('django.db.backends') l.setLevel(logging.DEBUG) l.addHandler(logging.StreamHandler()) import datetime from decimal import Decimal import my_app.models as app_models t0 = app_models.Tag.objects.create(name="Tag0") t1 = app_models.Tag.objects.create(name="Tag1") t2 = app_models.Tag.objects.create(name="Tag2") t3 = app_models.Tag.objects.create(name="Tag3") app_models.Row.objects.create(day=datetime.date(2014, 1, 1), amount=Decimal('10.00')) r = app_models.Row.objects.create(day=datetime.date(2014, 1, 2), amount=Decimal('10.00')) r.tags.add(t0) r = app_models.Row.objects.create(day=datetime.date(2014, 1, 3), amount=Decimal('10.00')) r.tags.add(t1) r = app_models.Row.objects.create(day=datetime.date(2014, 1, 4), amount=Decimal('10.00')) r.tags.add(t0, t1) r = app_models.Row.objects.create(day=datetime.date(2014, 1, 5), amount=Decimal('10.00')) r.tags.add(t2, t3)
so mainly adding a DEBUG level logger for SQL queries, and recreating the objects in the setUp() method of the TestCase in the github repository. Then I tried the queries, obtaining this output, first without distinct()
:
rows = app_models.Row.objects.filter(tags__in=[t0, t1] list(rows)
>>> (0.000) QUERY = u'SELECT "my_app_row"."id", "my_app_row"."amount", "my_app_row"."day" FROM "my_app_row" INNER JOIN "my_app_row_tags" ON ( "my_app_row"."id" = "my_app_row_tags"."row_id" ) WHERE "my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1, 2) >>> [<Row: Row object>, <Row: Row object>, <Row: Row object>, <Row: Row object>]
agg = rows.aggregate(Sum('amount')) print agg['amount__sum']
>>> (0.000) QUERY = u'SELECT SUM("my_app_row"."amount") AS "amount__sum" FROM "my_app_row" INNER JOIN "my_app_row_tags" ON ( "my_app_row"."id" = "my_app_row_tags"."row_id" ) WHERE "my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1, 2) >>> 40.00
Then I added distinct()
, which gets added as expected (and the rows queryset gets 3 objects instead of 4, also expected):
rows = app_models.Row.objects.filter(tags__in=[t0, t1]).distinct() list(rows)
>>> (0.000) QUERY = u'SELECT DISTINCT "my_app_row"."id", "my_app_row"."amount", "my_app_row"."day" FROM "my_app_row" INNER JOIN "my_app_row_tags" ON ( "my_app_row"."id" = "my_app_row_tags"."row_id" ) WHERE "my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1, 2) >>> [<Row: Row object>, <Row: Row object>, <Row: Row object>]
but somewhat fails when doint aggregation in this version of the queryset, because the DISTINCT
is added before SUM
, whereas I expect that distinct()
gets executed before doing the aggregation, as it's the python code I wrote:
agg = rows.aggregate(Sum('amount')) print agg['amount__sum']
>>> (0.000) QUERY = u'SELECT DISTINCT SUM("my_app_row"."amount") AS "amount__sum" FROM "my_app_row" INNER JOIN "my_app_row_tags" ON ( "my_app_row"."id" = "my_app_row_tags"."row_id" ) WHERE "my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1, 2) >>> 40.00
And obviously agg['amount__sum']
contains an unexpected value, because at an higher level it's like the Row object with two tags is counted twice, completely disregarding the distinct()
method I called before aggregating.
Thanks for the quick response, I hope that this can be useful,
Mark
comment:4 by , 10 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Type: | Uncategorized → Bug |
comment:5 by , 10 years ago
http://www.sqlteam.com/article/how-to-use-group-by-with-distinct-aggregates-and-derived-tables describes this issue on SQL-level. Their solution was to use a derived table, something that is not directly possible with django queries, I think. The only solution I came up with was to sum up the values in a python loop, relying on distinct()
to properly do its job.
comment:6 by , 10 years ago
I used the test project that mcagl provided.
The test fails under stable/1.7.x but it passes on stable/1.8.x and also on master.
It would appear that this is fixed in 1.8.
comment:7 by , 10 years ago
Cc: | added |
---|---|
Has patch: | set |
Added a pull request with a test to validate this issue and insure it does not get re-introduced.
comment:8 by , 10 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
Looks good, pending some comment cleanups and squashing of commits.
comment:10 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Is this a symptom of the same problem described by #10060 ?