Opened 10 years ago

Closed 10 years ago

#24385 closed Bug (fixed)

Sum() doesn't seem to respect .distinct() in the queryset filter before .aggregate() when filtering by m2m.

Reported by: Mark Owned by: nobody
Component: Database layer (models, ORM) Version: 1.7
Severity: Normal Keywords:
Cc: Ed Henderson Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Mark)

Hello,
I setup a minimal project here: https://github.com/mcagl/minimal_django_sum_test to demonstrate the problem.

I'm working with Django 1.6.x and I noticed something that I don't understand.

Edit: I checked also with the last stable version, 1.7.4 downloaded right from the website, and the problem is still there.

As you can see from the github repository, I have a Tag model and a Row model with a m2m towards Tag and a DecimalField called amount.

If I filter Row objects for more than one Tag, and there is/are Row objects that have more than one Tag among the one filtered by, Sum('amount') counts it/them once per Tag, even if I use distinct().

Please note also that I assert, in the test, that the filtered queryset is composed by three Row objects, as expected, but the next assert fails, with 40 != 30.

I added a test that instead of aggregate(Sum('amount')) does sum([x.amount for x in rows]) which passes.

Is this a bug in Sum() or am I missing something?

Kind regards,
Mark

Change History (10)

comment:1 by Mark, 10 years ago

Description: modified (diff)
Summary: Problem with aggregate(Sum())Sum() doesn't seem to respect .distinct() in the queryset filter before .aggregate() when filtering by m2m.
Version: 1.61.7

comment:2 by Josh Smeaton, 10 years ago

Is this a symptom of the same problem described by https://code.djangoproject.com/ticket/10060 ?

Version 0, edited 10 years ago by Josh Smeaton (next)

comment:3 by Mark, 10 years ago

It seems similar, or at least somewhat related.

I did a little more inspecting, doing this in ./manage.py shell to obtain the executed raw SQL statements:

import logging
l = logging.getLogger('django.db.backends')
l.setLevel(logging.DEBUG)
l.addHandler(logging.StreamHandler())
import datetime
from decimal import Decimal
import my_app.models as app_models
t0 = app_models.Tag.objects.create(name="Tag0")
t1 = app_models.Tag.objects.create(name="Tag1")
t2 = app_models.Tag.objects.create(name="Tag2")
t3 = app_models.Tag.objects.create(name="Tag3")
app_models.Row.objects.create(day=datetime.date(2014, 1, 1), amount=Decimal('10.00'))
r = app_models.Row.objects.create(day=datetime.date(2014, 1, 2), amount=Decimal('10.00'))
r.tags.add(t0)
r = app_models.Row.objects.create(day=datetime.date(2014, 1, 3), amount=Decimal('10.00'))
r.tags.add(t1)
r = app_models.Row.objects.create(day=datetime.date(2014, 1, 4), amount=Decimal('10.00'))
r.tags.add(t0, t1)
r = app_models.Row.objects.create(day=datetime.date(2014, 1, 5), amount=Decimal('10.00'))
r.tags.add(t2, t3)

so mainly adding a DEBUG level logger for SQL queries, and recreating the objects in the setUp() method of the TestCase in the github repository. Then I tried the queries, obtaining this output, first without distinct():

rows = app_models.Row.objects.filter(tags__in=[t0, t1])
list(rows)
>>> (0.000) QUERY = u'SELECT "my_app_row"."id", "my_app_row"."amount", "my_app_row"."day" FROM "my_app_row" INNER JOIN "my_app_row_tags" ON ( "my_app_row"."id" = "my_app_row_tags"."row_id" ) WHERE "my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1, 2)
>>> [<Row: Row object>, <Row: Row object>, <Row: Row object>, <Row: Row object>]
agg = rows.aggregate(Sum('amount'))
print agg['amount__sum']
>>> (0.000) QUERY = u'SELECT SUM("my_app_row"."amount") AS "amount__sum" FROM "my_app_row" INNER JOIN "my_app_row_tags" ON ( "my_app_row"."id" = "my_app_row_tags"."row_id" ) WHERE "my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1, 2)
>>> 40.00

Then I added distinct(), which gets added as expected (and the rows queryset gets 3 objects instead of 4, also expected):

rows = app_models.Row.objects.filter(tags__in=[t0, t1]).distinct()
list(rows)
>>> (0.000) QUERY = u'SELECT DISTINCT "my_app_row"."id", "my_app_row"."amount", "my_app_row"."day" FROM "my_app_row" INNER JOIN "my_app_row_tags" ON ( "my_app_row"."id" = "my_app_row_tags"."row_id" ) WHERE "my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1, 2)
>>> [<Row: Row object>, <Row: Row object>, <Row: Row object>]

but somewhat fails when doint aggregation in this version of the queryset, because the DISTINCT is added before SUM, whereas I expect that distinct() gets executed before doing the aggregation, as it's the python code I wrote:

agg = rows.aggregate(Sum('amount'))
print agg['amount__sum']
>>> (0.000) QUERY = u'SELECT DISTINCT SUM("my_app_row"."amount") AS "amount__sum" FROM "my_app_row" INNER JOIN "my_app_row_tags" ON ( "my_app_row"."id" = "my_app_row_tags"."row_id" ) WHERE "my_app_row_tags"."tag_id" IN (%s, %s)' - PARAMS = (1, 2); args=(1, 2)
>>> 40.00

And obviously agg['amount__sum'] contains an unexpected value, because at an higher level it's like the Row object with two tags is counted twice, completely disregarding the distinct() method I called before aggregating.

Thanks for the quick response, I hope that this can be useful,
Mark

Last edited 10 years ago by Mark (previous) (diff)

comment:4 by Tim Graham, 10 years ago

Triage Stage: UnreviewedAccepted
Type: UncategorizedBug

comment:5 by Camillo Bruni, 10 years ago

http://www.sqlteam.com/article/how-to-use-group-by-with-distinct-aggregates-and-derived-tables describes this issue on SQL-level. Their solution was to use a derived table, something that is not directly possible with django queries, I think. The only solution I came up with was to sum up the values in a python loop, relying on distinct() to properly do its job.

comment:6 by Ed Henderson, 10 years ago

I used the test project that mcagl provided.
The test fails under stable/1.7.x but it passes on stable/1.8.x and also on master.

It would appear that this is fixed in 1.8.

comment:7 by Ed Henderson, 10 years ago

Cc: Ed Henderson added
Has patch: set

Added a pull request with a test to validate this issue and insure it does not get re-introduced.

https://github.com/django/django/pull/4507

comment:8 by Tim Graham, 10 years ago

Triage Stage: AcceptedReady for checkin

Looks good, pending some comment cleanups and squashing of commits.

comment:9 by Tim Graham <timograham@…>, 10 years ago

In 910638f:

Refs #24385 -- Added tests for distinct sum issue fixed in c7fd9b242d2d63406f1de6cc3204e35aaa025233

comment:10 by Tim Graham, 10 years ago

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.
Back to Top