Opened 15 years ago

Closed 11 years ago

Last modified 2 years ago

#11293 closed Bug (fixed)

Filters on aggregates lose connector

Reported by: django@… Owned by: -
Component: Database layer (models, ORM) Version: 1.4
Severity: Normal Keywords: having, where, aggregate, connector, annotate
Cc: jay.wineinger@…, gz, eallik@…, bas@…, ego@…, uwe+django@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When adding a filter on an aggregated column, it must be moved to the having clause instead of the where clause, but it loses the connector. Using the model in the Aggregation help topic:

>>> print Book.objects.values('publisher').annotate(pub_books=Count('id')).filter(Q(pub_books__gt=10) | Q(pub_books=1)).query.as_sql()
('SELECT `having_test_book`.`publisher_id`, COUNT(`having_test_book`.`id`) AS `pub_books` FROM `having_test_book` GROUP BY `having_test_book`.`publisher_id` HAVING (COUNT(`having_test_book`.`id`) > %s  AND COUNT(`having_test_book`.`id`) = %s ) ORDER BY NULL', (10, 1))

The HAVING clause should be an OR, not an AND. I think this happens because django.db.models.sql.query add_filter treats each filter individually.

A more complicated case that also needs to be handled is a filter that involves OR connectors as well as mixed aggregate and non-aggregate columns:

>>> print Book.objects.values('publisher').annotate(pub_books=Count('id')).filter(Q(pub_books__gt=10) | Q(pages=1)).query.as_sql()
('SELECT `having_test_book`.`publisher_id`, COUNT(`having_test_book`.`id`) AS `pub_books` FROM `having_test_book` WHERE `having_test_book`.`pages` = %s  GROUP BY `having_test_book`.`publisher_id` HAVING COUNT(`having_test_book`.`id`) > %s  ORDER BY NULL', (1, 10))

One filter ends up in WHERE, and the other in HAVING, but both need to be moved to HAVING for the connector to work.

Attachments (1)

django_11414_filter_patch.diff (1.6 KB ) - added by shadfc 15 years ago.
patch against 11414, fixes "simple" cases

Download all attachments as: .zip

Change History (29)

comment:1 by Alex Gaynor, 15 years ago

Component: UncategorizedORM aggregation
Owner: nobody removed
Triage Stage: UnreviewedAccepted

comment:2 by anonymous, 15 years ago

Cc: jay.wineinger@… added

by shadfc, 15 years ago

patch against 11414, fixes "simple" cases

comment:3 by Alex Gaynor, 15 years ago

Has patch: set
Patch needs improvement: set

This won't fix cases where one of the items isn't an aggregate though.

comment:4 by shadfc, 15 years ago

The patch I attached fixes (I think) SOME complex filtering on annotations. My use case is that I wanted to generate a sum of a field on a related model and only show records where a field on the parent is greater than the sum from the related models. But since the related model field could be null, I also had to include those into the result.

MyModel.objects.annotate(rel_qty=Sum('related__nullnumberfield').filter(Q(qty__gt=F('rel_qty')) | Q(rel_qty__isnull=True))

Before this patch, the SQL generated for the HAVING clause of the query by the two OR'd Q() objects was actually an AND. It seemed that add_filter always AND'd these instead of honoring the connector. I am not sure why that is/was, so someone enlighten me if I am missing an important fact here.

This fixes the first case in the ticket description, but not the second. I may be wrong here, but I think they may be separate issues, so perhaps this ticket can be modified to deal with the "simple" cases as I call them and another opened to deal with the rest.

comment:5 by gz, 14 years ago

Cc: gz added

comment:6 by Erik Allik, 14 years ago

Cc: eallik@… added

comment:7 by anonymous, 14 years ago

Cc: eallik@… bas@… added; eallik@… removed

comment:8 by anonymous, 14 years ago

Cc: eallik@… added; eallik@… removed

comment:9 by Paul Garner, 14 years ago

Cc: ego@… added

I've just run into this bug... unfortunately I need the second case fixed :(

End result in both cases is you get results of an AND query where you've specified an OR

I tried doing the alternate form:
qs = Book.objects.values('publisher').annotate(pub_books=Count('id'))
qs = qs.filter(pub_booksgt=10) | qs.filter(pages=1)

But this is (differently) broken too.

comment:10 by Paul Garner, 14 years ago

with formatting:

qs = Book.objects.values('publisher').annotate(pub_books=Count('id'))
qs = qs.filter(pub_books__gt=10) | qs.filter(pages=1)

comment:11 by zbyte64, 14 years ago

I've managed to hack around this in my own code, hopefully this will help someone else.

class PassThroughAggregate(object):
    def __init__(self, field):
        self.field = field

    def as_sql(self, quote_func=None, *args, **kwargs):
        if quote_func:
            return quote_func(self.field)
        else:
            return self.field

class SKUManager(models.Manager):
    def active(self):
        return self.filter(active=True)

    def can_order(self):
        qs = self.active().annotate(_quantity_left=models.Sum('inventory__quantity'),
                                    _in_stock=models.Sum('inventory__in_stock'),
                                    _inventories=models.Count('inventory__id'),)
        #Django 1.1 doesn't handle ORs and HAVING very well, so slight hack around
        #ticket 11293 - http://code.djangoproject.com/ticket/11293
        filters = [('_quantity_left', 'gt', 0), ('_in_stock', 'gt', 0), ('_inventories', 'exact', 0)]
        if ALLOW_BACKORDERS:
            filters.append(('allow_backorder', 'exact', 1))
        entry = qs.query.where_class()
        for name, lookup_type, value in filters:
            for alias, aggregate in qs.query.aggregates.items():
                if alias == name:
                    entry.add((aggregate, lookup_type, value), 'OR')
                    break
            else:
                aggregate = PassThroughAggregate(name)
                entry.add((aggregate, lookup_type, value), 'OR')
        qs.query.having.add(entry, 'AND')
        return qs

comment:12 by Beuc, 14 years ago

milestone: 1.2
Owner: set to Beuc
Status: newassigned

Not sure if this is the right way to do it, but I suggest this be targetted for 1.2.

The result of the ORM queries can be completely incorrect ('AND' instead of 'OR' in SQL), I was sure it was considered for 1.2 until I read the "release schedule - Update 7".

See another example/test-case at (duplicate) #13415 .

comment:13 by Russell Keith-Magee, 14 years ago

milestone: 1.2

The only bugs being accepted for 1.2 are bugs that cause catastrophic data loss, regressions in behavior from 1.1, or are major flaws in features added in the 1.2 cycle. This bug meets none of these criteria.

comment:14 by Beuc, 14 years ago

Thanks for clarifying.

Do you plan to document that aggregation filtering is not reliable?
I wouldn't have used it in the first place if I knew it never was, even since 1.1 if I understand correctly.

Right now I'm going to rewrite the impacted code in my project to use some custom SQL, because we can't afford an incorrect result.

comment:15 by Beuc, 14 years ago

Owner: changed from Beuc to -
Status: assignednew

(not sure how this bug got assigned to me)

comment:16 by dlevine, 14 years ago

Super helpful zbyte64, thanks! Similarly hacked.

I agree that a partial patch to fix the OR of aggregates only would be a worthwhile initial fix.

comment:17 by None, 14 years ago

Just marked this as a duplicate, http://code.djangoproject.com/ticket/13692#comment:1, a person making or improving this patch should probably look at it though since this is a large bug and affects a wide range of functionality.

repost for quick search purposes

class Professional(models.Model):
	hits = models.PositiveIntegerField() 

class Article(models.Model):
	hits = models.PositiveIntegerField()
	authors = models.ManyToManyField(Professional)

class Query(models.Model):
	hits = models.PositiveIntegerField() 
	author = models.ForeignKey(Professional, blank = True, null = True) 


Professional.objects.annotate(article_hits=Sum("article__hits"), query_hits=Sum("query__hits")).exclude(Q(article_hits=None)&Q(query_hits=None))

comment:18 by Alex Gaynor, 13 years ago

Resolution: fixed
Status: newclosed

(In [15173]) Fixed #11293 -- fixed using Q objects to generate ORs with aggregates.

comment:19 by Alex Gaynor, 13 years ago

(In [15174]) [1.2.X] Fixed #11293 -- fixed using Q objects to generate ORs with aggregates. Backport of [15173].

comment:20 by uwe+django@…, 12 years ago

Easy pickings: unset
Resolution: fixed
Severity: Normal
Status: closedreopened
Type: Bug
UI/UX: unset
Version: master1.4

I think I hit exactly this bug with Django 1.4 (from Debian 1.4-1)

>>> from django.db.models import Q, Count
>>> from django.contrib.auth.models import User
>>> q1 = Q(username='tim')
>>> q2 = Q(groups__count__gt=1)
>>> query = User.objects.annotate(Count('groups'))
>>> query.filter(q1)
[<User: tim>]
>>> query.filter(q2)
[<User: uwe>]
>>> query.filter(q1 | q2)
[]

I expected the last result to contain both, tim and uwe.

>>> query.filter(q1 | q2).query.sql_with_params()
('SELECT `auth_user`.`id`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`password`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`is_superuser`, `auth_user`.`last_login`, `auth_user`.`date_joined`, COUNT(`auth_user_groups`.`group_id`) AS `groups__count` FROM `auth_user` LEFT OUTER JOIN `auth_user_groups` ON (`auth_user`.`id` = `auth_user_groups`.`user_id`) WHERE (`auth_user`.`username` = %s ) GROUP BY `auth_user`.`id`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`password`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`is_superuser`, `auth_user`.`last_login`, `auth_user`.`date_joined` HAVING COUNT(`auth_user_groups`.`group_id`) > %s  ORDER BY NULL', ('tim', 1))

I guess the former 'fixed' marking only applies to the easier case in the original report.

Thanks

comment:21 by uwe+django@…, 12 years ago

Cc: uwe+django@… added

comment:22 by Anssi Kääriäinen, 12 years ago

Patch needs improvement: unset

I tested the above with https://github.com/akaariai/django/tree/refactor_utils_tree and that patch seems to solve this issue.

This is the SQL query I get:

SELECT "auth_user"."id", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined", COUNT("auth_user_groups"."group_id") AS "groups__count" FROM "auth_user" LEFT OUTER JOIN "auth_user_groups" ON ("auth_user"."id" = "auth_user_groups"."user_id") GROUP BY "auth_user"."id", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined" HAVING ("auth_user"."username" = tim  OR COUNT("auth_user_groups"."group_id") > 1 )

I can't test if I get the correct results, as I don't have the test data available. But the query looks correct to me.

comment:23 by anonymous, 12 years ago

I can confirm that commit e691d86 from https://github.com/akaariai/django/tree/refactor_utils_tree returns the correct result for me. But note that commit

96c2eb4 (Added pre-add_q stage to sql/query.py)

broke that again. The following is returned on 96c2eb4:

'SELECT "auth_user"."id", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined", COUNT("auth_user_groups"."group_id") AS "groups__count" FROM "auth_user" LEFT OUTER JOIN "auth_user_groups" ON ("auth_user"."id" = "auth_user_groups"."user_id") GROUP BY "auth_user"."id", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined"'

which returns all users for me. This is not surprising as it doesn't have the "HAVING" clause and doesn't even mention "tim".

comment:24 by Anssi Kääriäinen, 12 years ago

Are you sure about not doing some typo in the test? The result seems really surprising to me. Here is my test results:

akaariai@akaariai-UX31E:~/Programming/django/django_test/django$ git log -1
commit 96c2eb46b96bb0cd47e3ae7932d32401a6a28421
Author: Anssi Kääriäinen <akaariai@gmail.com>
Date:   Sat Jun 2 03:40:23 2012 +0300

    Added pre-add_q stage to sql/query.py

akaariai@akaariai-UX31E:~/Programming/django/django_test/django$ cd ..
akaariai@akaariai-UX31E:~/Programming/django/django_test$ cat test.py

from django.core.management import setup_environ
import settings
setup_environ(settings)
from django.db.models import Q, Count
from django.contrib.auth.models import User
q1 = Q(username='tim')
q2 = Q(groups__count__gt=1)
query = User.objects.annotate(Count('groups'))
print query.filter(q1 | q2).query

akaariai@akaariai-UX31E:~/Programming/django/django_test$ python test.py

SELECT "auth_user"."id", "auth_user"."username", "auth_user"."first_name",
"auth_user"."last_name", "auth_user"."email", "auth_user"."password",
"auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser",
"auth_user"."last_login", "auth_user"."date_joined",
COUNT("auth_user_groups"."group_id") AS "groups__count"
FROM "auth_user"
LEFT OUTER JOIN "auth_user_groups" ON ("auth_user"."id" = "auth_user_groups"."user_id")
GROUP BY "auth_user"."id", "auth_user"."username", "auth_user"."first_name",
"auth_user"."last_name", "auth_user"."email", "auth_user"."password",
"auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser",
"auth_user"."last_login", "auth_user"."date_joined"
HAVING ("auth_user"."username" = tim  OR COUNT("auth_user_groups"."group_id") > 1 )

So, for me this seems to give the correct result.

comment:25 by Anssi Kääriäinen, 12 years ago

There was an error in the branch - the branch splits q_objects to multiple different trees for having - where clause separation (find_having_splits()) in the branch. This made in-place modifications to the given q_object and this was the cause of the error. I made a change where the q_object is used as-is in the query, except when it contains references to aggregates, in which case it is first deepcopied, and only then splitted to parts.

I have updated the https://github.com/akaariai/django/tree/refactor_utils_tree branch to contain a fix for this.

comment:26 by fhahn, 11 years ago

Component: ORM aggregationDatabase layer (models, ORM)
Keywords: annotate added

comment:27 by Anssi Kääriäinen, 11 years ago

Resolution: fixed
Status: reopenedclosed

comment:28 by GitHub <noreply@…>, 2 years ago

In a46bc32:

Refs #11293 -- Added test for filtering aggregates with negated & operator.

Note: See TracTickets for help on using tickets.
Back to Top