Opened 6 years ago

Closed 2 years ago

#11293 closed Bug (fixed)

Filters on aggregates lose connector

Reported by: django@… Owned by: -
Component: Database layer (models, ORM) Version: 1.4
Severity: Normal Keywords: having, where, aggregate, connector, annotate
Cc: jay.wineinger@…, gz, eallik@…, bas@…, ego@…, uwe+django@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When adding a filter on an aggregated column, it must be moved to the having clause instead of the where clause, but it loses the connector. Using the model in the Aggregation help topic:

>>> print Book.objects.values('publisher').annotate(pub_books=Count('id')).filter(Q(pub_books__gt=10) | Q(pub_books=1)).query.as_sql()
('SELECT `having_test_book`.`publisher_id`, COUNT(`having_test_book`.`id`) AS `pub_books` FROM `having_test_book` GROUP BY `having_test_book`.`publisher_id` HAVING (COUNT(`having_test_book`.`id`) > %s  AND COUNT(`having_test_book`.`id`) = %s ) ORDER BY NULL', (10, 1))

The HAVING clause should be an OR, not an AND. I think this happens because django.db.models.sql.query add_filter treats each filter individually.

A more complicated case that also needs to be handled is a filter that involves OR connectors as well as mixed aggregate and non-aggregate columns:

>>> print Book.objects.values('publisher').annotate(pub_books=Count('id')).filter(Q(pub_books__gt=10) | Q(pages=1)).query.as_sql()
('SELECT `having_test_book`.`publisher_id`, COUNT(`having_test_book`.`id`) AS `pub_books` FROM `having_test_book` WHERE `having_test_book`.`pages` = %s  GROUP BY `having_test_book`.`publisher_id` HAVING COUNT(`having_test_book`.`id`) > %s  ORDER BY NULL', (1, 10))

One filter ends up in WHERE, and the other in HAVING, but both need to be moved to HAVING for the connector to work.

Attachments (1)

django_11414_filter_patch.diff (1.6 KB) - added by shadfc 6 years ago.
patch against 11414, fixes "simple" cases

Download all attachments as: .zip

Change History (28)

comment:1 Changed 6 years ago by Alex

  • Component changed from Uncategorized to ORM aggregation
  • Needs documentation unset
  • Needs tests unset
  • Owner nobody deleted
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

comment:2 Changed 6 years ago by anonymous

  • Cc jay.wineinger@… added

Changed 6 years ago by shadfc

patch against 11414, fixes "simple" cases

comment:3 Changed 6 years ago by Alex

  • Has patch set
  • Patch needs improvement set

This won't fix cases where one of the items isn't an aggregate though.

comment:4 Changed 6 years ago by shadfc

The patch I attached fixes (I think) SOME complex filtering on annotations. My use case is that I wanted to generate a sum of a field on a related model and only show records where a field on the parent is greater than the sum from the related models. But since the related model field could be null, I also had to include those into the result.

MyModel.objects.annotate(rel_qty=Sum('related__nullnumberfield').filter(Q(qty__gt=F('rel_qty')) | Q(rel_qty__isnull=True))

Before this patch, the SQL generated for the HAVING clause of the query by the two OR'd Q() objects was actually an AND. It seemed that add_filter always AND'd these instead of honoring the connector. I am not sure why that is/was, so someone enlighten me if I am missing an important fact here.

This fixes the first case in the ticket description, but not the second. I may be wrong here, but I think they may be separate issues, so perhaps this ticket can be modified to deal with the "simple" cases as I call them and another opened to deal with the rest.

comment:5 Changed 6 years ago by gz

  • Cc gz added

comment:6 Changed 6 years ago by RaceCondition

  • Cc eallik@… added

comment:7 Changed 6 years ago by anonymous

  • Cc eallik@… bas@… added; eallik@… removed

comment:8 Changed 6 years ago by anonymous

  • Cc eallik@… added; eallik@… removed

comment:9 Changed 6 years ago by anentropic

  • Cc ego@… added

I've just run into this bug... unfortunately I need the second case fixed :(

End result in both cases is you get results of an AND query where you've specified an OR

I tried doing the alternate form:
qs = Book.objects.values('publisher').annotate(pub_books=Count('id'))
qs = qs.filter(pub_booksgt=10) | qs.filter(pages=1)

But this is (differently) broken too.

comment:10 Changed 6 years ago by anentropic

with formatting:

qs = Book.objects.values('publisher').annotate(pub_books=Count('id'))
qs = qs.filter(pub_books__gt=10) | qs.filter(pages=1)

comment:11 Changed 5 years ago by zbyte64

I've managed to hack around this in my own code, hopefully this will help someone else.

class PassThroughAggregate(object):
    def __init__(self, field):
        self.field = field

    def as_sql(self, quote_func=None, *args, **kwargs):
        if quote_func:
            return quote_func(self.field)
        else:
            return self.field

class SKUManager(models.Manager):
    def active(self):
        return self.filter(active=True)

    def can_order(self):
        qs = self.active().annotate(_quantity_left=models.Sum('inventory__quantity'),
                                    _in_stock=models.Sum('inventory__in_stock'),
                                    _inventories=models.Count('inventory__id'),)
        #Django 1.1 doesn't handle ORs and HAVING very well, so slight hack around
        #ticket 11293 - http://code.djangoproject.com/ticket/11293
        filters = [('_quantity_left', 'gt', 0), ('_in_stock', 'gt', 0), ('_inventories', 'exact', 0)]
        if ALLOW_BACKORDERS:
            filters.append(('allow_backorder', 'exact', 1))
        entry = qs.query.where_class()
        for name, lookup_type, value in filters:
            for alias, aggregate in qs.query.aggregates.items():
                if alias == name:
                    entry.add((aggregate, lookup_type, value), 'OR')
                    break
            else:
                aggregate = PassThroughAggregate(name)
                entry.add((aggregate, lookup_type, value), 'OR')
        qs.query.having.add(entry, 'AND')
        return qs

comment:12 Changed 5 years ago by Beuc

  • milestone set to 1.2
  • Owner set to Beuc
  • Status changed from new to assigned

Not sure if this is the right way to do it, but I suggest this be targetted for 1.2.

The result of the ORM queries can be completely incorrect ('AND' instead of 'OR' in SQL), I was sure it was considered for 1.2 until I read the "release schedule - Update 7".

See another example/test-case at (duplicate) #13415 .

comment:13 Changed 5 years ago by russellm

  • milestone 1.2 deleted

The only bugs being accepted for 1.2 are bugs that cause catastrophic data loss, regressions in behavior from 1.1, or are major flaws in features added in the 1.2 cycle. This bug meets none of these criteria.

comment:14 Changed 5 years ago by Beuc

Thanks for clarifying.

Do you plan to document that aggregation filtering is not reliable?
I wouldn't have used it in the first place if I knew it never was, even since 1.1 if I understand correctly.

Right now I'm going to rewrite the impacted code in my project to use some custom SQL, because we can't afford an incorrect result.

comment:15 Changed 5 years ago by Beuc

  • Owner changed from Beuc to -
  • Status changed from assigned to new

(not sure how this bug got assigned to me)

comment:16 Changed 5 years ago by dlevine

Super helpful zbyte64, thanks! Similarly hacked.

I agree that a partial patch to fix the OR of aggregates only would be a worthwhile initial fix.

comment:17 Changed 5 years ago by glassresistor

Just marked this as a duplicate, http://code.djangoproject.com/ticket/13692#comment:1, a person making or improving this patch should probably look at it though since this is a large bug and affects a wide range of functionality.

repost for quick search purposes

class Professional(models.Model):
	hits = models.PositiveIntegerField() 

class Article(models.Model):
	hits = models.PositiveIntegerField()
	authors = models.ManyToManyField(Professional)

class Query(models.Model):
	hits = models.PositiveIntegerField() 
	author = models.ForeignKey(Professional, blank = True, null = True) 


Professional.objects.annotate(article_hits=Sum("article__hits"), query_hits=Sum("query__hits")).exclude(Q(article_hits=None)&Q(query_hits=None))

comment:18 Changed 5 years ago by Alex

  • Resolution set to fixed
  • Status changed from new to closed

(In [15173]) Fixed #11293 -- fixed using Q objects to generate ORs with aggregates.

comment:19 Changed 5 years ago by Alex

(In [15174]) [1.2.X] Fixed #11293 -- fixed using Q objects to generate ORs with aggregates. Backport of [15173].

comment:20 Changed 3 years ago by uwe+django@…

  • Easy pickings unset
  • Resolution fixed deleted
  • Severity set to Normal
  • Status changed from closed to reopened
  • Type set to Bug
  • UI/UX unset
  • Version changed from master to 1.4

I think I hit exactly this bug with Django 1.4 (from Debian 1.4-1)

>>> from django.db.models import Q, Count
>>> from django.contrib.auth.models import User
>>> q1 = Q(username='tim')
>>> q2 = Q(groups__count__gt=1)
>>> query = User.objects.annotate(Count('groups'))
>>> query.filter(q1)
[<User: tim>]
>>> query.filter(q2)
[<User: uwe>]
>>> query.filter(q1 | q2)
[]

I expected the last result to contain both, tim and uwe.

>>> query.filter(q1 | q2).query.sql_with_params()
('SELECT `auth_user`.`id`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`password`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`is_superuser`, `auth_user`.`last_login`, `auth_user`.`date_joined`, COUNT(`auth_user_groups`.`group_id`) AS `groups__count` FROM `auth_user` LEFT OUTER JOIN `auth_user_groups` ON (`auth_user`.`id` = `auth_user_groups`.`user_id`) WHERE (`auth_user`.`username` = %s ) GROUP BY `auth_user`.`id`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`password`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`is_superuser`, `auth_user`.`last_login`, `auth_user`.`date_joined` HAVING COUNT(`auth_user_groups`.`group_id`) > %s  ORDER BY NULL', ('tim', 1))

I guess the former 'fixed' marking only applies to the easier case in the original report.

Thanks

comment:21 Changed 3 years ago by uwe+django@…

  • Cc uwe+django@… added

comment:22 Changed 3 years ago by akaariai

  • Patch needs improvement unset

I tested the above with https://github.com/akaariai/django/tree/refactor_utils_tree and that patch seems to solve this issue.

This is the SQL query I get:

SELECT "auth_user"."id", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined", COUNT("auth_user_groups"."group_id") AS "groups__count" FROM "auth_user" LEFT OUTER JOIN "auth_user_groups" ON ("auth_user"."id" = "auth_user_groups"."user_id") GROUP BY "auth_user"."id", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined" HAVING ("auth_user"."username" = tim  OR COUNT("auth_user_groups"."group_id") > 1 )

I can't test if I get the correct results, as I don't have the test data available. But the query looks correct to me.

comment:23 Changed 3 years ago by anonymous

I can confirm that commit e691d86 from https://github.com/akaariai/django/tree/refactor_utils_tree returns the correct result for me. But note that commit

96c2eb4 (Added pre-add_q stage to sql/query.py)

broke that again. The following is returned on 96c2eb4:

'SELECT "auth_user"."id", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined", COUNT("auth_user_groups"."group_id") AS "groups__count" FROM "auth_user" LEFT OUTER JOIN "auth_user_groups" ON ("auth_user"."id" = "auth_user_groups"."user_id") GROUP BY "auth_user"."id", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined"'

which returns all users for me. This is not surprising as it doesn't have the "HAVING" clause and doesn't even mention "tim".

comment:24 Changed 3 years ago by akaariai

Are you sure about not doing some typo in the test? The result seems really surprising to me. Here is my test results:

akaariai@akaariai-UX31E:~/Programming/django/django_test/django$ git log -1
commit 96c2eb46b96bb0cd47e3ae7932d32401a6a28421
Author: Anssi Kääriäinen <akaariai@gmail.com>
Date:   Sat Jun 2 03:40:23 2012 +0300

    Added pre-add_q stage to sql/query.py

akaariai@akaariai-UX31E:~/Programming/django/django_test/django$ cd ..
akaariai@akaariai-UX31E:~/Programming/django/django_test$ cat test.py

from django.core.management import setup_environ
import settings
setup_environ(settings)
from django.db.models import Q, Count
from django.contrib.auth.models import User
q1 = Q(username='tim')
q2 = Q(groups__count__gt=1)
query = User.objects.annotate(Count('groups'))
print query.filter(q1 | q2).query

akaariai@akaariai-UX31E:~/Programming/django/django_test$ python test.py

SELECT "auth_user"."id", "auth_user"."username", "auth_user"."first_name",
"auth_user"."last_name", "auth_user"."email", "auth_user"."password",
"auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser",
"auth_user"."last_login", "auth_user"."date_joined",
COUNT("auth_user_groups"."group_id") AS "groups__count"
FROM "auth_user"
LEFT OUTER JOIN "auth_user_groups" ON ("auth_user"."id" = "auth_user_groups"."user_id")
GROUP BY "auth_user"."id", "auth_user"."username", "auth_user"."first_name",
"auth_user"."last_name", "auth_user"."email", "auth_user"."password",
"auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser",
"auth_user"."last_login", "auth_user"."date_joined"
HAVING ("auth_user"."username" = tim  OR COUNT("auth_user_groups"."group_id") > 1 )

So, for me this seems to give the correct result.

comment:25 Changed 3 years ago by akaariai

There was an error in the branch - the branch splits q_objects to multiple different trees for having - where clause separation (find_having_splits()) in the branch. This made in-place modifications to the given q_object and this was the cause of the error. I made a change where the q_object is used as-is in the query, except when it contains references to aggregates, in which case it is first deepcopied, and only then splitted to parts.

I have updated the https://github.com/akaariai/django/tree/refactor_utils_tree branch to contain a fix for this.

comment:26 Changed 2 years ago by fhahn

  • Component changed from ORM aggregation to Database layer (models, ORM)
  • Keywords annotate added

comment:27 Changed 2 years ago by akaariai

  • Resolution set to fixed
  • Status changed from reopened to closed
Note: See TracTickets for help on using tickets.
Back to Top