#11293 closed Bug (fixed)
Filters on aggregates lose connector
Reported by: | Owned by: | - | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.4 |
Severity: | Normal | Keywords: | having, where, aggregate, connector, annotate |
Cc: | jay.wineinger@…, gz, eallik@…, bas@…, ego@…, uwe+django@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
When adding a filter on an aggregated column, it must be moved to the having clause instead of the where clause, but it loses the connector. Using the model in the Aggregation help topic:
>>> print Book.objects.values('publisher').annotate(pub_books=Count('id')).filter(Q(pub_books__gt=10) | Q(pub_books=1)).query.as_sql() ('SELECT `having_test_book`.`publisher_id`, COUNT(`having_test_book`.`id`) AS `pub_books` FROM `having_test_book` GROUP BY `having_test_book`.`publisher_id` HAVING (COUNT(`having_test_book`.`id`) > %s AND COUNT(`having_test_book`.`id`) = %s ) ORDER BY NULL', (10, 1))
The HAVING clause should be an OR, not an AND. I think this happens because django.db.models.sql.query add_filter treats each filter individually.
A more complicated case that also needs to be handled is a filter that involves OR connectors as well as mixed aggregate and non-aggregate columns:
>>> print Book.objects.values('publisher').annotate(pub_books=Count('id')).filter(Q(pub_books__gt=10) | Q(pages=1)).query.as_sql() ('SELECT `having_test_book`.`publisher_id`, COUNT(`having_test_book`.`id`) AS `pub_books` FROM `having_test_book` WHERE `having_test_book`.`pages` = %s GROUP BY `having_test_book`.`publisher_id` HAVING COUNT(`having_test_book`.`id`) > %s ORDER BY NULL', (1, 10))
One filter ends up in WHERE, and the other in HAVING, but both need to be moved to HAVING for the connector to work.
Attachments (1)
Change History (29)
comment:1 by , 15 years ago
Component: | Uncategorized → ORM aggregation |
---|---|
Owner: | removed |
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 15 years ago
Cc: | added |
---|
by , 15 years ago
Attachment: | django_11414_filter_patch.diff added |
---|
comment:3 by , 15 years ago
Has patch: | set |
---|---|
Patch needs improvement: | set |
This won't fix cases where one of the items isn't an aggregate though.
comment:4 by , 15 years ago
The patch I attached fixes (I think) SOME complex filtering on annotations. My use case is that I wanted to generate a sum of a field on a related model and only show records where a field on the parent is greater than the sum from the related models. But since the related model field could be null, I also had to include those into the result.
MyModel.objects.annotate(rel_qty=Sum('related__nullnumberfield').filter(Q(qty__gt=F('rel_qty')) | Q(rel_qty__isnull=True))
Before this patch, the SQL generated for the HAVING clause of the query by the two OR'd Q() objects was actually an AND. It seemed that add_filter always AND'd these instead of honoring the connector. I am not sure why that is/was, so someone enlighten me if I am missing an important fact here.
This fixes the first case in the ticket description, but not the second. I may be wrong here, but I think they may be separate issues, so perhaps this ticket can be modified to deal with the "simple" cases as I call them and another opened to deal with the rest.
comment:5 by , 15 years ago
Cc: | added |
---|
comment:6 by , 15 years ago
Cc: | added |
---|
comment:7 by , 15 years ago
Cc: | added; removed |
---|
comment:8 by , 15 years ago
Cc: | added; removed |
---|
comment:9 by , 15 years ago
Cc: | added |
---|
I've just run into this bug... unfortunately I need the second case fixed :(
End result in both cases is you get results of an AND query where you've specified an OR
I tried doing the alternate form:
qs = Book.objects.values('publisher').annotate(pub_books=Count('id'))
qs = qs.filter(pub_booksgt=10) | qs.filter(pages=1)
But this is (differently) broken too.
comment:10 by , 15 years ago
with formatting:
qs = Book.objects.values('publisher').annotate(pub_books=Count('id')) qs = qs.filter(pub_books__gt=10) | qs.filter(pages=1)
comment:11 by , 15 years ago
I've managed to hack around this in my own code, hopefully this will help someone else.
class PassThroughAggregate(object): def __init__(self, field): self.field = field def as_sql(self, quote_func=None, *args, **kwargs): if quote_func: return quote_func(self.field) else: return self.field class SKUManager(models.Manager): def active(self): return self.filter(active=True) def can_order(self): qs = self.active().annotate(_quantity_left=models.Sum('inventory__quantity'), _in_stock=models.Sum('inventory__in_stock'), _inventories=models.Count('inventory__id'),) #Django 1.1 doesn't handle ORs and HAVING very well, so slight hack around #ticket 11293 - http://code.djangoproject.com/ticket/11293 filters = [('_quantity_left', 'gt', 0), ('_in_stock', 'gt', 0), ('_inventories', 'exact', 0)] if ALLOW_BACKORDERS: filters.append(('allow_backorder', 'exact', 1)) entry = qs.query.where_class() for name, lookup_type, value in filters: for alias, aggregate in qs.query.aggregates.items(): if alias == name: entry.add((aggregate, lookup_type, value), 'OR') break else: aggregate = PassThroughAggregate(name) entry.add((aggregate, lookup_type, value), 'OR') qs.query.having.add(entry, 'AND') return qs
comment:12 by , 15 years ago
milestone: | → 1.2 |
---|---|
Owner: | set to |
Status: | new → assigned |
Not sure if this is the right way to do it, but I suggest this be targetted for 1.2.
The result of the ORM queries can be completely incorrect ('AND' instead of 'OR' in SQL), I was sure it was considered for 1.2 until I read the "release schedule - Update 7".
See another example/test-case at (duplicate) #13415 .
comment:13 by , 15 years ago
milestone: | 1.2 |
---|
The only bugs being accepted for 1.2 are bugs that cause catastrophic data loss, regressions in behavior from 1.1, or are major flaws in features added in the 1.2 cycle. This bug meets none of these criteria.
comment:14 by , 15 years ago
Thanks for clarifying.
Do you plan to document that aggregation filtering is not reliable?
I wouldn't have used it in the first place if I knew it never was, even since 1.1 if I understand correctly.
Right now I'm going to rewrite the impacted code in my project to use some custom SQL, because we can't afford an incorrect result.
comment:15 by , 15 years ago
Owner: | changed from | to
---|---|
Status: | assigned → new |
(not sure how this bug got assigned to me)
comment:16 by , 14 years ago
Super helpful zbyte64, thanks! Similarly hacked.
I agree that a partial patch to fix the OR of aggregates only would be a worthwhile initial fix.
comment:17 by , 14 years ago
Just marked this as a duplicate, http://code.djangoproject.com/ticket/13692#comment:1, a person making or improving this patch should probably look at it though since this is a large bug and affects a wide range of functionality.
repost for quick search purposes
class Professional(models.Model): hits = models.PositiveIntegerField() class Article(models.Model): hits = models.PositiveIntegerField() authors = models.ManyToManyField(Professional) class Query(models.Model): hits = models.PositiveIntegerField() author = models.ForeignKey(Professional, blank = True, null = True) Professional.objects.annotate(article_hits=Sum("article__hits"), query_hits=Sum("query__hits")).exclude(Q(article_hits=None)&Q(query_hits=None))
comment:18 by , 14 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:19 by , 14 years ago
comment:20 by , 12 years ago
Easy pickings: | unset |
---|---|
Resolution: | fixed |
Severity: | → Normal |
Status: | closed → reopened |
Type: | → Bug |
UI/UX: | unset |
Version: | master → 1.4 |
I think I hit exactly this bug with Django 1.4 (from Debian 1.4-1)
>>> from django.db.models import Q, Count >>> from django.contrib.auth.models import User >>> q1 = Q(username='tim') >>> q2 = Q(groups__count__gt=1) >>> query = User.objects.annotate(Count('groups')) >>> query.filter(q1) [<User: tim>] >>> query.filter(q2) [<User: uwe>] >>> query.filter(q1 | q2) []
I expected the last result to contain both, tim and uwe.
>>> query.filter(q1 | q2).query.sql_with_params() ('SELECT `auth_user`.`id`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`password`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`is_superuser`, `auth_user`.`last_login`, `auth_user`.`date_joined`, COUNT(`auth_user_groups`.`group_id`) AS `groups__count` FROM `auth_user` LEFT OUTER JOIN `auth_user_groups` ON (`auth_user`.`id` = `auth_user_groups`.`user_id`) WHERE (`auth_user`.`username` = %s ) GROUP BY `auth_user`.`id`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`password`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`is_superuser`, `auth_user`.`last_login`, `auth_user`.`date_joined` HAVING COUNT(`auth_user_groups`.`group_id`) > %s ORDER BY NULL', ('tim', 1))
I guess the former 'fixed' marking only applies to the easier case in the original report.
Thanks
comment:21 by , 12 years ago
Cc: | added |
---|
comment:22 by , 12 years ago
Patch needs improvement: | unset |
---|
I tested the above with https://github.com/akaariai/django/tree/refactor_utils_tree and that patch seems to solve this issue.
This is the SQL query I get:
SELECT "auth_user"."id", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined", COUNT("auth_user_groups"."group_id") AS "groups__count" FROM "auth_user" LEFT OUTER JOIN "auth_user_groups" ON ("auth_user"."id" = "auth_user_groups"."user_id") GROUP BY "auth_user"."id", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined" HAVING ("auth_user"."username" = tim OR COUNT("auth_user_groups"."group_id") > 1 )
I can't test if I get the correct results, as I don't have the test data available. But the query looks correct to me.
comment:23 by , 12 years ago
I can confirm that commit e691d86 from https://github.com/akaariai/django/tree/refactor_utils_tree returns the correct result for me. But note that commit
96c2eb4 (Added pre-add_q stage to sql/query.py)
broke that again. The following is returned on 96c2eb4:
'SELECT "auth_user"."id", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined", COUNT("auth_user_groups"."group_id") AS "groups__count" FROM "auth_user" LEFT OUTER JOIN "auth_user_groups" ON ("auth_user"."id" = "auth_user_groups"."user_id") GROUP BY "auth_user"."id", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined"'
which returns all users for me. This is not surprising as it doesn't have the "HAVING" clause and doesn't even mention "tim".
comment:24 by , 12 years ago
Are you sure about not doing some typo in the test? The result seems really surprising to me. Here is my test results:
akaariai@akaariai-UX31E:~/Programming/django/django_test/django$ git log -1 commit 96c2eb46b96bb0cd47e3ae7932d32401a6a28421 Author: Anssi Kääriäinen <akaariai@gmail.com> Date: Sat Jun 2 03:40:23 2012 +0300 Added pre-add_q stage to sql/query.py akaariai@akaariai-UX31E:~/Programming/django/django_test/django$ cd .. akaariai@akaariai-UX31E:~/Programming/django/django_test$ cat test.py from django.core.management import setup_environ import settings setup_environ(settings) from django.db.models import Q, Count from django.contrib.auth.models import User q1 = Q(username='tim') q2 = Q(groups__count__gt=1) query = User.objects.annotate(Count('groups')) print query.filter(q1 | q2).query akaariai@akaariai-UX31E:~/Programming/django/django_test$ python test.py SELECT "auth_user"."id", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined", COUNT("auth_user_groups"."group_id") AS "groups__count" FROM "auth_user" LEFT OUTER JOIN "auth_user_groups" ON ("auth_user"."id" = "auth_user_groups"."user_id") GROUP BY "auth_user"."id", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined" HAVING ("auth_user"."username" = tim OR COUNT("auth_user_groups"."group_id") > 1 )
So, for me this seems to give the correct result.
comment:25 by , 12 years ago
There was an error in the branch - the branch splits q_objects to multiple different trees for having - where clause separation (find_having_splits()) in the branch. This made in-place modifications to the given q_object and this was the cause of the error. I made a change where the q_object is used as-is in the query, except when it contains references to aggregates, in which case it is first deepcopied, and only then splitted to parts.
I have updated the https://github.com/akaariai/django/tree/refactor_utils_tree branch to contain a fix for this.
comment:26 by , 12 years ago
Component: | ORM aggregation → Database layer (models, ORM) |
---|---|
Keywords: | annotate added |
comment:27 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
This was fixed in d3f00bd5706b35961390d3814dd7e322ead3a9a3.
patch against 11414, fixes "simple" cases