Opened 5 years ago

Closed 2 years ago

#30209 closed Bug (duplicate)

Union with group by don't generate correct Subquery

Reported by: Nikolas Owned by: Can Sarıgöl
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: union, group by
Cc: Can Sarıgöl Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

reopen: https://code.djangoproject.com/ticket/30203

Simplified for example:
have simple query

    qs1: QuerySet = User.objects.filter(id__gt=10)

    qs1 = qs1.values('is_active').annotate(id=Count('pk'))

generate:

SELECT "suser_customuser"."is_active",
       COUNT("suser_customuser"."id") AS "id"
FROM "suser_customuser"
WHERE "suser_customuser"."id" > 10
GROUP BY "suser_customuser"."is_active"

all correct, now i wont use union

    qs1: QuerySet = User.objects.filter(id__gt=10)
    qs2: QuerySet = User.objects.filter(id__lt=10)

    qs1 = qs1.values('is_active').annotate(id=Count('pk'))

now get not correct SQL:

  (SELECT "suser_customuser"."is_active",
          "suser_customuser"."id"
   FROM "suser_customuser"
   WHERE "suser_customuser"."id" > 10)
UNION
  (SELECT "suser_customuser"."is_active",
          "suser_customuser"."id"
   FROM "suser_customuser"
   WHERE "suser_customuser"."id" < 10)

expected:

select *
from
  ((SELECT "suser_customuser"."is_active",
          "suser_customuser"."id"
   FROM "suser_customuser"
   WHERE "suser_customuser"."id" > 10)
UNION
  (SELECT "suser_customuser"."is_active",
          "suser_customuser"."id"
   FROM "suser_customuser"
   WHERE "suser_customuser"."id" < 10)) U0
group by is_active, id

i think this is bug, if not why not show any warning or exception?
or just say how rewrite query for correct use Subquery

screenshot: http://dl4.joxi.net/drive/2019/02/24/0002/0003/143363/63/2eab4a5318.jpg

Attachments (1)

2eab4a5318.jpg (132.4 KB ) - added by Nikolas 5 years ago.
screenshot

Download all attachments as: .zip

Change History (15)

by Nikolas, 5 years ago

Attachment: 2eab4a5318.jpg added

screenshot

comment:1 by Nikolas, 5 years ago

Type: UncategorizedBug

comment:2 by Simon Charette, 5 years ago

Needs documentation: set
Triage Stage: UnreviewedAccepted
Version: 1.11master

I agree that either an exception should be raised and the documentation be adjusted to explicitly mention aggregation won't work or adding an aggregate function should perform a subquery pushdown.

in reply to:  2 comment:3 by Nikolas, 5 years ago

Replying to Simon Charette:

I agree that either an exception should be raised and the documentation be adjusted to explicitly mention aggregation won't work or adding an aggregate function should perform a subquery pushdown.

Please can help, how can rewrite query for correct work???? My boss shut me in close time, but i stock there)

in reply to:  2 comment:4 by TKdka, 5 years ago

Replying to Simon Charette:

I agree that either an exception should be raised and the documentation be adjusted to explicitly mention aggregation won't work or adding an aggregate function should perform a subquery pushdown.

I'm planning to make this a sub query pushdown as suggested. FYI, this is my first contribution to the code base, so suggestions and feedback would be most appreciated.

comment:5 by Simon Charette, 5 years ago

I'm planning to make this a sub query pushdown as suggested. FYI, this is my first contribution to the code base, so suggestions and feedback would be most appreciated.

TKdka, not to discourage you but this is certainly going to be a hard first contribution as the ORM doesn't have any formalized API to perform such pushdown yet. If that can provide you any guidance the most similar code I can think of is in django.db.models.sql.Query.get_aggregation.

in reply to:  5 comment:6 by TKdka, 5 years ago

Replying to Simon Charette:

I'm planning to make this a sub query pushdown as suggested. FYI, this is my first contribution to the code base, so suggestions and feedback would be most appreciated.

TKdka, not to discourage you but this is certainly going to be a hard first contribution as the ORM doesn't have any formalized API to perform such pushdown yet. If that can provide you any guidance the most similar code I can think of is in django.db.models.sql.Query.get_aggregation.

Thanks for the guidance and code reference. As nobody else has stepped up on this in 3 weeks, I'll give it a shot (-:

in reply to:  5 comment:7 by TKdka, 5 years ago

Replying to Simon Charette:

I'm planning to make this a sub query pushdown as suggested. FYI, this is my first contribution to the code base, so suggestions and feedback would be most appreciated.

TKdka, not to discourage you but this is certainly going to be a hard first contribution as the ORM doesn't have any formalized API to perform such pushdown yet. If that can provide you any guidance the most similar code I can think of is in django.db.models.sql.Query.get_aggregation.

After looking into this, you are completely right that it's a big endeavor. The difficult part is that the query needs to be manipulated even after the annotate call, which is significantly different from the aggregate example.

Therefore, I'll submit a small patch that raises an exception when this is attempted.

I'm not sure what exception class should be used here. TypeError? Thoughts?

comment:8 by Simon Charette, 5 years ago

I'm not sure what exception class should be used here. TypeError? Thoughts?

Good question. I know that we're currently preventing some operations on combined querysets so I'd try to see what type of exception we're currently raising and stick to it to be coherent. I think .filter() is disallowed for example. It's possible that we simply crash though.

I think the best place to add this check would be in query.Queryset.annotate just before set_group_by is called if an annotation contains aggregates.

in reply to:  8 comment:9 by TKdka, 5 years ago

Replying to Simon Charette:

I'm not sure what exception class should be used here. TypeError? Thoughts?

Good question. I know that we're currently preventing some operations on combined querysets so I'd try to see what type of exception we're currently raising and stick to it to be coherent. I think .filter() is disallowed for example. It's possible that we simply crash though.

I think the best place to add this check would be in query.Queryset.annotate just before set_group_by is called if an annotation contains aggregates.

On where the exception should be raised...
I agree with general placement, but I'm thinking the exception should be raised as soon as an aggregate function as noticed, so just after
line 1048 is called The other part of the conditional,

clone.query.group_by = True

is problematic as well. This appears to create a group by over all the fields, which is not supported. Thoughts?

On the type of exception that should be raised...
Using union+filter does not raise an exception (nor does it work). I think that the correct exception should be something along the lines of a SQLNotSupported exception. The exception should state what SQL feature combination is not supported and then suggest using a raw query or rewriting the query. Is there already something like this?

On related bugs...
I found two similar bugs while exploring this feature (one of which is mentioned above). I'll create tickets for them as well, but wanted to mention them here too.

As mentioned above, there is a bug with filter + union as well. Performing a filter with a union neither raises an exception nor adds the filter.

qs1 = Book.objects.filter(pk__lte=10)  
qs2 = Book.objects.filter(pk__gt=10)
qs3 = qs1.union(qs2)
qs4  = qs3.filter(publisher_id=1)
print (qs4.query)

SELECT "aggregation_book"."id", "aggregation_book"."isbn", "aggregation_book"."name", "aggregation_book"."pages", "aggregation_book"."rating", "aggregation_book"."price", "aggregation_book"."contact_id", "aggregation_book"."publisher_id", "aggregation_book"."pubdate" FROM "aggregation_book" WHERE "aggregation_book"."id" <= 10 UNION SELECT "aggregation_book"."id", "aggregation_book"."isbn", "aggregation_book"."name", "aggregation_book"."pages", "aggregation_book"."rating", "aggregation_book"."price", "aggregation_book"."contact_id", "aggregation_book"."publisher_id", "aggregation_book"."pubdate" FROM "aggregation_book" WHERE "aggregation_book"."id" > 10

There's a similar bug when using filter + non-aggregating annotate functions. The annotated field is not added, and an exception is not raised.

qs1 = Book.objects.filter(pk__lte=10)  
qs2 = Book.objects.filter(pk__gt=10)
qs3 = qs1.union(qs2)
qs4  = qs3.annotate(annotateField=F('id')+1)
print (qs4.query)

SELECT "aggregation_book"."id", "aggregation_book"."isbn", "aggregation_book"."name", "aggregation_book"."pages", "aggregation_book"."rating", "aggregation_book"."price", "aggregation_book"."contact_id", "aggregation_book"."publisher_id", "aggregation_book"."pubdate" FROM "aggregation_book" WHERE "aggregation_book"."id" <= 10 UNION SELECT "aggregation_book"."id", "aggregation_book"."isbn", "aggregation_book"."name", "aggregation_book"."pages", "aggregation_book"."rating", "aggregation_book"."price", "aggregation_book"."contact_id", "aggregation_book"."publisher_id", "aggregation_book"."pubdate" FROM "aggregation_book" WHERE "aggregation_book"."id" > 10

comment:10 by Can Sarıgöl, 5 years ago

Cc: Can Sarıgöl added
Has patch: set

Hi, I've been tackling "union as a subquery" topic for a while. So I tried to fix, could you review? Thanks. PR

Mentioned PR

comment:11 by Can Sarıgöl, 5 years ago

Owner: changed from nobody to Can Sarıgöl
Status: newassigned

comment:12 by Matthijs Kooijman, 4 years ago

It seems the linked PR was already closed, so it seems implementing this properly has been abandoned?

As for showing a warning when a union is annotated with an aggregation, it seems that https://github.com/django/django/pull/11591 forbids *all* annotations on unions, so this case seems to be convered by that as well. Not sure what that means for the status of this ticket, though.

in reply to:  12 comment:13 by Mariusz Felisiak, 4 years ago

Replying to Matthijs Kooijman:

It seems the linked PR was already closed, so it seems implementing this properly has been abandoned?

As for showing a warning when a union is annotated with an aggregation, it seems that https://github.com/django/django/pull/11591 forbids *all* annotations on unions, so this case seems to be convered by that as well. Not sure what that means for the status of this ticket, though.

I opened a new PR but it's still WIP. There is a tricky parentheses issue on MySQL/MariaDB. I hope I will find time to finish it.

comment:14 by Mariusz Felisiak, 2 years ago

Resolution: duplicate
Status: assignedclosed

A proper exception when calling annotate() after union(), intersection(), and difference() was added in 1853383969a4c53bbeba998757c30410bd3df4bb.

Closing as a duplicate of #28519.

Note: See TracTickets for help on using tickets.
Back to Top