Opened 4 years ago

Closed 4 years ago

#32658 closed Uncategorized (invalid)

Subquery ignores query ordering

Reported by: Bálint Balina Owned by: nobody
Component: Database layer (models, ORM) Version: 3.2
Severity: Normal Keywords: Subquery, annotations, ordering
Cc: Simon Charette, Hannes Ljungberg Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I've started migrating a huge project from Django==2.2 to ==3.2, and many tests started to fail, because of the problem below:

When using a query with ordering in an annotated Subquery, the query's ordering is ignored (or removed).

In a bare new Django project, I have successfully reproduced the issue:

from django.db import models


class MyModel(models.Model):
    parent = models.ForeignKey(to='self', on_delete=models.CASCADE, null=True)
    name = models.TextField()
MyModel.objects.create(name='parent')
MyModel.objects.create(name='b', parent=parent)
MyModel.objects.create(name='a', parent=parent)
children_query = MyModel.objects.filter(parent=OuterRef('pk')).annotate(child_names=Func(F('name'), function='group_concat')).values('child_names').order_by('name')
parent_query = MyModel.objects.annotate(child_names=Subquery(children_query))
parent_query.query.__str__()

Outputs:

SELECT "app_mymodel"."id", "app_mymodel"."parent_id", "app_mymodel"."name", (SELECT group_concat(U0."name") AS "child_names" FROM "app_mymodel" U0 WHERE U0."parent_id" = "app_mymodel"."id") AS "child_names" FROM "app_mymodel"

The only thing I do, is pip install Django==2.2, run the same thing in a shell and:

Outputs:

SELECT "app_mymodel"."id", "app_mymodel"."parent_id", "app_mymodel"."name", (SELECT group_concat(U0."name") AS "child_names" FROM "app_mymodel" U0 WHERE U0."parent_id" = ("app_mymodel"."id") ORDER BY U0."name" ASC) AS "child_names" FROM "app_mymodel"

Notice the ORDER BY U0."name" ASC is missing in Django 3.2, though I have explicitly set the ordering on the query. This happens both on SQLite and PostgreSQL backends.

Change History (6)

comment:1 by Bálint Balina, 4 years ago

After digging deeper, I have found that this is the duplicate of https://code.djangoproject.com/ticket/31687 , which was closed without any real explanation about why this working behaviour changed. I am aware of the solution provided, but it would require quite some code changes, which is fine but seems unreasonable. At very least I would like to know why it is necessary to remove the ordering from the subquery.

comment:2 by Simon Charette, 4 years ago

Cc: Simon Charette added
Resolution: invalid
Status: newclosed

IIRC the ordering is cleared for optimization purposes when LIMIT, DISTINCT and aggregation are not used.

I think that ordering wouldn't be disabled if your function was properly marked as being an aggregate (as it's one)

children_query = MyModel.objects.filter(
    parent=OuterRef('pk'),
).annotate(
    child_names=Aggregate('name', function='group_concat'),
).values('child_names').order_by('name')

If you use Func instead of Aggregate then the ORM has to way to know your opaque expression is performing aggregation and that ordering might be relevant.

If using Aggregate instead of Func still results in cleared ordering then please re-open this ticket as it shouldn't be the case otherwise this a clearly a misuse of the ORM and likely only work in your case because MySQL is very lax about GROUP BY requirements.

comment:3 by Bálint Balina, 4 years ago

Resolution: invalid
Status: closednew

Okay, that sounds sane, but the solution you provided does not work:

            children_query = MyModel.objects.filter(
                parent=OuterRef('pk'),
            ).annotate(
                child_names=Aggregate('name', function='group_concat'),
            ).values('child_names').order_by('name')
            
            queryset = MyModel.objects.annotate(
                other_names=Subquery(children_query)
            ).only('id')
            
            print(str(queryset.query))
            
SELECT "app_mymodel"."id", (SELECT ARRAY_AGG(U0."name" ) AS "child_names" FROM "app_mymodel" U0 WHERE U0."parent_id" = "app_mymodel"."id" GROUP BY U0."id") AS "other_names" FROM "app_mymodel"

comment:4 by Simon Charette, 4 years ago

I this case I think we should either make the logic in charge of clearing ordering consider contains_aggregate as well.

An alternative which might more correct would be to move the ordering clearing logic to In(Lookup).process_rhs after all that's what we do for clearing the select clause. The devils lurks in the details there though as the solution needs to handle Subquery, sql.Query, and QuerySet (all resolves to sql.Query though) right-hand-sides and the ordering can only be cleared under some particular conditions.

Maybe adding a sql.Query.as_haystack (please suggest a better name) method that takes care of all this logic and duck-typing against it in In(Lookup).process_rhs would provide a decoupled enough API?

comment:5 by Hannes Ljungberg, 4 years ago

Cc: Hannes Ljungberg added

comment:6 by Mariusz Felisiak, 4 years ago

Resolution: invalid
Status: newclosed

ORDER BY clause should be included in the GROUP_CONCAT() not in the main query, e.g. GROUP_CONCAT(U0."name" ORDER BY U0."name" ASC) AS "child_names"). You can check OrderableAggMixin to see how it's implemented for ArrayAgg, StringAgg, and JSONBAgg on PostgreSQL.

Note: See TracTickets for help on using tickets.
Back to Top