Opened 6 hours ago

Last modified 98 minutes ago

#37097 assigned Bug

Nested double-unioned query for model with ordering fails on Postgres/MySql

Reported by: Shai Berger Owned by: Jacob Walls
Component: Database layer (models, ORM) Version: dev
Severity: Release blocker Keywords:
Cc: Shai Berger, Simon Charette Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Yep. Sorry about the title, that's the most succinct I can make it.

Add the following test in django/tests/queries/test_qs_combinators.py, say after test_union_in_with_ordering_and_slice

    def test_double_union_in_with_ordering(self):
        qs1 = Author.objects.filter(num__gt=7)
        qs2 = Author.objects.filter(num__lt=2)
        qs3 = Author.objects.filter(num=5)
        self.assertQuerySetEqual(
            # Query formatted to allow easy commenting-out of parts
            Author.objects.exclude(id__in=qs1
                                          .union(qs2, all=True)
                                          .union(qs3, all=True)
                                          .values("id")),
            []
        )

On Postgres, this passes on 6.0, but fails on the main branch (towards 6.1 as I write this). The error is

 django.db.utils.ProgrammingError: each UNION query must have the same number of columns
LINE 1: ...) ORDER BY "__orderbycol2" ASC) UNION ALL (SELECT "U0"."id" ...

because the query (formatted for some readability) is

SELECT "queries_author"."id", "queries_author"."name", "queries_author"."num", "queries_author"."extra_id" 
FROM "queries_author" 
WHERE NOT (
  "queries_author"."id" IN (
    (
      (
        SELECT "U0"."id" AS "id", "U0"."name" AS "__orderbycol2" 
        FROM "queries_author" "U0" 
        WHERE "U0"."num" > 7
        ORDER BY "U0"."name" ASC
      ) UNION ALL (
        SELECT "U0"."id" AS "id", "U0"."name" AS "__orderbycol2"
        FROM "queries_author" "U0"
        WHERE "U0"."num" < 2
        ORDER BY "U0"."name" ASC
      ) 
      ORDER BY "__orderbycol2" ASC           -- I think this is the culprit
    ) UNION ALL (
      SELECT "U0"."id" AS "id"
      FROM "queries_author" "U0"
      WHERE "U0"."num" = 5
      ORDER BY "U0"."name" ASC
    )
  )
) 
ORDER BY "queries_author"."name" ASC

Notes:

  • Double union is required, a single union works
  • This happens when the ordering is defined in the model Meta, but not if you just add ordering on the queries. I suspect this is because having the ordering in the Meta makes the query equivalent to
    Author.objects.exclude(id__in=qs1
                                  .union(qs2, all=True)
                                  .order_by("name")    # This is implied
                                  .union(qs3, all=True)
                                  .values("id"))
    
  • On Oracle and Sqlite, the inner order is removed by the fix of #36938. I haven't actually tested it on MySql, but I believe it has the same relevant feature flags and so should behave the same.
  • The test as written fails on Sqlite on 6.0, with django.db.utils.DatabaseError: ORDER BY not allowed in subqueries of compound statements -- similar tests are skipped there; but it passes on Postgres, and passes on Sqlite on main.

Change History (4)

comment:1 by Jacob Walls, 6 hours ago

Keywords: Regression removed
Owner: set to Jacob Walls
Status: newassigned
Triage Stage: UnreviewedAccepted

Thanks, I'll take a look today.

comment:2 by Jacob Walls, 4 hours ago

Cc: Simon Charette added

Bisected to 087bb9e8f3478d53f12b1737af865992af17c5f2, which seems like an improvement.

SQL on stable/6.0.x:

SELECT ...
FROM "queries_author"
WHERE NOT ("queries_author"."id" IN ((
                                        (SELECT U0."id" AS "id"
                                         FROM "queries_author" U0
                                         WHERE U0."num" > 7
                                         ORDER BY U0."name" ASC)
                                      UNION ALL
                                        (SELECT U0."id" AS "id"
                                         FROM "queries_author" U0
                                         WHERE U0."num" < 2
                                         ORDER BY U0."name" ASC))
                                     UNION ALL
                                       (SELECT U0."id" AS "id"
                                        FROM "queries_author" U0
                                        WHERE U0."num" = 5
                                        ORDER BY U0."name" ASC)))
ORDER BY "queries_author"."name" ASC;

And on main:

SELECT ...
FROM "queries_author"
WHERE NOT ("queries_author"."id" IN ((
                                        (SELECT "U0"."id" AS "id",
                                                "U0"."name" AS "__orderbycol2"
                                         FROM "queries_author" "U0"
                                         WHERE "U0"."num" > 7
                                         ORDER BY "U0"."name" ASC)
                                      UNION ALL
                                        (SELECT "U0"."id" AS "id",
                                                "U0"."name" AS "__orderbycol2"
                                         FROM "queries_author" "U0"
                                         WHERE "U0"."num" < 2
                                         ORDER BY "U0"."name" ASC)
                                      ORDER BY "__orderbycol2" ASC)
                                     UNION ALL
                                       (SELECT "U0"."id" AS "id"
                                        FROM "queries_author" "U0"
                                        WHERE "U0"."num" = 5
                                        ORDER BY "U0"."name" ASC)))
ORDER BY "queries_author"."name" ASC;

Two options come to mind:

Option 1: take a lighter touch when setting default ordering after union? (EDIT: removed incorrect suggested diff.)

Option 2: Extend the fix from yesterday's Oracle/SQLite fixes to also clear unnecessary orderings on Postgres:

  • django/db/models/sql/compiler.py

    diff --git a/django/db/models/sql/compiler.py b/django/db/models/sql/compiler.py
    index 764dc46cfc..bf90397ed8 100644
    a b class SQLCompiler:  
    636636        if selected is not None and compiler.query.selected is None:
    637637            compiler.query = compiler.query.clone()
    638638            compiler.query.set_values(selected)
    639         if (
    640             (
    641                 features.requires_compound_order_by_subquery
    642                 and not features.ignores_unnecessary_order_by_in_subqueries
    643             )
    644             or not features.supports_parentheses_in_compound
    645         ) and compiler.get_order_by():
     639        if compiler.get_order_by():
    646640            compiler.query = compiler.query.clone()
    647641            compiler.query.clear_ordering(force=False)
    648642        part_sql, part_args = compiler.as_sql(with_col_aliases=True)

Simon, do you have an opinion about the best course of action here?

Last edited 4 hours ago by Jacob Walls (previous) (diff)

comment:3 by Simon Charette, 3 hours ago

When an __in lookup is used we explicitly clear the ordering as it's unnecessary (including the default one).

This problem made me think of your initial approach at making Query.clear_ordering somewhat recursive Jacob. Should we consider making the following changes instead?

  • django/db/models/sql/query.py

    diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
    index 45192b7809..603b7056ad 100644
    a b class Query(BaseExpression):  
    24022402        self.extra_order_by = ()
    24032403        if clear_default:
    24042404            self.default_ordering = False
     2405        for query in self.combined_queries:
     2406            query.clear_ordering(force=force, clear_default=clear_default)
    24052407
    24062408    def set_group_by(self, allow_aliases=True):
    24072409        """
Last edited 3 hours ago by Simon Charette (previous) (diff)

comment:4 by Jacob Walls, 98 minutes ago

Has patch: set

And that was only yesterday!

Applied in PR

Note: See TracTickets for help on using tickets.
Back to Top