Opened 3 weeks ago

Closed 2 weeks ago

Last modified 2 weeks ago

#37097 closed Bug (fixed)

Nested double-unioned query for model with ordering fails on Postgres/MySql

Reported by: Shai Berger Owned by: Jacob Walls
Component: Database layer (models, ORM) Version: dev
Severity: Release blocker Keywords:
Cc: Shai Berger, Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Yep. Sorry about the title, that's the most succinct I can make it.

Add the following test in django/tests/queries/test_qs_combinators.py, say after test_union_in_with_ordering_and_slice

    def test_double_union_in_with_ordering(self):
        qs1 = Author.objects.filter(num__gt=7)
        qs2 = Author.objects.filter(num__lt=2)
        qs3 = Author.objects.filter(num=5)
        self.assertQuerySetEqual(
            # Query formatted to allow easy commenting-out of parts
            Author.objects.exclude(id__in=qs1
                                          .union(qs2, all=True)
                                          .union(qs3, all=True)
                                          .values("id")),
            []
        )

On Postgres, this passes on 6.0, but fails on the main branch (towards 6.1 as I write this). The error is

 django.db.utils.ProgrammingError: each UNION query must have the same number of columns
LINE 1: ...) ORDER BY "__orderbycol2" ASC) UNION ALL (SELECT "U0"."id" ...

because the query (formatted for some readability) is

SELECT "queries_author"."id", "queries_author"."name", "queries_author"."num", "queries_author"."extra_id" 
FROM "queries_author" 
WHERE NOT (
  "queries_author"."id" IN (
    (
      (
        SELECT "U0"."id" AS "id", "U0"."name" AS "__orderbycol2" 
        FROM "queries_author" "U0" 
        WHERE "U0"."num" > 7
        ORDER BY "U0"."name" ASC
      ) UNION ALL (
        SELECT "U0"."id" AS "id", "U0"."name" AS "__orderbycol2"
        FROM "queries_author" "U0"
        WHERE "U0"."num" < 2
        ORDER BY "U0"."name" ASC
      ) 
      ORDER BY "__orderbycol2" ASC           -- I think this is the culprit
    ) UNION ALL (
      SELECT "U0"."id" AS "id"
      FROM "queries_author" "U0"
      WHERE "U0"."num" = 5
      ORDER BY "U0"."name" ASC
    )
  )
) 
ORDER BY "queries_author"."name" ASC

Notes:

  • Double union is required, a single union works
  • This happens when the ordering is defined in the model Meta, but not if you just add ordering on the queries. I suspect this is because having the ordering in the Meta makes the query equivalent to
    Author.objects.exclude(id__in=qs1
                                  .union(qs2, all=True)
                                  .order_by("name")    # This is implied
                                  .union(qs3, all=True)
                                  .values("id"))
    
  • On Oracle and Sqlite, the inner order is removed by the fix of #36938. I haven't actually tested it on MySql, but I believe it has the same relevant feature flags and so should behave the same.
  • The test as written fails on Sqlite on 6.0, with django.db.utils.DatabaseError: ORDER BY not allowed in subqueries of compound statements -- similar tests are skipped there; but it passes on Postgres, and passes on Sqlite on main.

Change History (9)

comment:1 by Jacob Walls, 3 weeks ago

Keywords: Regression removed
Owner: set to Jacob Walls
Status: newassigned
Triage Stage: UnreviewedAccepted

Thanks, I'll take a look today.

comment:2 by Jacob Walls, 3 weeks ago

Cc: Simon Charette added

Bisected to 087bb9e8f3478d53f12b1737af865992af17c5f2, which seems like an improvement.

SQL on stable/6.0.x:

SELECT ...
FROM "queries_author"
WHERE NOT ("queries_author"."id" IN ((
                                        (SELECT U0."id" AS "id"
                                         FROM "queries_author" U0
                                         WHERE U0."num" > 7
                                         ORDER BY U0."name" ASC)
                                      UNION ALL
                                        (SELECT U0."id" AS "id"
                                         FROM "queries_author" U0
                                         WHERE U0."num" < 2
                                         ORDER BY U0."name" ASC))
                                     UNION ALL
                                       (SELECT U0."id" AS "id"
                                        FROM "queries_author" U0
                                        WHERE U0."num" = 5
                                        ORDER BY U0."name" ASC)))
ORDER BY "queries_author"."name" ASC;

And on main:

SELECT ...
FROM "queries_author"
WHERE NOT ("queries_author"."id" IN ((
                                        (SELECT "U0"."id" AS "id",
                                                "U0"."name" AS "__orderbycol2"
                                         FROM "queries_author" "U0"
                                         WHERE "U0"."num" > 7
                                         ORDER BY "U0"."name" ASC)
                                      UNION ALL
                                        (SELECT "U0"."id" AS "id",
                                                "U0"."name" AS "__orderbycol2"
                                         FROM "queries_author" "U0"
                                         WHERE "U0"."num" < 2
                                         ORDER BY "U0"."name" ASC)
                                      ORDER BY "__orderbycol2" ASC)
                                     UNION ALL
                                       (SELECT "U0"."id" AS "id"
                                        FROM "queries_author" "U0"
                                        WHERE "U0"."num" = 5
                                        ORDER BY "U0"."name" ASC)))
ORDER BY "queries_author"."name" ASC;

Two options come to mind:

Option 1: take a lighter touch when setting default ordering after union? (EDIT: removed incorrect suggested diff.)

Option 2: Extend the fix from yesterday's Oracle/SQLite fixes to also clear unnecessary orderings on Postgres:

  • django/db/models/sql/compiler.py

    diff --git a/django/db/models/sql/compiler.py b/django/db/models/sql/compiler.py
    index 764dc46cfc..bf90397ed8 100644
    a b class SQLCompiler:  
    636636        if selected is not None and compiler.query.selected is None:
    637637            compiler.query = compiler.query.clone()
    638638            compiler.query.set_values(selected)
    639         if (
    640             (
    641                 features.requires_compound_order_by_subquery
    642                 and not features.ignores_unnecessary_order_by_in_subqueries
    643             )
    644             or not features.supports_parentheses_in_compound
    645         ) and compiler.get_order_by():
     639        if compiler.get_order_by():
    646640            compiler.query = compiler.query.clone()
    647641            compiler.query.clear_ordering(force=False)
    648642        part_sql, part_args = compiler.as_sql(with_col_aliases=True)

Simon, do you have an opinion about the best course of action here?

Last edited 3 weeks ago by Jacob Walls (previous) (diff)

comment:3 by Simon Charette, 3 weeks ago

When an __in lookup is used we explicitly clear the ordering as it's unnecessary (including the default one).

This problem made me think of your initial approach at making Query.clear_ordering somewhat recursive Jacob. Should we consider making the following changes instead?

  • django/db/models/sql/query.py

    diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
    index 45192b7809..603b7056ad 100644
    a b class Query(BaseExpression):  
    24022402        self.extra_order_by = ()
    24032403        if clear_default:
    24042404            self.default_ordering = False
     2405        for query in self.combined_queries:
     2406            query.clear_ordering(force=force, clear_default=clear_default)
    24052407
    24062408    def set_group_by(self, allow_aliases=True):
    24072409        """
Last edited 3 weeks ago by Simon Charette (previous) (diff)

comment:4 by Jacob Walls, 3 weeks ago

Has patch: set

And that was only yesterday!

Applied in PR

comment:5 by Jacob Walls, 3 weeks ago

Patch needs improvement: set

comment:6 by Jacob Walls, 3 weeks ago

Patch needs improvement: unset

comment:7 by Simon Charette, 2 weeks ago

Triage Stage: AcceptedReady for checkin

Thanks for the great discussion and iteration Jacob!

comment:8 by Jacob Walls <jacobtylerwalls@…>, 2 weeks ago

Resolution: fixed
Status: assignedclosed

In 0e1d950:

Fixed #37097 -- Made Query.clear_ordering() clear ordering on combined queries also.

Thanks Shai Berger for the report.

Regression in 087bb9e8f3478d53f12b1737af865992af17c5f2.

(That commit drove more traffic into an error that would have been
reachable only with an explicit order_by() after each union().)

Co-authored-by: Simon Charette <charettes@…>
Co-authored-by: siddus <dcsid10@…>

comment:9 by Jacob Walls <jacobtylerwalls@…>, 2 weeks ago

In fbe902d:

Refs #37097 -- Removed compilation-time order clearing on combined queries on Oracle.

Thanks Simon Charette, JaeHyuck Sa, and Shai Berger for reviews.

Note: See TracTickets for help on using tickets.
Back to Top