Opened 19 months ago

Closed 19 months ago

Last modified 5 months ago

#34502 closed Bug (duplicate)

Union giving wrong column ordering even when using .values/.values_list

Reported by: Thiago Bellini Ribeiro Owned by: nobody
Component: Database layer (models, ORM) Version: 4.2
Severity: Normal Keywords:
Cc: Simon Charette Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This is a simplified example of a union I have in a project:

Product.objects.all().annotate(
    kind=Value("base_price", output_field=models.CharField()),
    priority=Value(0),
    product_id=F("id"),
    pos_id=Value(None, output_field=models.BigAutoField()),
).values(
    "kind",
    "priority",
    "price",
    "product_id",
    "pos_id",
).order_by().union(
    PriceTableProduct.objects.all().annotate(
        priority=Value(1),
        kind=Value("price-table", output_field=models.CharField()),
        pos_id=F("price_table__pos__id"),
    )
    .values(
        "kind",
        "priority",
        "price",
        "product_id",
        "pos_id",
    ),
)

On both sqlite and postgresql this produces wrong sql. On sqlite:

SELECT "product_product"."price" AS "col1",
       'base_price' AS "kind",
       0 AS "priority",
       "product_product"."id" AS "product_id",
       NULL AS "pos_id"
  FROM "product_product"
 UNION SELECT "product_pricetableproduct"."price" AS "col1",
       "product_pricetableproduct"."product_id" AS "col2",
       1 AS "priority",
       'price-table' AS "kind",
       "pos_pos"."id" AS "pos_id"
  FROM "product_pricetableproduct"
 INNER JOIN "product_pricetable"
    ON ("product_pricetableproduct"."price_table_id" = "product_pricetable"."id")
  LEFT OUTER JOIN "pos_pos"
    ON ("product_pricetable"."id" = "pos_pos"."price_table_id")
 LIMIT 21

On postgreql:

(
        SELECT "product_product"."price" AS "col1",
               'base_price' AS "kind",
               0 AS "priority",
               "product_product"."id" AS "product_id",
               NULL AS "pos_id"
          FROM "product_product"
       )
 UNION (
        SELECT "product_pricetableproduct"."price" AS "col1",
               "product_pricetableproduct"."product_id" AS "col2",
               1 AS "priority",
               'price-table' AS "kind",
               "pos_pos"."id" AS "pos_id"
          FROM "product_pricetableproduct"
         INNER JOIN "product_pricetable"
            ON ("product_pricetableproduct"."price_table_id" = "product_pricetable"."id")
          LEFT OUTER JOIN "pos_pos"
            ON ("product_pricetable"."id" = "pos_pos"."price_table_id")
       )
 LIMIT 21

You can see that the orders given to values are not preserved, and thus the returned value is wrong (e.g. "kind" is the second column in the base query, but "product_id" is the second column on the union query)

I'm using django 4.2, so not sure if this is a 4.2 regression or some old issue.

obs. the .order_by in the first query is also another issue. Since that model has an ordering defined in its meta, I get DatabaseError: ORDER BY not allowed in subqueries of compound statements. when running that query on sqlite (on postgresql it is fine and works without the empty order_by, but the main issue in this ticket still exists)

Change History (3)

comment:1 by Mariusz Felisiak, 19 months ago

Resolution: duplicate
Status: newclosed

Duplicate of #28900.

comment:2 by David, 5 months ago

This may not be a duplicate of #28900. The problem here is in how annotate is working with the SQL SELECT.

In django 4.2 the order in which annotations are created matters here is an example:

class Foo(models.Model):
    name = models.CharField(max_length=16)
    value = models.SmallIntegerField()


qs1 = (
    Foo.objects.all()
    .annotate(
        kind=Value(1, output_field=SmallIntegerField()),
        pos=Value("value", output_field=CharField()),
    )
    .values("name", "kind", "value", "pos")
)
qs2 = (
    Foo.objects.all()
    .annotate(
        pos=Value("other", output_field=CharField()),
        kind=Value(2, output_field=SmallIntegerField()),
    )
    .values("name", "kind", "value", "pos")
)

print(qs1.union(qs2).query)

In django 4.2 this gets translated by the ORM in the following SQL

(
    SELECT
        "django_issue_34502_foo"."name" AS "col1",
        "django_issue_34502_foo"."value" AS "col2",
        1 AS "kind",
        value AS "pos"
    FROM
        "django_issue_34502_foo"
)
UNION
(
    SELECT
        "django_issue_34502_foo"."name" AS "col1",
        "django_issue_34502_foo"."value" AS "col2",
        other AS "pos",
        2 AS "kind"
    FROM
        "django_issue_34502_foo"
)

As you can note the order of annotations is reflected in the SELECT clause.

This owever seems to have been fixed in the 5.0 since the same invocation of the ORM now gives the fields in a compatible order:

(
    SELECT
        "django_issue_34502_foo"."name" AS "col1",
        "django_issue_34502_foo"."value" AS "col2",
        1 AS "kind",
        value AS "pos"
    FROM
        "django_issue_34502_foo"
)
UNION
(
    SELECT
        "django_issue_34502_foo"."name" AS "col1",
        "django_issue_34502_foo"."value" AS "col2",
        2 AS "kind",
        other AS "pos"
    FROM
        "django_issue_34502_foo"
)

However there is still a possibile issue: even in django 5.0 the order of fields is not the one declare in the .values/.values_list! First there are model fields and then annotations, it is like instead of calling .values("name", "kind", "value", "pos") it was called .values("name", "value", "kind", "pos").

This should at least be pointed out in the docs, because it may confuse users which belive being able to control the order in which fields/functions are placed in the sql SELECT which is very important when performing query operations like UNION, INTERSECT or DIFFERENCE.

comment:3 by Simon Charette, 5 months ago

Cc: Simon Charette added

I'm not sure in which issue this is captured but the underlying problem here is that order of fields specified in values is always going to be

SELECT *extra_fields, *model_fields, *annotations

In other words doing values("annotation1", "field1", "field2", "annotation2") is going to result in

SELECT "table"."field1", "table"."field2", annotation1_expr, annotation2_expr

What was fixed in Django 5.0 is that the local order (within the respective select groups) will respect the order specified in values for annotations but we still haven't fixed the global ordering problem. See #28553 and d6b6e5d0fd4e6b6d0183b4cf6e4bd4f9afc7bf67.

I tried giving it a shot in this MR but there are complications regarding the usage of extra(select) that forced me to take a detour in attempting to deprecate it entirely.

In the mean time your best shot at working around the issue is to always use annotations in values so something like

qs3 = (
    Foo.objects.all()
    .annotate(
        name_an=F("name"),
        value_an=F("value"),
        pos=Value("other", output_field=CharField()),
        kind=Value(2, output_field=SmallIntegerField()),
    )
    .values("name_an", "kind", "value_an", "pos")
)
Note: See TracTickets for help on using tickets.
Back to Top