Opened 4 days ago

Closed 3 days ago

Last modified 2 days ago

#35830 closed Bug (invalid)

Queryset Union doesn't persist column names

Reported by: Paul Landon Tuckett Owned by:
Component: Database layer (models, ORM) Version: 4.2
Severity: Normal Keywords: orm, database, queryset, union, sql
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

In Django v3.2.25 doing a queryset.union() would preserve the original column names. Upgrading to Django 4.2.16, this is no longer the case. It converts all the columns (minus annotations), to col1, col2, etc. The column names are the same in each queryset and the data types are the same. Even if you do a union() on the exact same queryset, the columns are renamed, ie: queryset.union(queryset). If this is expected behavior, please disregard. I'm not seeing anything in the change log that references this change.

Attachments (1)

django_bug.py (3.1 KB ) - added by Paul Landon Tuckett 3 days ago.

Download all attachments as: .zip

Change History (5)

comment:1 by Simon Charette, 4 days ago

Hello Paul.

If you could provide a set of models and a queryset that reproduces it would be easier to bisect the changeset that caused it and determine if it was intended or not. I suspect this was a necessary change to address an issue when combining querysets mixing different field names but it's hard to tell for sure without a way to reproduce what you are experiencing.

by Paul Landon Tuckett, 3 days ago

Attachment: django_bug.py added

comment:2 by Paul Landon Tuckett, 3 days ago

Hello Simon,

Sure, I attached a file that shows a union between 2 querysets of the same model and the sql output. Let me know if that works or you need more information. I tested it both with the upgrade and without it.

comment:3 by Simon Charette, 3 days ago

Resolution: invalid
Status: newclosed

By bisecting the changes since 3.2 I identified 70499b25c708557fb9ee2264686cd172f4b2354e as the change that introduced the aliasing in order to resolve #34123.

The problem it addresses arise when using select_related in queryset that use union as when it's the case and an ambiguous alias is referenced for ordering purposes. For example if you do

base = Book.objects.select_related("author")
base.filter(rating__gt=3).union(base.filter(title__contains="Foo")).order_by("id")

The generated outer ORDER BY clause cannot simply reference "id" as it's ambiguous whether it's book.id or author.id so book.id must be aliased.

It it understood that only ambiguous column names that are referenced by ORDER BY absolutely need to be aliased but it was much easier to systematically alias all columns instead.

So to answer your question it is expected that the SQL changed and since it doesn't change the semantic of the query it is not considered a bug. If you care about the exact name returned from the query you should resort to values instead.

comment:4 by Paul Landon Tuckett, 3 days ago

I totally understand that column references can be ambiguous, but we convert these Django queries into SQL and inject them into raw sql, so it's a major overhaul for us to have to convert all the Django queries to raw sql because of this change. We do use values and/or values_list, but it still doesn't preserve the column names. Even with an order_by and values combination, same result. If there is another way we can force the Django ORM to preserver these column names, please advise.

Version 0, edited 3 days ago by Paul Landon Tuckett (next)
Note: See TracTickets for help on using tickets.
Back to Top