Opened 11 months ago
Last modified 5 days ago
#35011 assigned Bug
Queryset union can fail depending on the field type and/or order
Reported by: | Thierry Bastian | Owned by: | Simon Charette |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 5.0 |
Severity: | Normal | Keywords: | |
Cc: | Simon Charette, Jacob Rief | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Hello,
I was porting my code to Django 5 and there is an issue with the way queryset are union'ed.
Essentially it ends up trying to convert datetime to int, which is strange. I did not get time to investigate the issue myself.
You can note that the order of fields we use in the .values call affects the behaviour (which is also highly unexpected).
To reproduce the issue, simply untar the attached example, run migrate then test.
Attachments (1)
Change History (5)
by , 11 months ago
Attachment: | bugreport.tar.xz added |
---|
comment:1 by , 11 months ago
Cc: | added |
---|---|
Owner: | changed from | to
Status: | new → assigned |
Triage Stage: | Unreviewed → Accepted |
It seems that your provide example has been failing since at least 3.2 so it does not demonstrate a regression in Django 5 but I think it could be tweaked to point at d6b6e5d0fd4e6b6d0183b4cf6e4bd4f9afc7bf67 as the culprit.
Running your test on Django 3.2 and 4.2 I get
ValueError: invalid literal for int() with base 10: '2023-12-05 02:46:28.751461'
The fundamental issue here is that the order of specified fields in values
has never been a 1:1 match to the order of the SELECT
columns but the latter is a necessity to do proper UNION
between querysets.
The order is, and has always been, SELECT *extra, *fields, *annotations
and within each groups the order of values
is respected since #28553 which landed in d6b6e5d0fd4e6b6d0183b4cf6e4bd4f9afc7bf67 so I guess this is what broke things on your side during the attempted upgrade but that your tests failed to capture. In this sense I guess this could be considered a release blocker as it changed the ordering from an ambiguous ordering to the other.
I tried solving the fundamental issue but I got in a rabbit hole with regards to the deprecation of extra(fields)
which I tried to focus on first. I'd like to find time to get this across the finish line during the holidays so I'll assign to me in the mean time.
This is really what #28553 should have been about in my opinion; making sure the order of values/values_list
is always respected even with it contains members that cross groups (extra
, fields
, annotations
).
I'm torn on how to proceed here wrt/ to d6b6e5d0fd4e6b6d0183b4cf6e4bd4f9afc7bf67 as while it fixed some queries it broke others as described here.
Until we figure out our plan here Thierry your best bet is to stick entirely to annotations. That means doing something like
-
bugreport/test.py
diff --git a/bugreport/test.py b/bugreport/test.py index 4f66fa2..f6fc98b 100644
a b 1 1 from django.test import TestCase 2 from django.db.models import Value, IntegerField, DateTimeField 2 from django.db.models import Value, IntegerField, DateTimeField, F 3 3 from bugreport.models import Leaf, Branch 4 4 5 5 6 6 class TreeTest(TestCase): 7 7 8 8 def test_repro(self): 9 column_names = ['size', 'created_date ']9 column_names = ['size', 'created_date_ann'] 10 10 #column_names = ['created_date', 'size'] # this one works... 11 11 12 12 leaf = Leaf.objects.create() … … def test_repro(self): 16 16 branch_qs 17 17 .annotate( 18 18 size=Value(1, output_field=IntegerField()), 19 created_date =Value(None, output_field=DateTimeField()),19 created_date_ann=Value(None, output_field=DateTimeField()), 20 20 ).values_list(*column_names)) 21 21 22 22 leaf_qs = Leaf.objects.all() 23 23 leaf_qs = ( 24 24 leaf_qs 25 .annotate(size=Value(2, output_field=IntegerField()) )25 .annotate(size=Value(2, output_field=IntegerField()), created_date_ann=F("created_date")) 26 26 .values_list(*column_names) 27 27 ) 28 28 print("this works", list(leaf_qs))
Which will have all the values
reference in the same SELECT
group and due to d6b6e5d0fd4e6b6d0183b4cf6e4bd4f9afc7bf67 the order will be respected.
comment:2 by , 11 months ago
Thank you for your detailed answer. Your explanation makes perfect sense. As you can imaging, I was trying to reproduce with the smallest possible example. It might be that some detail made it work by chance. In the original code, I am using .values, so the order of fields does not matter that much and I reorganised them I the order *fields, *annotations.
That will probably work for me. I do hope that I don't have other places where this fails silently ;).
thanks again.
comment:3 by , 5 days ago
Cc: | added |
---|---|
Has patch: | set |
PR to prove that this has been fixed: https://github.com/django/django/pull/18743
comment:4 by , 5 days ago
This was a subset of the general problems solved by 65ad4ade74dc9208b9d686a451cd6045df0c9c3a for #28900 that d6b6e5d0fd4e6b6d0183b4cf6e4bd4f9afc7bf67 only partially resolved.
example reproducing the bug