Opened 11 months ago

Last modified 5 days ago

#35011 assigned Bug

Queryset union can fail depending on the field type and/or order

Reported by: Thierry Bastian Owned by: Simon Charette
Component: Database layer (models, ORM) Version: 5.0
Severity: Normal Keywords:
Cc: Simon Charette, Jacob Rief Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hello,
I was porting my code to Django 5 and there is an issue with the way queryset are union'ed.
Essentially it ends up trying to convert datetime to int, which is strange. I did not get time to investigate the issue myself.
You can note that the order of fields we use in the .values call affects the behaviour (which is also highly unexpected).

To reproduce the issue, simply untar the attached example, run migrate then test.

Attachments (1)

bugreport.tar.xz (3.3 KB ) - added by Thierry Bastian 11 months ago.
example reproducing the bug

Download all attachments as: .zip

Change History (5)

by Thierry Bastian, 11 months ago

Attachment: bugreport.tar.xz added

example reproducing the bug

comment:1 by Simon Charette, 11 months ago

Cc: Simon Charette added
Owner: changed from nobody to Simon Charette
Status: newassigned
Triage Stage: UnreviewedAccepted

It seems that your provide example has been failing since at least 3.2 so it does not demonstrate a regression in Django 5 but I think it could be tweaked to point at d6b6e5d0fd4e6b6d0183b4cf6e4bd4f9afc7bf67 as the culprit.

Running your test on Django 3.2 and 4.2 I get

ValueError: invalid literal for int() with base 10: '2023-12-05 02:46:28.751461'

The fundamental issue here is that the order of specified fields in values has never been a 1:1 match to the order of the SELECT columns but the latter is a necessity to do proper UNION between querysets.

The order is, and has always been, SELECT *extra, *fields, *annotations and within each groups the order of values is respected since #28553 which landed in d6b6e5d0fd4e6b6d0183b4cf6e4bd4f9afc7bf67 so I guess this is what broke things on your side during the attempted upgrade but that your tests failed to capture. In this sense I guess this could be considered a release blocker as it changed the ordering from an ambiguous ordering to the other.

I tried solving the fundamental issue but I got in a rabbit hole with regards to the deprecation of extra(fields) which I tried to focus on first. I'd like to find time to get this across the finish line during the holidays so I'll assign to me in the mean time.

This is really what #28553 should have been about in my opinion; making sure the order of values/values_list is always respected even with it contains members that cross groups (extra, fields, annotations).

I'm torn on how to proceed here wrt/ to d6b6e5d0fd4e6b6d0183b4cf6e4bd4f9afc7bf67 as while it fixed some queries it broke others as described here.

Until we figure out our plan here Thierry your best bet is to stick entirely to annotations. That means doing something like

  • bugreport/test.py

    diff --git a/bugreport/test.py b/bugreport/test.py
    index 4f66fa2..f6fc98b 100644
    a b  
    11from django.test import TestCase
    2 from django.db.models import Value, IntegerField, DateTimeField
     2from django.db.models import Value, IntegerField, DateTimeField, F
    33from bugreport.models import Leaf, Branch
    44
    55
    66class TreeTest(TestCase):
    77
    88    def test_repro(self):
    9         column_names = ['size', 'created_date']
     9        column_names = ['size', 'created_date_ann']
    1010        #column_names = ['created_date', 'size']  # this one works...
    1111
    1212        leaf = Leaf.objects.create()
    def test_repro(self):  
    1616            branch_qs
    1717            .annotate(
    1818                size=Value(1, output_field=IntegerField()),
    19                 created_date=Value(None, output_field=DateTimeField()),
     19                created_date_ann=Value(None, output_field=DateTimeField()),
    2020            ).values_list(*column_names))
    2121
    2222        leaf_qs = Leaf.objects.all()
    2323        leaf_qs = (
    2424            leaf_qs
    25             .annotate(size=Value(2, output_field=IntegerField()))
     25            .annotate(size=Value(2, output_field=IntegerField()), created_date_ann=F("created_date"))
    2626            .values_list(*column_names)
    2727        )
    2828        print("this works", list(leaf_qs))

Which will have all the values reference in the same SELECT group and due to d6b6e5d0fd4e6b6d0183b4cf6e4bd4f9afc7bf67 the order will be respected.

Last edited 11 months ago by Simon Charette (previous) (diff)

comment:2 by Thierry Bastian, 11 months ago

Thank you for your detailed answer. Your explanation makes perfect sense. As you can imaging, I was trying to reproduce with the smallest possible example. It might be that some detail made it work by chance. In the original code, I am using .values, so the order of fields does not matter that much and I reorganised them I the order *fields, *annotations.
That will probably work for me. I do hope that I don't have other places where this fails silently ;).

thanks again.

comment:3 by Jacob Rief, 5 days ago

Cc: Jacob Rief added
Has patch: set

PR to prove that this has been fixed: https://github.com/django/django/pull/18743

comment:4 by Simon Charette, 5 days ago

This was a subset of the general problems solved by 65ad4ade74dc9208b9d686a451cd6045df0c9c3a for #28900 that d6b6e5d0fd4e6b6d0183b4cf6e4bd4f9afc7bf67 only partially resolved.

Last edited 5 days ago by Simon Charette (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top