#28553 closed Bug (fixed)
Querysets: annotate() columns are forced into a certain position which may disrupt union()
| Reported by: | David Sanders | Owned by: | David Wobrock | 
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 1.11 | 
| Severity: | Normal | Keywords: | |
| Cc: | Ole Laursen, David Wobrock, Simon Charette, Sylvain Fankhauser | Triage Stage: | Ready for checkin | 
| Has patch: | yes | Needs documentation: | no | 
| Needs tests: | no | Patch needs improvement: | no | 
| Easy pickings: | no | UI/UX: | no | 
Description
(Reporting possible issue found by a user on #django)
Using values() to force selection of certain columns in a certain order proved useful unioning querysets with union() for the aforementioned user.  The positioning of columns added with annotate() is not controllable with values() and has the potential to disrupt union() unless this fact is known and the ordering done in a certain way to accommodate it.
I'm reporting this mainly for posterity but also as a highlight that perhaps this should be mentioned in the documentation.  I'm sure there are reasons why the annotations are appended to the select but if someone feels that this is changeable then it would be a bonus outcome.
Change History (17)
comment:1 by , 8 years ago
| Triage Stage: | Unreviewed → Accepted | 
|---|---|
| Type: | Uncategorized → Cleanup/optimization | 
comment:3 by , 8 years ago
I've stumbled upon a case in production where this limitation prevents me for making a useful query. I've been able to create a test to reproduce this problem.
It works with sqlite but fails with postgres.
Add this test to tests/queries/test_qs_combinators.py:
from django.db.models import F, IntegerField, TextField, Value def test_union_with_two_annotated_values_on_different_models(self): qs1 = Number.objects.annotate( text_annotation=Value('Foo', TextField()) ).values('text_annotation', 'num') qs2 = ReservedName.objects.annotate( int_annotation=Value(1, IntegerField()), ).values('name', 'int_annotation') self.assertEqual(qs1.union(qs2).count(), 10)
In current master (78f8b80f9b215e50618375adce4c97795dabbb84), running ./runtests.py --parallel=1 --settings=tests.test_postgres queries.test_qs_combinators.QuerySetSetOperationTests.test_union_with_two_annotated_values_on_different_models fails:
Testing against Django installed in 'django/django'
Creating test database for alias 'default'...
Creating test database for alias 'other'...
System check identified no issues (1 silenced).
E
======================================================================
ERROR: test_union_with_two_annotated_values_on_different_models (queries.test_qs_combinators.QuerySetSetOperationTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "django/django/db/backends/utils.py", line 85, in _execute
    return self.cursor.execute(sql, params)
psycopg2.ProgrammingError: UNION types integer and character varying cannot be matched
LINE 1: ..._annotation" FROM "queries_number") UNION (SELECT "queries_r...
                                                             ^
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "django/tests/queries/test_qs_combinators.py", line 140, in test_union_with_two_annotated_values_on_different_models
    self.assertEqual(qs1.union(qs2).count(), 10)
  File "django/django/db/models/query.py", line 382, in count
    return self.query.get_count(using=self.db)
  File "django/django/db/models/sql/query.py", line 494, in get_count
    number = obj.get_aggregation(using, ['__count'])['__count']
  File "django/django/db/models/sql/query.py", line 479, in get_aggregation
    result = compiler.execute_sql(SINGLE)
  File "django/django/db/models/sql/compiler.py", line 1054, in execute_sql
    cursor.execute(sql, params)
  File "django/django/db/backends/utils.py", line 68, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "django/django/db/backends/utils.py", line 77, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "django/django/db/backends/utils.py", line 85, in _execute
    return self.cursor.execute(sql, params)
  File "django/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "django/django/db/backends/utils.py", line 85, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: UNION types integer and character varying cannot be matched
LINE 1: ..._annotation" FROM "queries_number") UNION (SELECT "queries_r...
                                                             ^
----------------------------------------------------------------------
Ran 1 test in 0.007s
FAILED (errors=1)
Destroying test database for alias 'default'...
Destroying test database for alias 'other'...
My tests/test_postgres.py is:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'django-test',
        'HOST': '127.0.0.1',
    },
    'other': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'django-test-other',
    }
}
SECRET_KEY = "django_tests_secret_key"
# Use a fast hasher to speed up tests.
PASSWORD_HASHERS = [
    'django.contrib.auth.hashers.MD5PasswordHasher',
]
follow-up: 5 comment:4 by , 6 years ago
| Cc: | added | 
|---|---|
| Type: | Cleanup/optimization → Bug | 
There's a ticket with a test case in #30211.
That ticket was reported as a bug, and I think this ticket should be a bug too, so I'm changing the classification for now (apologies if that's inappropriate). I think it's a bug because the documentation in my reading implies that as long as the columns match, union will work. So it really comes as a bit of a surprise that Django overrides the order in values_list().
In my particular case, I'm using union() to combine two different tables that I need to get out in sorted order, and I was trying to use annotate() + values_list() to add a NULL filler to one table as it lacks a column from the other.
Also, I suppose the ORM could possibly also be a bit more efficient if it could return values_list() tuples directly from the select instead of having to rearrange them?
comment:5 by , 6 years ago
Actually there could be a different problem as well. We were lucky in that the ORM generated SQL where the data types of the columns do not match. But what happens when the data types of the columns match? I'm afraid you would get a query that doesn't throw an exception but is in fact subtly broken, especially if the values of the different columns happen to be similar, in which case it might take a long time for app developers to realize that the query is broken.
comment:6 by , 3 years ago
I just hit this as well. The solution (workaround) seems to be using F() and Value() freely and consistently for all querysets even if it doesn't look necessary on the surface.
See https://github.com/matthiask/feincms3-forms/commit/c112a7d613e991780f383393fd05f1c84c81a279
(It's a bit surprising that values_list doesn't produce a SQL query with the exact same ordering of values, but after thinking about it some more I'm not sure if that's really a bug or just a sharp edge of the current implementation.)
comment:7 by , 3 years ago
I submitted a failing unit test for this issue which demonstrates the problem: https://github.com/django/django/pull/16577 
This seems hard to fix. I don't expect to do any work on this in the next few months since I have found a workaround for my use case. Hopefully the test is useful to someone.
comment:8 by , 3 years ago
| Cc: | added | 
|---|---|
| Has patch: | set | 
| Owner: | changed from to | 
| Status: | new → assigned | 
The same issue can probably occur with extra() fields, but the PR doesn't fix that as we no longer fix bugs for this method.
comment:9 by , 3 years ago
| Needs tests: | set | 
|---|---|
| Patch needs improvement: | set | 
comment:10 by , 3 years ago
| Needs tests: | unset | 
|---|---|
| Patch needs improvement: | unset | 
| Triage Stage: | Accepted → Ready for checkin | 
follow-up: 13 comment:12 by , 3 years ago
| Cc: | added | 
|---|
Small note that the cases reported in comment:3 is not fixed, the merged patch only focuses on the case where only annotations are referenced in values/values_list which is a subset of the problem reported here as mixing field references, extra, and annotations demonstrate.
The crux of the issue is that SQLCompiler.get_select ignores Query.values_select and always generate selected columns as follow
- Start with Query.extra_select
- Then Query.select
- End with Query.annotation_select
The patch here only makes sure that the order of annotation_select is preserved. What should be done instead is adjust get_select to be Query.values_select aware as pointed out on the MR.
comment:13 by , 3 years ago
Replying to Simon Charette:
Small note that the cases reported in comment:3 is not fixed, ...
Yes, but this case is described in #28900. Am I right?
follow-up: 15 comment:14 by , 3 years ago
#28900 seems to be an even more complex case where none of the combined queries use explicit values but the result of the query combination does.
In the comment:3 example both queries use values but happen to mix field references and annotations which is not covered by the test included in d6b6e5d0fd4e6b6d0183b4cf6e4bd4f9afc7bf67.
I'm bringing this up because most if not all of the changes made to sql.Query for change the type of annotation_mask are unnecessary to solve #28553 entirely.
comment:15 by , 3 years ago
Replying to Simon Charette:
I'm bringing this up because most if not all of the changes made to
sql.Queryfor change the type ofannotation_maskare unnecessary to solve #28553 entirely.
I will be happy to review any adjustments in this area, +1.
comment:16 by , 19 months ago
| Cc: | added | 
|---|
(The ticket component should change to "Documentation" if there aren't any code changes to make here. I'm not sure.)