#28553 closed Bug (fixed)
Querysets: annotate() columns are forced into a certain position which may disrupt union()
Reported by: | David Sanders | Owned by: | David Wobrock |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.11 |
Severity: | Normal | Keywords: | |
Cc: | Ole Laursen, David Wobrock, Simon Charette, Sylvain Fankhauser | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
(Reporting possible issue found by a user on #django)
Using values() to force selection of certain columns in a certain order proved useful unioning querysets with union() for the aforementioned user. The positioning of columns added with annotate() is not controllable with values() and has the potential to disrupt union() unless this fact is known and the ordering done in a certain way to accommodate it.
I'm reporting this mainly for posterity but also as a highlight that perhaps this should be mentioned in the documentation. I'm sure there are reasons why the annotations are appended to the select but if someone feels that this is changeable then it would be a bonus outcome.
Change History (17)
comment:1 by , 7 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Type: | Uncategorized → Cleanup/optimization |
comment:3 by , 7 years ago
I've stumbled upon a case in production where this limitation prevents me for making a useful query. I've been able to create a test to reproduce this problem.
It works with sqlite but fails with postgres.
Add this test to tests/queries/test_qs_combinators.py
:
from django.db.models import F, IntegerField, TextField, Value def test_union_with_two_annotated_values_on_different_models(self): qs1 = Number.objects.annotate( text_annotation=Value('Foo', TextField()) ).values('text_annotation', 'num') qs2 = ReservedName.objects.annotate( int_annotation=Value(1, IntegerField()), ).values('name', 'int_annotation') self.assertEqual(qs1.union(qs2).count(), 10)
In current master (78f8b80f9b215e50618375adce4c97795dabbb84
), running ./runtests.py --parallel=1 --settings=tests.test_postgres queries.test_qs_combinators.QuerySetSetOperationTests.test_union_with_two_annotated_values_on_different_models
fails:
Testing against Django installed in 'django/django' Creating test database for alias 'default'... Creating test database for alias 'other'... System check identified no issues (1 silenced). E ====================================================================== ERROR: test_union_with_two_annotated_values_on_different_models (queries.test_qs_combinators.QuerySetSetOperationTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "django/django/db/backends/utils.py", line 85, in _execute return self.cursor.execute(sql, params) psycopg2.ProgrammingError: UNION types integer and character varying cannot be matched LINE 1: ..._annotation" FROM "queries_number") UNION (SELECT "queries_r... ^ The above exception was the direct cause of the following exception: Traceback (most recent call last): File "django/tests/queries/test_qs_combinators.py", line 140, in test_union_with_two_annotated_values_on_different_models self.assertEqual(qs1.union(qs2).count(), 10) File "django/django/db/models/query.py", line 382, in count return self.query.get_count(using=self.db) File "django/django/db/models/sql/query.py", line 494, in get_count number = obj.get_aggregation(using, ['__count'])['__count'] File "django/django/db/models/sql/query.py", line 479, in get_aggregation result = compiler.execute_sql(SINGLE) File "django/django/db/models/sql/compiler.py", line 1054, in execute_sql cursor.execute(sql, params) File "django/django/db/backends/utils.py", line 68, in execute return self._execute_with_wrappers(sql, params, many=False, executor=self._execute) File "django/django/db/backends/utils.py", line 77, in _execute_with_wrappers return executor(sql, params, many, context) File "django/django/db/backends/utils.py", line 85, in _execute return self.cursor.execute(sql, params) File "django/django/db/utils.py", line 89, in __exit__ raise dj_exc_value.with_traceback(traceback) from exc_value File "django/django/db/backends/utils.py", line 85, in _execute return self.cursor.execute(sql, params) django.db.utils.ProgrammingError: UNION types integer and character varying cannot be matched LINE 1: ..._annotation" FROM "queries_number") UNION (SELECT "queries_r... ^ ---------------------------------------------------------------------- Ran 1 test in 0.007s FAILED (errors=1) Destroying test database for alias 'default'... Destroying test database for alias 'other'...
My tests/test_postgres.py
is:
DATABASES = { 'default': { 'ENGINE': 'django.db.backends.postgresql_psycopg2', 'NAME': 'django-test', 'HOST': '127.0.0.1', }, 'other': { 'ENGINE': 'django.db.backends.postgresql_psycopg2', 'NAME': 'django-test-other', } } SECRET_KEY = "django_tests_secret_key" # Use a fast hasher to speed up tests. PASSWORD_HASHERS = [ 'django.contrib.auth.hashers.MD5PasswordHasher', ]
follow-up: 5 comment:4 by , 6 years ago
Cc: | added |
---|---|
Type: | Cleanup/optimization → Bug |
There's a ticket with a test case in #30211.
That ticket was reported as a bug, and I think this ticket should be a bug too, so I'm changing the classification for now (apologies if that's inappropriate). I think it's a bug because the documentation in my reading implies that as long as the columns match, union will work. So it really comes as a bit of a surprise that Django overrides the order in values_list().
In my particular case, I'm using union() to combine two different tables that I need to get out in sorted order, and I was trying to use annotate() + values_list() to add a NULL filler to one table as it lacks a column from the other.
Also, I suppose the ORM could possibly also be a bit more efficient if it could return values_list() tuples directly from the select instead of having to rearrange them?
comment:5 by , 6 years ago
Actually there could be a different problem as well. We were lucky in that the ORM generated SQL where the column types do not match. But what happens when the column types match? I'm afraid you would get a query that doesn't throw an exception but is in fact subtly broken, especially if the values of the differnet columns happen to be similar, in which case it might take a long time for app developers to realize that the query is broken.
comment:6 by , 2 years ago
I just hit this as well. The solution (workaround) seems to be using F()
and Value()
freely and consistently for all querysets even if it doesn't look necessary on the surface.
See https://github.com/matthiask/feincms3-forms/commit/c112a7d613e991780f383393fd05f1c84c81a279
(It's a bit surprising that values_list
doesn't produce a SQL query with the exact same ordering of values, but after thinking about it some more I'm not sure if that's really a bug or just a sharp edge of the current implementation.)
comment:7 by , 21 months ago
I submitted a failing unit test for this issue which demonstrates the problem: https://github.com/django/django/pull/16577
This seems hard to fix. I don't expect to do any work on this in the next few months since I have found a workaround for my use case. Hopefully the test is useful to someone.
comment:8 by , 21 months ago
Cc: | added |
---|---|
Has patch: | set |
Owner: | changed from | to
Status: | new → assigned |
The same issue can probably occur with extra()
fields, but the PR doesn't fix that as we no longer fix bugs for this method.
comment:9 by , 21 months ago
Needs tests: | set |
---|---|
Patch needs improvement: | set |
comment:10 by , 20 months ago
Needs tests: | unset |
---|---|
Patch needs improvement: | unset |
Triage Stage: | Accepted → Ready for checkin |
follow-up: 13 comment:12 by , 20 months ago
Cc: | added |
---|
Small note that the cases reported in comment:3 is not fixed, the merged patch only focuses on the case where only annotations are referenced in values
/values_list
which is a subset of the problem reported here as mixing field references, extra, and annotations demonstrate.
The crux of the issue is that SQLCompiler.get_select
ignores Query.values_select
and always generate selected columns as follow
- Start with
Query.extra_select
- Then
Query.select
- End with
Query.annotation_select
The patch here only makes sure that the order of annotation_select
is preserved. What should be done instead is adjust get_select
to be Query.values_select
aware as pointed out on the MR.
comment:13 by , 20 months ago
Replying to Simon Charette:
Small note that the cases reported in comment:3 is not fixed, ...
Yes, but this case is described in #28900. Am I right?
follow-up: 15 comment:14 by , 20 months ago
#28900 seems to be an even more complex case where none of the combined queries use explicit values
but the result of the query combination does.
In the comment:3 example both queries use values
but happen to mix field references and annotations which is not covered by the test included in d6b6e5d0fd4e6b6d0183b4cf6e4bd4f9afc7bf67.
I'm bringing this up because most if not all of the changes made to sql.Query
for change the type of annotation_mask
are unnecessary to solve #28553 entirely.
comment:15 by , 20 months ago
Replying to Simon Charette:
I'm bringing this up because most if not all of the changes made to
sql.Query
for change the type ofannotation_mask
are unnecessary to solve #28553 entirely.
I will be happy to review any adjustments in this area, +1.
comment:16 by , 8 months ago
Cc: | added |
---|
(The ticket component should change to "Documentation" if there aren't any code changes to make here. I'm not sure.)