#10847 closed (fixed)
`QuerySet.values` doesn't remove extra selections.
Reported by: | Tai Lee | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Keywords: | queryset extra select values sql | |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
>>> from django.contrib.auth.models import User >>> str(User.objects.extra(select={'extra_col': '1'}).values('pk').query) 'SELECT (1) AS "extra_col", "auth_user"."id" FROM "auth_user"'
This can cause problems when you need to do use your queryset with an IN field lookup.
>>> User.objects.filter(pk__in=User.objects.extra(select={'extra_col': '1'}).values('pk').query) Traceback (most recent call last): ... OperationalError: only a single result allowed for a SELECT that is part of an expression
Attachments (3)
Change History (12)
comment:1 by , 16 years ago
by , 16 years ago
Attachment: | values-extra.diff added |
---|
comment:2 by , 16 years ago
Component: | Uncategorized → Database layer (models, ORM) |
---|---|
Has patch: | set |
Triage Stage: | Unreviewed → Accepted |
comment:3 by , 16 years ago
Patch needs improvement: | set |
---|
Alex - this patch isn't quite right, but the reason requires some explanation.
Here's a more complete test case that reveals the flaw:
#1 >>> TestObject.objects.extra(select={'extra': 1}).values('pk') [{'pk': 1}] #2 >>> TestObject.objects.extra(select={'extra': 1}).values('pk').query.as_sql() ('SELECT "extra_regress_testobject"."id" FROM "extra_regress_testobject"', ()) #3 >>> TestObject.objects.filter(pk__in=TestObject.objects.extra(select={'extra': 1}).values('pk')) [<TestObject: TestObject: first,second,third>] #4 >>> TestObject.objects.values('pk').extra(select={'extra': 1}) [{'pk': 1}] #5 >>> TestObject.objects.values('pk').extra(select={'extra': 1}).query.as_sql() ('SELECT "extra_regress_testobject"."id" FROM "extra_regress_testobject"', ()) #6 >>> TestObject.objects.filter(pk__in=TestObject.objects.values('pk').extra(select={'extra': 1})) [<TestObject: TestObject: first,second,third>]
Look closely - in 1-3, the values() comes after the extra(). These three queries pass fine.
However in 4-6, the extra() comes after the pk(). This is where the strangeness starts: 4 passes (as a result of the changes from #10256), but 5 and 6 don't.
This is due to a problem that I originally thought was just annoying and cosmetic, but this ticket reveals that it is actually significant. The problem is this - the as_sql() call for any query with an extra clause doesn't produce correct SQL until it is executed. This is because the final list of extra columns isn't evaluated until the iterator is evaluated. This has two consequences:
- If you try to debug print an SQL query with an extra clause, the output will be misleading
- If you use an SQL query with an extra clause as an inner query, you will get errors, because the inner SQL will be wrong.
There is hope, however. The fix lies in changing the way the list of active extra columns are evaluated. Currently, there is a single 'extra' dictionary; when the ValueQuerySet is evaluated, this extra dictionary is trimmed to the right length. This means that if the query isn't evaluated, the list of active extra columns is wrong. Your patch implements the trim in the setup phase, which works fine when the values() comes after the extra(), but not if the extra() comes after the values().
I had a similar problem with aggregates, but I was able to fix that problem by taking a different approach. Rather than having a single dictionary of aggregates, there is a definition dictionary, and a mask of currently active annotations. When you call values(), you set the mask; when you call annotate(), you add fields to the dictionary. This way, at any time you can evaluate a property that contains the current active annotation list by applying the mask to the field definition list.
The masking behaviour for extra() is slightly different to that for annotate() - if an annotate follows the values clause, it is added to the mask; this isn't the case for extra. However, this basic masking approach should make it possible to get an accurate query at any time, without the need for evaluation.
If you want to take a swing at this, it should be a fairly simple change (albiet a fairly widespread one). There is also the potential for some clashes with contrib.gis. If you haven't got the time (or inclination) to work on this, I should be able to take a look at it next week some time.
comment:4 by , 16 years ago
Are you sure about tests 1-3 working if extra()
comes before values()
? When I run test 2-3 (with User as the model) I get failures, with r10581.
>>> from django.contrib.auth.models import User >>> User.objects.extra(select={'extra': 1}).values('pk').query.as_sql() (u'SELECT (1) AS "extra", "auth_user"."id" FROM "auth_user"', ()) >>> User.objects.filter(pk__in=User.objects.extra(select={'extra_col': 1}).values('pk')) Traceback (most recent call last): ... OperationalError: only a single result allowed for a SELECT that is part of an expression
The above is with SQLite, but I get the same result with PostgreSQL, just with a different exception (ProgrammingError: subquery has too many columns
). I think that test 1 is also executing the extra selection, but just not passing it through to the output.
comment:5 by , 16 years ago
Although it just occurred to me that you are probably talking about 1-3 passing with the patch applied (which I did not test). Sorry :)
by , 16 years ago
Attachment: | values-extra.2.diff added |
---|
comment:6 by , 16 years ago
This second patch can't be correct, since calling trim_extra_select()
isn't something to do when creating the queryset. A subsequent values()
call could reuse those extra values, except they've now been trimmed away, which would break the queryset. As a general rule, don't throw away data from inside Query
(in fact, trim_extra_select()
is badly implemented for that reason; I haven't gotten around to doing it a different way and this has reminded me to get back to that in the 1.2 timeframe).
comment:7 by , 16 years ago
Arg, this is correct, I think the right solution is to implement a mask similar to the way that aggregates are handled(stupid extra...).
by , 16 years ago
Attachment: | values-extra.3.diff added |
---|
comment:8 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
(In [10648]) Fixed #10847 -- Modified handling of extra() to use a masking strategy, rather than last-minute trimming. Thanks to Tai Lee for the report, and Alex Gaynor for his work on the patch.
This enables querysets with an extra clause to be used in an in filter; as a side effect, it also means that as_sql() now returns the correct result for any query with an extra clause.
Keep in mind that any extra selections may also be referenced in
order_by
or extrawhere
, etc. If we strip extra selections that are not specified invalues()
, we should also strip them fromorder_by
and extrawhere
, etc., or invalid SQL may be generated.