Opened 10 years ago

Closed 10 years ago

Last modified 8 years ago

#10847 closed (fixed)

`QuerySet.values` doesn't remove extra selections.

Reported by: Tai Lee Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Keywords: queryset extra select values sql
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no


>>> from django.contrib.auth.models import User
>>> str(User.objects.extra(select={'extra_col': '1'}).values('pk').query)
'SELECT (1) AS "extra_col", "auth_user"."id" FROM "auth_user"'

This can cause problems when you need to do use your queryset with an IN field lookup.

>>> User.objects.filter(pk__in=User.objects.extra(select={'extra_col': '1'}).values('pk').query)
Traceback (most recent call last):
OperationalError: only a single result allowed for a SELECT that is part of an expression

Attachments (3)

values-extra.diff (1.1 KB) - added by Alex Gaynor 10 years ago.
values-extra.2.diff (2.2 KB) - added by Alex Gaynor 10 years ago.
values-extra.3.diff (9.9 KB) - added by Alex Gaynor 10 years ago.

Download all attachments as: .zip

Change History (12)

comment:1 Changed 10 years ago by Tai Lee

Keep in mind that any extra selections may also be referenced in order_by or extra where, etc. If we strip extra selections that are not specified in values(), we should also strip them from order_by and extra where, etc., or invalid SQL may be generated.

Changed 10 years ago by Alex Gaynor

Attachment: values-extra.diff added

comment:2 Changed 10 years ago by Alex Gaynor

Component: UncategorizedDatabase layer (models, ORM)
Has patch: set
Triage Stage: UnreviewedAccepted

comment:3 Changed 10 years ago by Russell Keith-Magee

Patch needs improvement: set

Alex - this patch isn't quite right, but the reason requires some explanation.

Here's a more complete test case that reveals the flaw:

>>> TestObject.objects.extra(select={'extra': 1}).values('pk')
[{'pk': 1}]

>>> TestObject.objects.extra(select={'extra': 1}).values('pk').query.as_sql()
('SELECT "extra_regress_testobject"."id" FROM "extra_regress_testobject"', ())

>>> TestObject.objects.filter(pk__in=TestObject.objects.extra(select={'extra': 1}).values('pk'))
[<TestObject: TestObject: first,second,third>]

>>> TestObject.objects.values('pk').extra(select={'extra': 1})
[{'pk': 1}]

>>> TestObject.objects.values('pk').extra(select={'extra': 1}).query.as_sql()
('SELECT "extra_regress_testobject"."id" FROM "extra_regress_testobject"', ())

>>> TestObject.objects.filter(pk__in=TestObject.objects.values('pk').extra(select={'extra': 1}))
[<TestObject: TestObject: first,second,third>]

Look closely - in 1-3, the values() comes after the extra(). These three queries pass fine.

However in 4-6, the extra() comes after the pk(). This is where the strangeness starts: 4 passes (as a result of the changes from #10256), but 5 and 6 don't.

This is due to a problem that I originally thought was just annoying and cosmetic, but this ticket reveals that it is actually significant. The problem is this - the as_sql() call for any query with an extra clause doesn't produce correct SQL until it is executed. This is because the final list of extra columns isn't evaluated until the iterator is evaluated. This has two consequences:

  1. If you try to debug print an SQL query with an extra clause, the output will be misleading
  2. If you use an SQL query with an extra clause as an inner query, you will get errors, because the inner SQL will be wrong.

There is hope, however. The fix lies in changing the way the list of active extra columns are evaluated. Currently, there is a single 'extra' dictionary; when the ValueQuerySet is evaluated, this extra dictionary is trimmed to the right length. This means that if the query isn't evaluated, the list of active extra columns is wrong. Your patch implements the trim in the setup phase, which works fine when the values() comes after the extra(), but not if the extra() comes after the values().

I had a similar problem with aggregates, but I was able to fix that problem by taking a different approach. Rather than having a single dictionary of aggregates, there is a definition dictionary, and a mask of currently active annotations. When you call values(), you set the mask; when you call annotate(), you add fields to the dictionary. This way, at any time you can evaluate a property that contains the current active annotation list by applying the mask to the field definition list.

The masking behaviour for extra() is slightly different to that for annotate() - if an annotate follows the values clause, it is added to the mask; this isn't the case for extra. However, this basic masking approach should make it possible to get an accurate query at any time, without the need for evaluation.

If you want to take a swing at this, it should be a fairly simple change (albiet a fairly widespread one). There is also the potential for some clashes with contrib.gis. If you haven't got the time (or inclination) to work on this, I should be able to take a look at it next week some time.

comment:4 Changed 10 years ago by Tai Lee

Are you sure about tests 1-3 working if extra() comes before values()? When I run test 2-3 (with User as the model) I get failures, with r10581.

>>> from django.contrib.auth.models import User
>>> User.objects.extra(select={'extra': 1}).values('pk').query.as_sql()
(u'SELECT (1) AS "extra", "auth_user"."id" FROM "auth_user"', ())
>>> User.objects.filter(pk__in=User.objects.extra(select={'extra_col': 1}).values('pk'))
Traceback (most recent call last):
OperationalError: only a single result allowed for a SELECT that is part of an expression

The above is with SQLite, but I get the same result with PostgreSQL, just with a different exception (ProgrammingError: subquery has too many columns). I think that test 1 is also executing the extra selection, but just not passing it through to the output.

comment:5 Changed 10 years ago by Tai Lee

Although it just occurred to me that you are probably talking about 1-3 passing with the patch applied (which I did not test). Sorry :)

Changed 10 years ago by Alex Gaynor

Attachment: values-extra.2.diff added

comment:6 Changed 10 years ago by Malcolm Tredinnick

This second patch can't be correct, since calling trim_extra_select() isn't something to do when creating the queryset. A subsequent values() call could reuse those extra values, except they've now been trimmed away, which would break the queryset. As a general rule, don't throw away data from inside Query (in fact, trim_extra_select() is badly implemented for that reason; I haven't gotten around to doing it a different way and this has reminded me to get back to that in the 1.2 timeframe).

comment:7 Changed 10 years ago by Alex Gaynor

Arg, this is correct, I think the right solution is to implement a mask similar to the way that aggregates are handled(stupid extra...).

Changed 10 years ago by Alex Gaynor

Attachment: values-extra.3.diff added

comment:8 Changed 10 years ago by Russell Keith-Magee

Resolution: fixed
Status: newclosed

(In [10648]) Fixed #10847 -- Modified handling of extra() to use a masking strategy, rather than last-minute trimming. Thanks to Tai Lee for the report, and Alex Gaynor for his work on the patch.

This enables querysets with an extra clause to be used in an in filter; as a side effect, it also means that as_sql() now returns the correct result for any query with an extra clause.

comment:9 Changed 8 years ago by Jacob

milestone: 1.1

Milestone 1.1 deleted

Note: See TracTickets for help on using tickets.
Back to Top