Opened 2 years ago
Closed 2 years ago
#33975 closed Cleanup/optimization (fixed)
__in doesn't clear selected fields on the RHS when QuerySet.alias() is used after annotate().
Reported by: | Gabriel Muj | Owned by: | Bhuvnesh |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 4.0 |
Severity: | Normal | Keywords: | |
Cc: | Simon Charette, Alexandr Tatarinov | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Here is a test case to reproduce the bug, you can add this in tests/annotations/tests.py
def test_annotation_and_alias_filter_in_subquery(self): long_books_qs = ( Book.objects.filter( pages__gt=400, ) .annotate(book_annotate=Value(1)) .alias(book_alias=Value(1)) ) publisher_books_qs = ( Publisher.objects.filter( book__in=long_books_qs ) .values("name") ) self.assertCountEqual( publisher_books_qs, [ {'name': 'Apress'}, {'name': 'Sams'}, {'name': 'Prentice Hall'}, {'name': 'Morgan Kaufmann'} ] )
You should get this error:
django.db.utils.OperationalError: sub-select returns 10 columns - expected 1
Change History (14)
comment:1 by , 2 years ago
Component: | Uncategorized → Database layer (models, ORM) |
---|
comment:2 by , 2 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
follow-up: 4 comment:3 by , 2 years ago
Using either .alias
or .annotate
works as expected without using .values
to limit to 1 column. Why is that? but using both of them doesn't seem work.
comment:4 by , 2 years ago
Cc: | added |
---|---|
Resolution: | invalid |
Status: | closed → new |
Summary: | Using .filter with lookup field__in=queryset where queryset contains .annotate and .alias fails → __in doesn't clear selected fields on the RHS when QuerySet.alias() is used after annotate(). |
Triage Stage: | Unreviewed → Accepted |
Type: | Bug → Cleanup/optimization |
Replying to Gabriel Muj:
Using either
.alias
or.annotate
works as expected without using.values
to limit to 1 column. Why is that? but using both of them doesn't seem work.
We have some simplification in the __in
lookup that automatically limit selected fields to the pk
when Query
is on the right-side and the selected fields are undefined. It looks like they don't play nice when .annotate()
and .alias()
are mixed. Tentatively accepted for the investigation.
comment:5 by , 2 years ago
Whether or not the In
lookups defaults a Query
right-hand-side to .values('pk')
is based of rhs.has_select_fields
(source)
This dynamic property is based of self.select or self.annotation_select_mask
so I suspect that there might be some bad logic in Query.add_annotation
(source) from the introduction of QuerySet.alias
in f4ac167119e8897c398527c392ed117326496652.
The self.set_annotation_mask(set(self.annotation_select).difference({alias}))
line seems to be what causes the problem here. If annotate
and alias
are used then annotation_select_mask
will be materialized as a mask is required to keep track of which entries in annotations
must be selected and whatnot (not necessary if only using alias
or annotate
as the first doesn't require any selecting and the second selects all annotations)
The add_annotation
logic relied on to implement QuerySet.alias
is not wrong per se it just happens to piggy back on the annotation masking logic that only Query.set_values
relied on previously an now happens to break the heuristics in Query.has_select_fields
.
The proper solutions is likely to replace Query.has_select_fields
by a class attribute that defaults to False
and have set_values
set it to True
and make sure Query.clone
carries the attribute over. This seems like a more durable options as that avoids trying to peek into internals to determine if a values
call was performed.
Gabriel, since you've already written a regression test, would you be interested in submitting a patch that takes the approach proposed above?
comment:6 by , 2 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:7 by , 2 years ago
hi, I am new to Django contribution, i tried to implement the solution proposed by simon, replaced the Query.has_select_fields with has_select_fields attribute with default value False and have set_values set it to true, but now i am stuck at carrying over the attribute to Query.clone , can I get a bit more explanation for the same?
Edit:
Replacing Query_has_select_fields with attribute and having set_values set it to true passes tests for annotations but giving 2 Failures on overall tests (queries).Failures are:
FAIL: test_in_subquery (queries.tests.ToFieldTests)
AssertionError: Items in the second set but not the first:
<Eaten: apple at lunch>
FAIL: test_nested_in_subquery (queries.tests.ToFieldTests)
AssertionError: Sequences differ: <QuerySet []> != [<ReportComment: ReportComment object (1)>]
Second sequence contains 1 additional elements.
First extra element 0:
<ReportComment: ReportComment object (1)>
comment:8 by , 2 years ago
If you're still struggling with the clone part of the patch it basically means that Query.clone
must assign obj.has_select_fields = True
when if self.has_select_fields
This should not be necessary as it's a shallow attribute and it already gets carried over.
comment:11 by , 2 years ago
Has patch: | set |
---|
comment:12 by , 2 years ago
Patch needs improvement: | set |
---|
comment:13 by , 2 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
This is a documented and expected behavior.