#33975 closed Cleanup/optimization (fixed)

__in doesn't clear selected fields on the RHS when QuerySet.alias() is used after annotate().

Reported by: Gabriel Muj Owned by: Bhuvnesh
Component: Database layer (models, ORM) Version: 4.0
Severity: Normal Keywords:
Cc: Simon Charette, Alexandr Tatarinov Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Here is a test case to reproduce the bug, you can add this in tests/annotations/tests.py

    def test_annotation_and_alias_filter_in_subquery(self):
        long_books_qs = (
            Book.objects.filter(
                pages__gt=400,
            )
            .annotate(book_annotate=Value(1))
            .alias(book_alias=Value(1))
        )
        publisher_books_qs = (
            Publisher.objects.filter(
                book__in=long_books_qs
            )
            .values("name")
        )
        self.assertCountEqual(
            publisher_books_qs,
            [
                {'name': 'Apress'},
                {'name': 'Sams'},
                {'name': 'Prentice Hall'},
                {'name': 'Morgan Kaufmann'}
            ]
        )

You should get this error:

django.db.utils.OperationalError: sub-select returns 10 columns - expected 1

Change History (14)

comment:1 by Gabriel Muj, 21 months ago

Component: UncategorizedDatabase layer (models, ORM)

comment:2 by Mariusz Felisiak, 21 months ago

Resolution: invalid
Status: newclosed

This is a documented and expected behavior.

comment:3 by Gabriel Muj, 21 months ago

Using either .alias or .annotate works as expected without using .values to limit to 1 column. Why is that? but using both of them doesn't seem work.

in reply to:  3 comment:4 by Mariusz Felisiak, 21 months ago

Cc: Simon Charette Alexandr Tatarinov added
Resolution: invalid
Status: closednew
Summary: Using .filter with lookup field__in=queryset where queryset contains .annotate and .alias fails__in doesn't clear selected fields on the RHS when QuerySet.alias() is used after annotate().
Triage Stage: UnreviewedAccepted
Type: BugCleanup/optimization

Replying to Gabriel Muj:

Using either .alias or .annotate works as expected without using .values to limit to 1 column. Why is that? but using both of them doesn't seem work.

We have some simplification in the __in lookup that automatically limit selected fields to the pk when Query is on the right-side and the selected fields are undefined. It looks like they don't play nice when .annotate() and .alias() are mixed. Tentatively accepted for the investigation.

comment:5 by Simon Charette, 21 months ago

Whether or not the In lookups defaults a Query right-hand-side to .values('pk') is based of rhs.has_select_fields (source)

This dynamic property is based of self.select or self.annotation_select_mask so I suspect that there might be some bad logic in Query.add_annotation (source) from the introduction of QuerySet.alias in f4ac167119e8897c398527c392ed117326496652.

The self.set_annotation_mask(set(self.annotation_select).difference({alias})) line seems to be what causes the problem here. If annotate and alias are used then annotation_select_mask will be materialized as a mask is required to keep track of which entries in annotations must be selected and whatnot (not necessary if only using alias or annotate as the first doesn't require any selecting and the second selects all annotations)

The add_annotation logic relied on to implement QuerySet.alias is not wrong per se it just happens to piggy back on the annotation masking logic that only Query.set_values relied on previously an now happens to break the heuristics in Query.has_select_fields.

The proper solutions is likely to replace Query.has_select_fields by a class attribute that defaults to False and have set_values set it to True and make sure Query.clone carries the attribute over. This seems like a more durable options as that avoids trying to peek into internals to determine if a values call was performed.

Gabriel, since you've already written a regression test, would you be interested in submitting a patch that takes the approach proposed above?

Last edited 21 months ago by Simon Charette (previous) (diff)

comment:6 by Bhuvnesh, 21 months ago

Owner: changed from nobody to Bhuvnesh
Status: newassigned

comment:7 by Bhuvnesh, 21 months ago

hi, I am new to Django contribution, i tried to implement the solution proposed by simon, replaced the Query.has_select_fields with has_select_fields attribute with default value False and have set_values set it to true, but now i am stuck at carrying over the attribute to Query.clone , can I get a bit more explanation for the same?

Edit:

Replacing Query_has_select_fields with attribute and having set_values set it to true passes tests for annotations but giving 2 Failures on overall tests (queries).Failures are:

FAIL: test_in_subquery (queries.tests.ToFieldTests)

AssertionError: Items in the second set but not the first:
<Eaten: apple at lunch>

FAIL: test_nested_in_subquery (queries.tests.ToFieldTests)

AssertionError: Sequences differ: <QuerySet []> != [<ReportComment: ReportComment object (1)>]
Second sequence contains 1 additional elements.
First extra element 0:
<ReportComment: ReportComment object (1)>

Last edited 21 months ago by Bhuvnesh (previous) (diff)

comment:8 by Simon Charette, 21 months ago

If you're still struggling with the clone part of the patch it basically means that Query.clone must assign obj.has_select_fields = True when if self.has_select_fields

This should not be necessary as it's a shallow attribute and it already gets carried over.

The tests are failing because RelatedIn also needs adjustments to make sure it goes through Query.set_values instead of calling clear_select_clause and add_fields

  • django/db/models/fields/related_lookups.py

    diff --git a/django/db/models/fields/related_lookups.py b/django/db/models/fields/related_lookups.py
    index 1a845a1f7f..afea09b5a9 100644
    a b def get_prep_lookup(self):  
    9393            elif not getattr(self.rhs, "has_select_fields", True) and not getattr(
    9494                self.lhs.field.target_field, "primary_key", False
    9595            ):
    96                 self.rhs.clear_select_clause()
    9796                if (
    9897                    getattr(self.lhs.output_field, "primary_key", False)
    9998                    and self.lhs.output_field.model == self.rhs.model
    def get_prep_lookup(self):  
    105104                    target_field = self.lhs.field.name
    106105                else:
    107106                    target_field = self.lhs.field.target_field.name
    108                 self.rhs.add_fields([target_field], True)
     107                self.rhs.set_values([target_field])
    109108        return super().get_prep_lookup()
    110109
    111110    def as_sql(self, compiler, connection):
Last edited 21 months ago by Simon Charette (previous) (diff)

comment:9 by Bhuvnesh, 21 months ago

Thanks, that worked! Opening a PR.

comment:11 by Bhuvnesh, 21 months ago

Has patch: set

comment:12 by Mariusz Felisiak, 21 months ago

Patch needs improvement: set

comment:13 by Mariusz Felisiak, 21 months ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:14 by Mariusz Felisiak <felisiak.mariusz@…>, 21 months ago

Resolution: fixed
Status: assignedclosed

In 32797e7f:

Fixed #33975 -- Fixed in lookup when rhs is a queryset with annotate() and alias().

This fixes clearing selected fields.

Note: See TracTickets for help on using tickets.
Back to Top