Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#30771 closed Bug (fixed)

Filtering on query result overrides GROUP BY of internal query

Reported by: Aur Saraf Owned by: James Timmins
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: annotate, annotation, filter, groupby, aggregate, aggregation
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

from django.contrib.auth import models

a = models.User.objects.filter(email__isnull=True).values('email').annotate(m=Max('id')).values('m')
print(a.query) # good
# SELECT MAX("auth_user"."id") AS "m" FROM "auth_user" WHERE "auth_user"."email" IS NULL GROUP BY "auth_user"."email"
print(a[:1].query) # good
# SELECT MAX("auth_user"."id") AS "m" FROM "auth_user" WHERE "auth_user"."email" IS NULL GROUP BY "auth_user"."email"  LIMIT 1
b = models.User.objects.filter(id=a[:1])
print(b.query) # GROUP BY U0."id" should be GROUP BY U0."email"
# SELECT ... FROM "auth_user" WHERE "auth_user"."id" = (SELECT U0."id" FROM "auth_user" U0 WHERE U0."email" IS NULL GROUP BY U0."id"  LIMIT 1)

Change History (12)

comment:1 by Aur Saraf, 5 years ago

Workaround:

from django.contrib.auth import models

a = models.User.objects.filter(email__isnull=True).values('email').aggregate(Max('id'))['id_max']
b = models.User.objects.filter(id=a)

comment:2 by James Timmins, 5 years ago

Owner: changed from nobody to James Timmins
Status: newassigned

comment:3 by Simon Charette, 5 years ago

Thanks for tackling that one James!

If I can provide you some guidance I'd suggest you have a look at lookups.Exact.process_rhs

https://github.com/django/django/blob/ea25bdc2b94466bb1563000bf81628dea4d80612/django/db/models/lookups.py#L265-L267

We probably don't want to perform the clear_select_clause and add_fields(['pk']) when the query is already only selecting a single field.

Version 0, edited 5 years ago by Simon Charette (next)

comment:4 by Mariusz Felisiak, 5 years ago

Triage Stage: UnreviewedAccepted
Version: 2.2master

in reply to:  3 comment:5 by James Timmins, 5 years ago

Thanks so much for the help Simon! This is a great jumping-off point.

There's something that I'm unclear about, which perhaps you can shed some light on. While I was able to replicate the bug with 2.2, when I try to create a test on Master to validate the bug, the group-by behavior seems to have changed.

Here's the test that I created:

    def test_exact_selected_field_rhs_subquery(self):
        author_1 = Author.objects.create(name='one')
        author_2 = Author.objects.create(name='two')
        max_ids = Author.objects.filter(alias__isnull=True).values('alias').annotate(m=Max('id')).values('m')
        authors = Author.objects.filter(id=max_ids[:1])
        
        self.assertFalse(str(max_ids.query)) # This was just to force the test-runner to output the query.
        
        self.assertEqual(authors[0], author_2)

And here's the resulting query:

SELECT MAX("lookup_author"."id") AS "m" FROM "lookup_author" WHERE "lookup_author"."alias" IS NULL GROUP BY "lookup_author"."alias", "lookup_author"."name"

It no longer appears to be grouping by the 'alias' field listed in the initial .values() preceeding the .annotate(). I looked at the docs and release notes to see if there was a behavior change, but didn't see anything listed.

Do you know if I'm just misunderstanding what's happening here? Or does this seem like a possible regression?

comment:6 by Simon Charette, 5 years ago

It's possible that a regression was introduced in between.

Could you try bisecting the commit that changed the behavior

https://docs.djangoproject.com/en/dev/internals/contributing/triaging-tickets/#bisecting-a-regression

in reply to:  6 comment:7 by James Timmins, 5 years ago

Mmm actually disregard that. The second value in the GROUP BY is due to the ordering value in the Author class's Meta class.

class Author(models.Model):
    name = models.CharField(max_length=100)
    alias = models.CharField(max_length=50, null=True, blank=True)

    class Meta:
        ordering = ('name',)

Regarding the bug in question in this ticket, what should the desired behavior be if the inner query is returning multiple fields? With the fix, which allows the inner query to define a field to return/group by, if there are multiple fields used then it will throw a sqlite3.OperationalError: row value misused.

Is this the desired behavior or should it avoid this problem by defaulting back to pk if more than one field is selected?

comment:8 by Simon Charette, 5 years ago

I think that we should only default to pk if no fields are selected. The ORM has preliminary support for multi-column lookups and other interface dealing with subqueries doesn't prevent passing queries with multiple fields so I'd stick to the current __in lookup behavior.

comment:9 by James Timmins, 5 years ago

Has patch: set

The pull request can be found at the following link: https://github.com/django/django/pull/11797

comment:10 by Mariusz Felisiak <felisiak.mariusz@…>, 5 years ago

Resolution: fixed
Status: assignedclosed

In 3697ddb:

[3.0.x] Fixed #30771 -- Fixed exact lookup against queries with selected columns.

Use pre-existing select fields (and thereby GROUP BY fields) from
subquery if they were specified, instead of always defaulting to pk.

Thanks Aur Saraf for the report and Simon Charette for guidance.

Backport of 0719edcd5fed56157ffb3323a8f634aa5e8f9a80 from master

comment:11 by Mariusz Felisiak <felisiak.mariusz@…>, 5 years ago

In 0719edc:

Fixed #30771 -- Fixed exact lookup against queries with selected columns.

Use pre-existing select fields (and thereby GROUP BY fields) from
subquery if they were specified, instead of always defaulting to pk.

Thanks Aur Saraf for the report and Simon Charette for guidance.

comment:12 by GitHub <noreply@…>, 5 years ago

In 1611906:

[3.0.x] Refs #30771 -- Fixed RemovedInDjango31Warning in test_exact_query_rhs_with_selected_columns.

Note: See TracTickets for help on using tickets.
Back to Top