Context Navigation

← Previous Ticket
Next Ticket →

#33655 closed Cleanup/optimization (fixed)

Unnecessary column in a GROUP BY clause with QuerySet.exists()

Reported by:	Marc Perrin	Owned by:	Marc Perrin
Component:	Database layer (models, ORM)	Version:	4.0
Severity:	Normal	Keywords:	exists group by
Cc:	Simon Charette	Triage Stage:	Ready for checkin
Has patch:	yes	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description

I've got a question about the interaction between exists() and group by.

For example something like:

Manager.values('field').annotate(cnt=Count('id')).filter(cnt__gt=1).exists()

The corresp. query with PostgreSQL looks like this:

SELECT (1) AS "a"
FROM "app_model"
GROUP BY "app_model"."field", (1)
HAVING COUNT("app_model"."id") > 1
LIMIT 1

exists() (see https://github.com/django/django/blob/470708f50d8c13a50770893b8d7181f5218bf3ac/django/db/models/sql/query.py#L563) clears the SELECT clause and replaces it by (if I understand correctly) a hardcoded value 1 (as "a"), along with a limit of 1, which makes sense to me.

But get_group_by() (see https://github.com/django/django/blob/6b53114dd862ec97c282fdfdc83579cbd6d1560d/django/db/models/sql/compiler.py#L79) pushes this hardcoded value to the GROUP BY clause and we end up with the query above.

Now, on PostgreSQL, that works, but to me it sounds like it works by luck/lucky robustness... and certainly the same query without the , (1) in the GROUP BY clause yields the same result.
The problem is that outside of PostgreSQL that GROUP BY clause can be downright invalid...
Note that if I alter exists() to use {'a': 2} instead of {'a': 1}, it does not work anymore (on PostgreSQL), because (2) in the SELECT clause means the hardcoded number 2 while (2) in the GROUP BY clause presumably (??) refers to the 2nd expr of the SELECT clause, but we only got one...

My feeling is that get_group_by() should avoid adding extra/raw sql (or at least extra/raw pure values, if we can detect that...) to the group by expressions?

NB: the existing/old ticket that seems most related to my question would probably be: https://code.djangoproject.com/ticket/24835

Change History (15)

follow-up: 2 comment:1 by Mariusz Felisiak, 3 years ago

Cc:	Simon Charette added
Summary:	Interaction between exists() and group by → Unnecessary column in a GROUP BY clause with QuerySet.exists()
Triage Stage:	Unreviewed → Accepted
Type:	Uncategorized → Cleanup/optimization

Thanks for the report. I agree that (1) is unnecessary in the GROUP BY clause unfortunately we cannot always remove constants from it because folks can use them as an alias for n-th column (as you already noticed). Maybe we could select only the first column from the GROUP BY clause instead:

django/db/models/sql/query.py

diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
index 64e7927f7a..940d3141d1 100644

                class Query(BaseExpression):
         q.clear_ordering(force=True)
         if limit:
             q.set_limits(high=1)
+        q.add_extra({"a": 1}, None, None, None, None, None)
+        q.set_extra_mask(["a"])
+        if q.group_by and not q.select:
+            q.add_select_col(q.group_by[0], "a")
+        else:
+            q.add_extra({"a": 1}, None , None, None, None, None)
+            q.set_extra_mask(["a"])
         return q
     def has_results(self, using):

Tentatively accepted for future investigation.

in reply to: 1 comment:2 by Marc Perrin, 3 years ago

Thanks for the quick answer.

Your suggested workaround seems interesting indeed (provided the value of said column does not affect the exists() return value - for example if it's 0?)

NB: I didn't mean removing all constants from GROUP BY, rather: removing the automatic transmission of constants from the SELECT clause to the GROUP BY clause, since their meaning automatically switches from "constant number n" to "nth element of the select clause" which imo can never be intended.

comment:3 by Mariusz Felisiak, 3 years ago

provided the value of said column does not affect the exists() return value - for example if it's 0?

No, it's not. It's enough for the query to return any rows.

follow-up: 5 comment:4 by Simon Charette, 3 years ago

Maybe we should stop relying on extra and use Value(1) instead which already disables grouping.

django/db/models/sql/query.py

diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
index 64e7927f7a..1097de4f79 100644

-              a
     OuterRef,
     Ref,
     ResolvedOuterRef,
+    Value,
+)
 from django.db.models.fields import Field
 from django.db.models.fields.related_lookups import MultiColSource
-…
+               def exists(self, using, limit=True):
         q.clear_ordering(force=True)
         if limit:
             q.set_limits(high=1)
+        q.add_extra({"a": 1}, None, None, None, None, None)
+        q.set_extra_mask(["a"])
+        q.add_annotation(Value(1), "a")
         return q
     def has_results(self, using):

Anything that relies on extra is a black box to the compiler as it could be an aggregate function or not.

in reply to: 4 ; follow-up: 6 comment:5 by Mariusz Felisiak, 3 years ago

Replying to Simon Charette:

Maybe we should stop relying on extra and use Value(1) instead which already disables grouping.

Great idea!

in reply to: 5 comment:6 by Marc Perrin, 3 years ago

Sounds good indeed!

Replying to Mariusz Felisiak:

Replying to Simon Charette:

Maybe we should stop relying on extra and use Value(1) instead which already disables grouping.

Great idea!

comment:7 by Mariusz Felisiak, 3 years ago

Marc, Would you like to prepare a patch?

follow-up: 10 comment:8 by Marc Perrin, 3 years ago

Yeah, I guess I can do that, with Simon's suggestion the whole thing should be straightforward enough.
I'll take the ticket - I'll have to look up the contribution guide etc. / especially about the tests, as I haven't really contributed before.

comment:9 by Marc Perrin, 3 years ago

Owner:	changed from nobody to Marc Perrin
Status:	new → assigned

in reply to: 8 comment:10 by Mariusz Felisiak, 3 years ago

Replying to Marc Perrin:

Yeah, I guess I can do that, with Simon's suggestion the whole thing should be straightforward enough.
I'll take the ticket - I'll have to look up the contribution guide etc. / especially about the tests, as I haven't really contributed before.

Great, thanks! In this ticket we are forced to check a generated SQL, IMO it's fine to add assertions to an existing test e.g. aggregation.tests.AggregateTestCase.test_aggregation_subquery_annotation_exists:

        with self.assertNumQueries(1) as ctx:
            self.assertTrue(publisher_qs.exists())
        sql = ctx[0]["sql"]
        self.assertIn("SELECT 1 ", sql)
        self.assertNotIn("(1)", sql)
        # There is just one GROUP BY clause (zero commas means at most one
        # clause).
        self.assertEqual(sql[sql.index("GROUP BY") :].count(", "), 0)

comment:11 by Marc Perrin, 3 years ago

Yeah, looks like the most relevant existing test case indeed, thanks for the pointer

comment:12 by Marc Perrin, 3 years ago

Though after debugging this test, it might be a bit more complex than we want/need (e.g. there are 4 fields in the GROUP BY anyway...) and I might just add a new test e.g. "test_aggregation_annotation_exists" that gets close to what I mentioned originally (e.g.

Book.objects.values('publisher').annotate(cnt=Count('isbn')).filter(cnt__gt=1).exists()

)

For reference, with the fix, this query goes from:

SELECT (1) AS "a" FROM "aggregation_book" GROUP BY "aggregation_book"."publisher_id", (1) HAVING COUNT("aggregation_book"."isbn") > 1 LIMIT 1

to:

SELECT 1 AS "a" FROM "aggregation_book" GROUP BY "aggregation_book"."publisher_id" HAVING COUNT("aggregation_book"."isbn") > 1 LIMIT 1

(with the test default settings i.e. sqlite I guess)
I'll try to be diligent with the asserts though, because if the test must work on all backends, even

self.assertIn("SELECT 1 ", sql)

isn't good, because we can get SELECT TOP 1 1 (...) instead of SELECT 1 (...) LIMIT 1

So perhaps a simple

self.assertNotIn(", (1)", sql)

would be enough (I'm not really fond of your assert about GROUP BY either, because it's not really targeting the GROUP BY clause, it includes everything after it incl. HAVING clause)

Last edited 3 years ago by Marc Perrin (previous) (diff)

comment:13 by Marc Perrin, 3 years ago

But anyway I've started the PR: https://github.com/django/django/pull/15630 se we can discuss the details in there

comment:14 by Mariusz Felisiak, 3 years ago

Has patch:	set
Triage Stage:	Accepted → Ready for checkin

comment:15 by Mariusz Felisiak <felisiak.mariusz@…>, 3 years ago

Resolution:	→ fixed
Status:	assigned → closed

In 4282fd4:

Fixed #33655 -- Removed unnecessary constant from GROUP BY clause for QuerySet.exists().

Note: See TracTickets for help on using tickets.

Download in other formats:

Issues

Context Navigation

#33655 closed Cleanup/optimization (fixed)

Unnecessary column in a GROUP BY clause with QuerySet.exists()

Description

Change History (15)

follow-up: 2 comment:1 by Mariusz Felisiak, 3 years ago

django/db/models/sql/query.py

in reply to: 1 comment:2 by Marc Perrin, 3 years ago

comment:3 by Mariusz Felisiak, 3 years ago

follow-up: 5 comment:4 by Simon Charette, 3 years ago

django/db/models/sql/query.py

in reply to: 4 ; follow-up: 6 comment:5 by Mariusz Felisiak, 3 years ago

in reply to: 5 comment:6 by Marc Perrin, 3 years ago

comment:7 by Mariusz Felisiak, 3 years ago

follow-up: 10 comment:8 by Marc Perrin, 3 years ago

comment:9 by Marc Perrin, 3 years ago

in reply to: 8 comment:10 by Mariusz Felisiak, 3 years ago

comment:11 by Marc Perrin, 3 years ago

comment:12 by Marc Perrin, 3 years ago

comment:13 by Marc Perrin, 3 years ago

comment:14 by Mariusz Felisiak, 3 years ago

comment:15 by Mariusz Felisiak <felisiak.mariusz@…>, 3 years ago

Download in other formats:

Django Links

Learn More

Get Involved

Get Help

Follow Us

Support Us