Opened 2 months ago
Last modified 2 months ago
#35865 new Cleanup/optimization
Queryset aggregation keeps unnecessary SQL joins
Reported by: | Ruslan | Owned by: | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | |
Severity: | Normal | Keywords: | |
Cc: | Simon Charette | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Problem:
Query.count()
results sub optimal SQL query. Though, the impact might be minimal and highly depend on SQL engine implementation. But cleaner and less controversial SQL is always better
Observed behavior:
When using .count()
method (through QuerySet or Query — doesn't matter) it still keeps all joins, even if they are not required for the query.
Expected behavior:
Calling .count()
(Query.get_count()
) should setup only necessary joins and ignore any other joins that were added due to custom QuerySet.values
or any other method that modifies SELECT.
Some context:
Version: 4.x, 5.x, dev
The code (django/django/db/models/sql/query.py:635):
def get_count(self, using): """ Perform a COUNT() query using the current filter constraints. """ obj = self.clone() return obj.get_aggregation(using, {"__count": Count("*")})["__count"]
I tried to call obj.clear_select_clause()
but it didn't affect any joins, because they are apparently somewhere in Query.alias_map
and I am not yet that familiar with how it actually works. But I will appreciate any hints, even if it is a non-issue for a Django project itself.
Change History (2)
comment:1 by , 2 months ago
Easy pickings: | unset |
---|---|
Summary: | Query.get_count() keeps unnecessary SQL joins → Queryset aggregation keeps unnecessary SQL joins |
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 2 months ago
Cc: | added |
---|
To give you a bit of context here the ORM use to not prune unused annotations before Django 4.2 (#28477) and the lack post annotation pruning left-over
JOIN
pruning was identified as a potential optimization at the time.To give a concrete example say you do
then prior to 59bea9efd2768102fc9d3aedda469502c218e9b7 the generated SQL would have been
and after it is
Now obviously in this case the M:1 join against author is not necessary in this case but it's not always trivial to determine. Take the following example
which results in
Then in this case we can't prune the 1:M join as it's multi-valued (possibly many books for each author) and would return a different value from
len(author_qs)
.The problem then becomes that JOINs can be only be pruned if these two conditions are met
I'm tentatively accepting as this is an already identified desired optimization but it is far from being an easy picking, it's in the realm of close to wont-fix very hard to do correctly.