Opened 4 months ago
Last modified 7 weeks ago
#35865 new Cleanup/optimization
Queryset aggregation keeps unnecessary SQL joins
Reported by: | Ruslan | Owned by: | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | |
Severity: | Normal | Keywords: | |
Cc: | Simon Charette, Stephen | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Pull Requests: | How to create a pull request | ||
Description ¶
Problem:
Query.count()
results sub optimal SQL query. Though, the impact might be minimal and highly depend on SQL engine implementation. But cleaner and less controversial SQL is always better
Observed behavior:
When using .count()
method (through QuerySet or Query — doesn't matter) it still keeps all joins, even if they are not required for the query.
Expected behavior:
Calling .count()
(Query.get_count()
) should setup only necessary joins and ignore any other joins that were added due to custom QuerySet.values
or any other method that modifies SELECT.
Some context:
Version: 4.x, 5.x, dev
The code (django/django/db/models/sql/query.py:635):
def get_count(self, using): """ Perform a COUNT() query using the current filter constraints. """ obj = self.clone() return obj.get_aggregation(using, {"__count": Count("*")})["__count"]
I tried to call obj.clear_select_clause()
but it didn't affect any joins, because they are apparently somewhere in Query.alias_map
and I am not yet that familiar with how it actually works. But I will appreciate any hints, even if it is a non-issue for a Django project itself.
According to the ticket's flags, the next step(s) to move this issue forward are:
- To provide a patch by sending a pull request. Claim the ticket when you start working so that someone else doesn't duplicate effort. Before sending a pull request, review your work against the patch review checklist. Check the "Has patch" flag on the ticket after sending a pull request and include a link to the pull request in the ticket comment when making that update. The usual format is:
[https://github.com/django/django/pull/#### PR]
.
Change History (3)
comment:1 by , 4 months ago
Easy pickings: | unset |
---|---|
Summary: | Query.get_count() keeps unnecessary SQL joins → Queryset aggregation keeps unnecessary SQL joins |
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 4 months ago
Cc: | added |
---|
comment:3 by , 7 weeks ago
Cc: | added |
---|
To give you a bit of context here the ORM use to not prune unused annotations before Django 4.2 (#28477) and the lack post annotation pruning left-over
JOIN
pruning was identified as a potential optimization at the time.To give a concrete example say you do
then prior to 59bea9efd2768102fc9d3aedda469502c218e9b7 the generated SQL would have been
and after it is
Now obviously in this case the M:1 join against author is not necessary in this case but it's not always trivial to determine. Take the following example
which results in
Then in this case we can't prune the 1:M join as it's multi-valued (possibly many books for each author) and would return a different value from
len(author_qs)
.The problem then becomes that JOINs can be only be pruned if these two conditions are met
I'm tentatively accepting as this is an already identified desired optimization but it is far from being an easy picking, it's in the realm of close to wont-fix very hard to do correctly.