#35865 new Cleanup/optimization

Queryset aggregation keeps unnecessary SQL joins

Reported by:	Ruslan	Owned by:
Component:	Database layer (models, ORM)	Version:
Severity:	Normal	Keywords:
Cc:	Simon Charette, Stephen	Triage Stage:	Accepted
Has patch:	no	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description ¶

Problem:
QuerySet.count() (via Query.get_count()) produces a suboptimal SQL query. The impact might be minimal and depends heavily on the SQL engine implementation, but cleaner and less controversial SQL is always better.

Observed behavior:
When using the .count() method (through QuerySet or Query, it doesn't matter), the query still keeps all joins, even if they are not required.

Expected behavior:
Calling .count() (Query.get_count()) should set up only the necessary joins and ignore any other joins that were added by a custom QuerySet.values() call or any other method that modifies the SELECT clause.

Some context:
Version: 4.x, 5.x, dev
The code (django/django/db/models/sql/query.py:635):

def get_count(self, using):
    """
    Perform a COUNT() query using the current filter constraints.
    """
    obj = self.clone()
    return obj.get_aggregation(using, {"__count": Count("*")})["__count"]

I tried calling obj.clear_select_clause(), but it didn't affect any joins, apparently because they are stored somewhere in Query.alias_map, and I am not yet familiar with how that actually works. I would appreciate any hints, even if this is a non-issue for the Django project itself.

According to the ticket's flags, the next step(s) to move this issue forward are:

To provide a patch by sending a pull request. Claim the ticket when you start working so that someone else doesn't duplicate effort. Before sending a pull request, review your work against the patch review checklist. Check the "Has patch" flag on the ticket after sending a pull request and include a link to the pull request in the ticket comment when making that update. The usual format is: [https://github.com/django/django/pull/#### PR].

Change History (5)

comment:1 by Simon Charette, 5 months ago

Easy pickings:	unset
Summary:	Query.get_count() keeps unnecessary SQL joins → Queryset aggregation keeps unnecessary SQL joins
Triage Stage:	Unreviewed → Accepted

To give you a bit of context here, the ORM used to not prune unused annotations before Django 4.2 (#28477), and the lack of pruning of left-over JOINs after annotation pruning was identified as a potential optimization at the time.

To give a concrete example, say you do

Book.objects.annotate(
    author_name=Concat("author__first_name", V(" "), "author__last_name"),
).count()

then prior to 59bea9efd2768102fc9d3aedda469502c218e9b7 the generated SQL would have been

SELECT COUNT(*) FROM (
    SELECT book.id, (author.first_name || ' ' || author.last_name) author_name
    FROM book
    LEFT JOIN author ON (book.author_id = author.id)
)

and after it is

SELECT COUNT(*)
FROM book
LEFT JOIN author ON (book.author_id = author.id)

Now obviously the M:1 join against author is not necessary in this case, but that is not always trivial to determine. Take the following example

author_qs = Author.objects.annotate(
    book_title=F("books__title")
)
author_qs.count()

which results in

SELECT COUNT(*)
FROM author
LEFT JOIN book ON (book.author_id = author.id)

Then in this case we can't prune the 1:M join as it's multi-valued (possibly many books for each author) and would return a different value from len(author_qs).
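The difference between the two joins can be reproduced outside the ORM. The following is a minimal sketch using an in-memory SQLite database; the table layout and the sample rows are assumptions made up for illustration and do not come from Django's test suite.

```python
import sqlite3

# Hypothetical schema and data, only meant to mirror the book/author example.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES author(id)
    );
    INSERT INTO author (id, first_name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book (id, title, author_id) VALUES
        (1, 'A1', 1), (2, 'A2', 1), (3, 'B1', 2);
    """
)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# M:1 join (book -> author): every book matches at most one author,
# so the LEFT JOIN cannot change the number of rows.
books_plain = scalar("SELECT COUNT(*) FROM book")
books_joined = scalar(
    "SELECT COUNT(*) FROM book LEFT JOIN author ON (book.author_id = author.id)"
)

# 1:M join (author -> book): an author with two books yields two rows,
# so dropping the join would change the count.
authors_plain = scalar("SELECT COUNT(*) FROM author")
authors_joined = scalar(
    "SELECT COUNT(*) FROM author LEFT JOIN book ON (book.author_id = author.id)"
)

print(books_plain, books_joined)      # same value with and without the join
print(authors_plain, authors_joined)  # the join inflates the author count
```

With this data the book count is unaffected by the join, while the author count grows by one because one author has two books.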

The problem then becomes that JOINs can only be pruned if these two conditions are met

They are not referenced anymore (could be done by decrementing reference counts on annotation pruning)
They are not involved in multi-valued relationships (AKA many-to-many or reverse many-to-one)
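The two conditions above can be sketched as a small standalone model. This is not Django's implementation: JoinInfo and prunable_aliases are hypothetical names, the reference count is a plain integer rather than anything read from a real Query, and the sketch assumes every join is a LEFT OUTER JOIN (an INNER JOIN can filter out rows, so it could not be dropped on these grounds alone). It also keeps any join that a retained join hangs off, which the two conditions imply but do not spell out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JoinInfo:
    """Hypothetical, simplified description of one join in a query."""

    alias: str
    parent: Optional[str]  # alias this join hangs off; None for the base table
    multi_valued: bool     # True for many-to-many or reverse many-to-one
    refcount: int          # number of expressions still referencing the alias


def prunable_aliases(joins: List[JoinInfo]) -> List[str]:
    """Return aliases that are unreferenced, single-valued, and not needed
    as the parent of a join that has to stay (assumes LEFT OUTER joins)."""
    by_alias = {j.alias: j for j in joins}
    # Keep the base table, anything still referenced, and anything multi-valued.
    kept = {
        j.alias
        for j in joins
        if j.parent is None or j.refcount > 0 or j.multi_valued
    }
    # A join must also stay if a kept join is chained through it.
    changed = True
    while changed:
        changed = False
        for alias in list(kept):
            parent = by_alias[alias].parent
            if parent is not None and parent not in kept:
                kept.add(parent)
                changed = True
    return [j.alias for j in joins if j.alias not in kept]


# Book -> author (M:1), annotation already pruned: the join can go.
book_case = [
    JoinInfo("book", None, False, 1),
    JoinInfo("author", "book", False, 0),
]

# Author -> book (1:M): multi-valued, so it must stay even if unreferenced.
author_case = [
    JoinInfo("author", None, False, 1),
    JoinInfo("book", "author", True, 0),
]

# An unreferenced M:1 join that a multi-valued join is chained through.
chained_case = [
    JoinInfo("book", None, False, 1),
    JoinInfo("author", "book", False, 0),
    JoinInfo("award", "author", True, 0),
]

print(prunable_aliases(book_case))
print(prunable_aliases(author_case))
print(prunable_aliases(chained_case))
```

The chained case hints at why this is hard to get right in practice: a join that looks unused on its own may still be required by another join further down the path.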

I'm tentatively accepting as this is an already identified desired optimization, but it is far from being an easy picking; it's in the realm of close to wont-fix because it is very hard to do correctly.

comment:2 by Simon Charette, 5 months ago

Cc:	Simon Charette added

comment:3 by Stephen, 3 months ago

Cc:	Stephen added

comment:4 by Ahmed Nassar, 3 weeks ago

Owner:	set to Ahmed Nassar
Status:	new → assigned

comment:5 by Ahmed Nassar, 38 hours ago

Owner:	Ahmed Nassar removed
Status:	assigned → new
