Opened 2 months ago

Last modified 2 months ago

#35865 new Cleanup/optimization

Queryset aggregation keeps unnecessary SQL joins

Reported by: Ruslan Owned by:
Component: Database layer (models, ORM) Version:
Severity: Normal Keywords:
Cc: Simon Charette Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Problem:
Query.count() results sub optimal SQL query. Though, the impact might be minimal and highly depend on SQL engine implementation. But cleaner and less controversial SQL is always better

Observed behavior:
When using .count() method (through QuerySet or Query — doesn't matter) it still keeps all joins, even if they are not required for the query.

Expected behavior:
Calling .count() (Query.get_count()) should setup only necessary joins and ignore any other joins that were added due to custom QuerySet.values or any other method that modifies SELECT.

Some context:
Version: 4.x, 5.x, dev
The code (django/django/db/models/sql/query.py:635):

def get_count(self, using):
    """
    Perform a COUNT() query using the current filter constraints.
    """
    obj = self.clone()
    return obj.get_aggregation(using, {"__count": Count("*")})["__count"]

I tried to call obj.clear_select_clause() but it didn't affect any joins, because they are apparently somewhere in Query.alias_map and I am not yet that familiar with how it actually works. But I will appreciate any hints, even if it is a non-issue for a Django project itself.

Change History (2)

comment:1 by Simon Charette, 2 months ago

Easy pickings: unset
Summary: Query.get_count() keeps unnecessary SQL joinsQueryset aggregation keeps unnecessary SQL joins
Triage Stage: UnreviewedAccepted

To give you a bit of context here the ORM use to not prune unused annotations before Django 4.2 (#28477) and the lack post annotation pruning left-over JOIN pruning was identified as a potential optimization at the time.

To give a concrete example say you do

Book.objects.annotate(
    author_name=Concat("author__first_name", V(" "), "author_last_name"),
).count()

then prior to 59bea9efd2768102fc9d3aedda469502c218e9b7 the generated SQL would have been

SELECT COUNT(*) FROM (
    SELECT book.id, (author.first_name || ' ' || author.last_name) author_name
    FROM book
    LEFT JOIN author ON (book.author_id = author.id)
)

and after it is

SELECT COUNT(*)
FROM book
LEFT JOIN author ON (book.author_id = author.id)

Now obviously in this case the M:1 join against author is not necessary in this case but it's not always trivial to determine. Take the following example

author_qs = Author.objects.annotate(
    book_title=F("books__title")
)
author_qs.count()

which results in

SELECT COUNT(*)
FROM author
LEFT JOIN book ON (book.author_id = author.id)

Then in this case we can't prune the 1:M join as it's multi-valued (possibly many books for each author) and would return a different value from len(author_qs).

The problem then becomes that JOINs can be only be pruned if these two conditions are met

  1. They are not referenced anymore (could be done by decrementing reference counts on annotation pruning)
  2. They are not involved in multi-valued relationships (AKA many-to-many or reverse many-to-one)

I'm tentatively accepting as this is an already identified desired optimization but it is far from being an easy picking, it's in the realm of close to wont-fix very hard to do correctly.

comment:2 by Simon Charette, 2 months ago

Cc: Simon Charette added
Note: See TracTickets for help on using tickets.
Back to Top