Opened 5 years ago

Last modified 3 months ago

#29725 assigned Cleanup/optimization

Inefficient SQL generated when counting a ManyToMany

Reported by: Gavin Wahl Owned by: Shiwei Chen
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Simon Charette Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

When calling count() on an unfiltered many to many relation, a useless join is included in the SQL that makes it much slower than it should be. On my dataset, the difference is 1000ms to 100ms, because an index-only scan can be used.

This is the SQL that is currently generated:

SELECT COUNT(*) AS "__count"
FROM "app_foo"
INNER JOIN "app_foo_bar" ON ("app_foo"."id" = "app_foo_bar"."foo_id")
WHERE "app_foo_bar"."foo_id" = ?;

This is the SQL that should be generated:

SELECT COUNT(*) AS "__count"
FROM "app_foo_bar"
WHERE "app_foo_bar"."foo_id" = ?;

This optimization can only be applied when there are no filters applied, because then the join is used to satisfy the filters. In the no-filters case, only the through table needs to be consulted.

Change History (22)

comment:1 Changed 5 years ago by oliver

Owner: changed from nobody to oliver
Status: newassigned

comment:2 Changed 5 years ago by Simon Charette

Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization
Version: 2.1master

Hello Olivier,

I think you should be able to achieve this by defining a count() method on the dynamic class created by create_forward_many_to_many_manager by filtering self.through._default_manager based on self.instance and return its count().

comment:3 Changed 5 years ago by Simon Charette

Cc: Simon Charette added

comment:4 in reply to:  2 Changed 5 years ago by oliver

Thanks for your advice.
I will try to correct it according to your advice

Replying to Simon Charette:

Hello Olivier,

I think you should be able to achieve this by defining a count() method on the dynamic class created by create_forward_many_to_many_manager by filtering self.through._default_manager based on self.instance and return its count().

Last edited 5 years ago by oliver (previous) (diff)

comment:6 Changed 5 years ago by oliver

Has patch: set
Needs tests: set

comment:7 Changed 5 years ago by Tom Forbes

Have we considered making this change for exists() as well? I'm sure this will generate the join in the same way that count() does.

Also somewhat related: #28477

Last edited 5 years ago by Tom Forbes (previous) (diff)

comment:8 Changed 5 years ago by Simon Charette

Doing it for exists() as well makes sense.

comment:9 Changed 5 years ago by Simon Charette

Needs tests: unset
Patch needs improvement: set

comment:10 Changed 5 years ago by oliver

Patch needs improvement: unset

comment:11 Changed 5 years ago by Tim Graham <timograham@…>

Resolution: fixed
Status: assignedclosed

In 1299421:

Fixed #29725 -- Removed unnecessary join in QuerySet.count() and exists() on a many-to-many relation.

comment:12 Changed 4 years ago by Mariusz Felisiak <felisiak.mariusz@…>

In 5f7991c4:

Fixed #30325 -- Reverted "Fixed #29725 -- Removed unnecessary join in QuerySet.count() and exists() on a many-to-many relation."

This reverts commit 1299421cadc4fcf63585f2f88337078e43e660e0 due to
a regression with custom managers.

comment:13 Changed 4 years ago by Mariusz Felisiak <felisiak.mariusz@…>

In e8de1cc9:

[2.2.x] Fixed #30325 -- Reverted "Fixed #29725 -- Removed unnecessary join in QuerySet.count() and exists() on a many-to-many relation."

This reverts commit 1299421cadc4fcf63585f2f88337078e43e660e0 due to
a regression with custom managers.

Backport of 5f7991c42cff73b6278106d499d719b726f85ead from master

comment:14 Changed 4 years ago by Mariusz Felisiak

Resolution: fixed
Status: closednew

Solution has been reverted because it caused the inconsistent behavior of count() and exists() on a reverse many-to-many relationship with a custom manager (see new tests in 9ac8520fcde29840a1345be19d80dbda53aa6d03).

comment:15 Changed 4 years ago by Mariusz Felisiak

Has patch: unset

comment:16 Changed 4 years ago by Simon Charette

Has patch: set
Patch needs improvement: set

Switched back to patch needs improvement instead as it can probably be adapted to skip the optimization when a custom manager is defined.

comment:17 Changed 4 years ago by Mariusz Felisiak

Ticket #30325 revealed one more issue, i.e. optimization doesn't work when chaining all() after reverse M2M relations.

comment:18 Changed 4 years ago by Mike Hansen

Additionally, the optimization causes a query to be done if the related objects have already been prefetched.

comment:19 Changed 5 months ago by Shiwei Chen

Owner: changed from oliver to Shiwei Chen
Status: newassigned

Thanks everyone for your insights. Im going to try to implement this optimization, accounting for all the issues listed here so far.

PR

Last edited 5 months ago by Mariusz Felisiak (previous) (diff)

comment:20 Changed 5 months ago by Simon Charette

Needs tests: set

Patch needs a few tweaks to properly disable the custom manager (most notably tests). It also lacks coverage for the prefetched case as reported by comment:18.

Version 0, edited 5 months ago by Simon Charette (next)

comment:21 Changed 4 months ago by Shiwei Chen

Needs tests: unset
Patch needs improvement: unset

Updated the patch to properly deal with cases involving custom managers and prefetch queries, and implemented test cases for them.

https://github.com/django/django/pull/16863

comment:22 Changed 3 months ago by Mariusz Felisiak

Needs tests: set
Patch needs improvement: set
Note: See TracTickets for help on using tickets.
Back to Top