Opened 3 years ago

Last modified 3 years ago

#21241 new Cleanup/optimization

Optimize the query generated for admin changelist filters

Reported by: Chris Adams Owned by: Simon Charette
Component: contrib.admin Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

The admin changelist search implementation currently calls queryset.filter() for each whitespace-separated term. This can cause the ORM to generate repeated JOINs of the same table which is both a potential performance problem and will actually cause MySQL to return an error once you hit a system limit (~60 on the example I encountered).

The patch at https://github.com/django/django/pull/881 changes the logic to combine the terms into a single filter() call and adds some tests.

Attachments (1)

881.patch (5.5 KB) - added by Chris Adams 3 years ago.

Download all attachments as: .zip

Change History (12)

Changed 3 years ago by Chris Adams

Attachment: 881.patch added

comment:1 Changed 3 years ago by Simon Charette

Owner: changed from nobody to Simon Charette
Status: newassigned
Triage Stage: UnreviewedAccepted

comment:2 Changed 3 years ago by Simon Charette

Triage Stage: AcceptedReady for checkin

comment:3 Changed 3 years ago by Simon Charette <charette.s@…>

Resolution: fixed
Status: assignedclosed

In 698dd82eee1cb83f51d4cd39546461bf76976b5e:

Fixed #21241 -- Avoid extraneous JOINs in admin changelist search.

comment:4 Changed 3 years ago by Anssi Kääriäinen

I am pretty sure the patch is bogus. For multivalued relation doing .filter(cond1).filter(cond2) isn't equivalent to .filter(cond1, cond2). The first one creates multiple joins on purpose. The reason for that is that if you have structure like:

idparent_idintval1intval2
1null10
212025
313035

then doing Tree.objects.filter(child__intval1=20).filter(child__intval2=35) will result in a match - there is a children with intval1=20 and a children with intval2=35 (it could be the same children, in this case it isn't). But if you do Tree.objects.filter(child__intval1=20, child__intval2=35) then the interpretation is that there must be *single* children with intval1=20 and intval2=35 and that isn't true. (However, Tree.objects.filter(child__intval1=20, child__intval2=25) would match as there is single children with both values).

On SQL level this single instance versus multiple instances is implemented as single join to childrens versus multiple joins to childrens. So, it is intentional that multiple .filter() calls chained together generates multiple joins, while pushing them all into single .filter() call generates just one join.

This behaviour of the ORM API isn't easy to understand, and I guess most users do not actually know of this... But still, that is how it works, and changing how admin interprets the given filters in this case is a backwards incompatible change.

The ORM API is documented in https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships.

comment:5 Changed 3 years ago by Simon Charette

Resolution: fixed
Status: closednew
Triage Stage: Ready for checkinAccepted

Since this is backward incompatible and both use cases are valid I'll just revert this commit for now.

Maybe we could use the proposed code path when there's no multi-valued relationships?

Might be worth writing a regression test case to make sure we're not breaking the admin changelist behavior in the future.

comment:6 Changed 3 years ago by Simon Charette <charette.s@…>

Resolution: fixed
Status: newclosed

In a8df8e34f9fc0219de73eabcb215d976e75a89ae:

Revert "Fixed #21241 -- Avoid extraneous JOINs in admin changelist search."

This reverts commit 698dd82eee1cb83f51d4cd39546461bf76976b5e.

The patch introduced a backward incompatible change.

comment:7 Changed 3 years ago by Simon Charette

Resolution: fixed
Status: closednew

comment:8 Changed 3 years ago by Anssi Kääriäinen

For single valued relationships doing single .filter() call and multiple chained .filter() calls should both result in a single join (if it doesn't actually to that then it is a bug in the ORM). There might be a performance change due to less cloning for single .filter() call, but that shouldn't be anything dramatic in common cases.

Tests for current behaviour would of course be good. If there is some easy way to allow users to choose just one filter() call that would be good, too.

comment:9 Changed 3 years ago by Simon Charette

@acdha can you confirm one of your search_filters reference a multi valued relationship?

comment:10 Changed 3 years ago by Chris Adams

I would agree about pulling in one of the two test cases as there is currently nothing which tests how multiple query terms are processed.

My production failure was indeed on an M2M field in search_filters, where a user pasted in a sentence of text and got a 500 error because MySQL limited JOINs to 61 tables. On second read, I see why the original behaviour is technically correct if unfortunate.

comment:11 Changed 3 years ago by Tim Graham

Patch needs improvement: set
Note: See TracTickets for help on using tickets.
Back to Top