Opened 17 months ago

Last modified 17 months ago

#21241 new Cleanup/optimization

Optimize the query generated for admin changelist filters

Reported by: acdha Owned by: charettes
Component: contrib.admin Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

The admin changelist search implementation currently calls queryset.filter() for each whitespace-separated term. This can cause the ORM to generate repeated JOINs of the same table which is both a potential performance problem and will actually cause MySQL to return an error once you hit a system limit (~60 on the example I encountered).

The patch at https://github.com/django/django/pull/881 changes the logic to combine the terms into a single filter() call and adds some tests.

Attachments (1)

881.patch (5.5 KB) - added by acdha 17 months ago.

Download all attachments as: .zip

Change History (12)

Changed 17 months ago by acdha

comment:1 Changed 17 months ago by charettes

  • Needs documentation unset
  • Needs tests unset
  • Owner changed from nobody to charettes
  • Patch needs improvement unset
  • Status changed from new to assigned
  • Triage Stage changed from Unreviewed to Accepted

comment:2 Changed 17 months ago by charettes

  • Triage Stage changed from Accepted to Ready for checkin

comment:3 Changed 17 months ago by Simon Charette <charette.s@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In 698dd82eee1cb83f51d4cd39546461bf76976b5e:

Fixed #21241 -- Avoid extraneous JOINs in admin changelist search.

comment:4 Changed 17 months ago by akaariai

I am pretty sure the patch is bogus. For multivalued relation doing .filter(cond1).filter(cond2) isn't equivalent to .filter(cond1, cond2). The first one creates multiple joins on purpose. The reason for that is that if you have structure like:

idparent_idintval1intval2
1null10
212025
313035

then doing Tree.objects.filter(child__intval1=20).filter(child__intval2=35) will result in a match - there is a children with intval1=20 and a children with intval2=35 (it could be the same children, in this case it isn't). But if you do Tree.objects.filter(child__intval1=20, child__intval2=35) then the interpretation is that there must be *single* children with intval1=20 and intval2=35 and that isn't true. (However, Tree.objects.filter(child__intval1=20, child__intval2=25) would match as there is single children with both values).

On SQL level this single instance versus multiple instances is implemented as single join to childrens versus multiple joins to childrens. So, it is intentional that multiple .filter() calls chained together generates multiple joins, while pushing them all into single .filter() call generates just one join.

This behaviour of the ORM API isn't easy to understand, and I guess most users do not actually know of this... But still, that is how it works, and changing how admin interprets the given filters in this case is a backwards incompatible change.

The ORM API is documented in https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships.

comment:5 Changed 17 months ago by charettes

  • Resolution fixed deleted
  • Status changed from closed to new
  • Triage Stage changed from Ready for checkin to Accepted

Since this is backward incompatible and both use cases are valid I'll just revert this commit for now.

Maybe we could use the proposed code path when there's no multi-valued relationships?

Might be worth writing a regression test case to make sure we're not breaking the admin changelist behavior in the future.

comment:6 Changed 17 months ago by Simon Charette <charette.s@…>

  • Resolution set to fixed
  • Status changed from new to closed

In a8df8e34f9fc0219de73eabcb215d976e75a89ae:

Revert "Fixed #21241 -- Avoid extraneous JOINs in admin changelist search."

This reverts commit 698dd82eee1cb83f51d4cd39546461bf76976b5e.

The patch introduced a backward incompatible change.

comment:7 Changed 17 months ago by charettes

  • Resolution fixed deleted
  • Status changed from closed to new

comment:8 Changed 17 months ago by akaariai

For single valued relationships doing single .filter() call and multiple chained .filter() calls should both result in a single join (if it doesn't actually to that then it is a bug in the ORM). There might be a performance change due to less cloning for single .filter() call, but that shouldn't be anything dramatic in common cases.

Tests for current behaviour would of course be good. If there is some easy way to allow users to choose just one filter() call that would be good, too.

comment:9 Changed 17 months ago by charettes

@acdha can you confirm one of your search_filters reference a multi valued relationship?

comment:10 Changed 17 months ago by acdha

I would agree about pulling in one of the two test cases as there is currently nothing which tests how multiple query terms are processed.

My production failure was indeed on an M2M field in search_filters, where a user pasted in a sentence of text and got a 500 error because MySQL limited JOINs to 61 tables. On second read, I see why the original behaviour is technically correct if unfortunate.

comment:11 Changed 17 months ago by timo

  • Patch needs improvement set
Note: See TracTickets for help on using tickets.
Back to Top