Opened 15 months ago

Closed 12 months ago

Last modified 12 months ago

#21760 closed Bug (fixed)

prefetch_related uses an inefficient query to get the related fk data

Reported by: valtron2000@… Owned by: nobody
Component: Database layer (models, ORM) Version: 1.6
Severity: Release blocker Keywords: prefetch_related ForeignKey
Cc: loic@…, marti@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Consider the following models and query:

class Author(models.Model):
        name = models.CharField(max_length = 20)

class Book(models.Model):
        name = models.CharField(max_length = 20)
        author = models.ForeignKey(Author, related_name = 'books')

Book.objects.all().prefetch_related('author')

The query I'd expect to be issued by the prefetch_related is:

SELECT * FROM author
WHERE id IN (... the author_ids of the books from the main query ...)

However, the query that actually get executed are:

SELECT a.* FROM author a INNER JOIN book b ON (a.id = b.author_id)
WHERE b.id IN (...  the ids of the books from the main query ...)

There are two problems with this approach:

  1. It does a join, which wouldn't be necessary if author_ids were used instead of book.ids
  2. When one uses prefetch_related in such a case (that is, to prefetch a single fk per object rather than a many to many) instead of select_related, it's usually because there are many primary objects (books) and few unique related objects (authors); select_related, despite getting all the data in one query, could end up returning much more data than is needed because the related objects are duplicated many times over, and will be slower overall. Thus, the approach currently used by prefetch_related ends up using the set of ids that will almost always be larger (the book ids, rather than the authors).

The reverse case (Author.objects.all().prefetch_related('books')) does the efficient, joinless query I'd expect -- namely:

SELECT * FROM book
WHERE author_id IN (... the ids of the authors from the main query ...)

I've attached a sample project with a test.

Attachments (3)

ex.zip (2.8 KB) - added by valtron2000@… 15 months ago.
Project for book example with a test.
patch_21760.diff (2.6 KB) - added by valtron2000@… 15 months ago.
patch_21760_v2.diff (2.4 KB) - added by valtron2000@… 15 months ago.
Fixed patch

Download all attachments as: .zip

Change History (19)

Changed 15 months ago by valtron2000@…

Project for book example with a test.

comment:1 Changed 15 months ago by valtron2000@…

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Summary changed from prefetch_related uses an inefficient query to get the related data to prefetch_related uses an inefficient query to get the related fk data

comment:2 Changed 15 months ago by valtron2000@…

  • Keywords ForeignKey added

comment:3 Changed 15 months ago by loic84

  • Cc loic@… added

comment:4 Changed 15 months ago by akaariai

  • Triage Stage changed from Unreviewed to Accepted

Seems OK to me.

Changed 15 months ago by valtron2000@…

comment:5 Changed 15 months ago by valtron2000@…

  • Has patch set

Added a patch & test; seems to do the trick and all tests pass.

comment:6 Changed 15 months ago by valtron2000@…

My bad, I only ran the prefetch_related tests. Patch breaks foreign_object.tests.MultiColumnFKTests.test_prefetch_foreignkey_forward_works.

Changed 15 months ago by valtron2000@…

Fixed patch

comment:7 Changed 12 months ago by intgr

  • Cc marti@… added

comment:8 Changed 12 months ago by intgr

I am in the process of migrating an app from Django 1.5 to 1.6; this regression also affects us (it did not occur in Django 1.5). In one example this turns a query from 175 bytes into 66 kB, with a corresponding increase in query time, result set size and memory usage.

I can confirm that the attached patch fixes this issue when applied on top of Django 1.6.2. Can it be included in 1.6.3?

Last edited 12 months ago by intgr (previous) (diff)

comment:9 Changed 12 months ago by timo

  • Severity changed from Normal to Release blocker

Haven't verified the claim that it's a regression from 1.5, but marking it as a release blocker assuming that's the case. A note in the 1.6.3 release notes would also be helpful.

comment:10 Changed 12 months ago by loic84

Related to #21410.

comment:11 Changed 12 months ago by loic84

We discussed this issue with @akaariai on IRC.

Like we did for #21410, we'll take a practical approach for fixing the regression from Django 1.6 to master and go for the proposed fix. It's worth noting that the inefficient query will still be present for multicolumn relations, but these are not documented and one needs to use private APIs to make them, also these have never seen the "joinless" query, so that doesn't qualify as a regression.

The implementation of ReverseSingleRelatedObjectDescriptor.get_prefetch_queryset() really ought to be cleaned up at some point, especially when we land the patch for composite fields but it'll require a fair amount of refactoring so it only qualifies for master.

@valtron2000, would you mind updating the patch?

  • The "FIXME" comment would have to be updated to explain this new case with a reference to this ticket.
  • The current test is pretty fragile, could you rather check in QS.query if there is a join? (e.g. https://gist.github.com/10345520)
  • Code needs to be compatible with PY3, use range instead of xrange, or import it from six if you really want xrange.
  • The changes inside the if, do they do anything differently or is it just a cleanup? If it's just a cleanup, only fix the if condition, and not the implementation. Because right now it's pretty hard to see where the problem was from.

Feel free to open a PR on github.

Edit: Added a couple more things to the patch review.

Last edited 12 months ago by loic84 (previous) (diff)

comment:12 Changed 12 months ago by valtron2000@…

I'll submit a PR soon; thanks for the suggestions.

The fix is a one-liner in the if, and the rest of the changes were just cleanup.

comment:14 Changed 12 months ago by Loic Bistuer <loic.bistuer@…>

  • Resolution set to fixed
  • Status changed from new to closed

In d3b71b976dee4b02908235b8007db254a9795268:

Fixed #21760 -- prefetch_related used an inefficient query for reverse FK.

Regression introduced by commit 9777442. Refs #21410.

comment:15 Changed 12 months ago by Loic Bistuer <loic.bistuer@…>

In 6b3a8d27052bb5b242ab6c1c2fb0d6da5b49681d:

[1.7.x] Fixed #21760 -- prefetch_related used an inefficient query for reverse FK.

Regression introduced by commit 9777442. Refs #21410.

Backport of d3b71b976d from master

comment:16 Changed 12 months ago by Loic Bistuer <loic.bistuer@…>

In 1252b77824639fba398cf70586bdd75c42f86fdf:

[1.6.x] Fixed #21760 -- prefetch_related used an inefficient query for reverse FK.

Regression introduced by commit 9777442. Refs #21410.

Conflicts:

tests/prefetch_related/tests.py

Backport of d3b71b976d from master

Note: See TracTickets for help on using tickets.
Back to Top