Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#21760 closed Bug (fixed)

prefetch_related uses an inefficient query to get the related fk data

Reported by: valtron2000@… Owned by: nobody
Component: Database layer (models, ORM) Version: 1.6
Severity: Release blocker Keywords: prefetch_related ForeignKey
Cc: loic@…, marti@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Consider the following models and query:

class Author(models.Model):
        name = models.CharField(max_length = 20)

class Book(models.Model):
        name = models.CharField(max_length = 20)
        author = models.ForeignKey(Author, related_name = 'books')

Book.objects.all().prefetch_related('author')

The query I'd expect to be issued by the prefetch_related is:

SELECT * FROM author
WHERE id IN (... the author_ids of the books from the main query ...)

However, the query that actually get executed are:

SELECT a.* FROM author a INNER JOIN book b ON (a.id = b.author_id)
WHERE b.id IN (...  the ids of the books from the main query ...)

There are two problems with this approach:

  1. It does a join, which wouldn't be necessary if author_ids were used instead of book.ids
  2. When one uses prefetch_related in such a case (that is, to prefetch a single fk per object rather than a many to many) instead of select_related, it's usually because there are many primary objects (books) and few unique related objects (authors); select_related, despite getting all the data in one query, could end up returning much more data than is needed because the related objects are duplicated many times over, and will be slower overall. Thus, the approach currently used by prefetch_related ends up using the set of ids that will almost always be larger (the book ids, rather than the authors).

The reverse case (Author.objects.all().prefetch_related('books')) does the efficient, joinless query I'd expect -- namely:

SELECT * FROM book
WHERE author_id IN (... the ids of the authors from the main query ...)

I've attached a sample project with a test.

Attachments (3)

ex.zip (2.8 KB) - added by valtron2000@… 10 years ago.
Project for book example with a test.
patch_21760.diff (2.6 KB) - added by valtron2000@… 10 years ago.
patch_21760_v2.diff (2.4 KB) - added by valtron2000@… 10 years ago.
Fixed patch

Download all attachments as: .zip

Change History (19)

Changed 10 years ago by valtron2000@…

Attachment: ex.zip added

Project for book example with a test.

comment:1 Changed 10 years ago by valtron2000@…

Summary: prefetch_related uses an inefficient query to get the related dataprefetch_related uses an inefficient query to get the related fk data

comment:2 Changed 10 years ago by valtron2000@…

Keywords: ForeignKey added

comment:3 Changed 10 years ago by loic84

Cc: loic@… added

comment:4 Changed 10 years ago by Anssi Kääriäinen

Triage Stage: UnreviewedAccepted

Seems OK to me.

Changed 10 years ago by valtron2000@…

Attachment: patch_21760.diff added

comment:5 Changed 10 years ago by valtron2000@…

Has patch: set

Added a patch & test; seems to do the trick and all tests pass.

comment:6 Changed 10 years ago by valtron2000@…

My bad, I only ran the prefetch_related tests. Patch breaks foreign_object.tests.MultiColumnFKTests.test_prefetch_foreignkey_forward_works.

Changed 10 years ago by valtron2000@…

Attachment: patch_21760_v2.diff added

Fixed patch

comment:7 Changed 10 years ago by Marti Raudsepp

Cc: marti@… added

comment:8 Changed 10 years ago by Marti Raudsepp

I am in the process of migrating an app from Django 1.5 to 1.6; this regression also affects us (it did not occur in Django 1.5). In one example this turns a query from 175 bytes into 66 kB, with a corresponding increase in query time, result set size and memory usage.

I can confirm that the attached patch fixes this issue when applied on top of Django 1.6.2. Can it be included in 1.6.3?

Last edited 10 years ago by Marti Raudsepp (previous) (diff)

comment:9 Changed 10 years ago by Tim Graham

Severity: NormalRelease blocker

Haven't verified the claim that it's a regression from 1.5, but marking it as a release blocker assuming that's the case. A note in the 1.6.3 release notes would also be helpful.

comment:10 Changed 10 years ago by loic84

Related to #21410.

comment:11 Changed 10 years ago by loic84

We discussed this issue with @akaariai on IRC.

Like we did for #21410, we'll take a practical approach for fixing the regression from Django 1.6 to master and go for the proposed fix. It's worth noting that the inefficient query will still be present for multicolumn relations, but these are not documented and one needs to use private APIs to make them, so that doesn't qualify as a regression.

The implementation of ReverseSingleRelatedObjectDescriptor.get_prefetch_queryset() really ought to be cleaned up at some point, especially when we land the patch for composite fields but it'll require a fair amount of refactoring so it only qualifies for master.

@valtron2000, would you mind updating the patch?

  • The "FIXME" comment would have to be updated to explain this new case with a reference to this ticket.
  • The current test is pretty fragile, could you rather check in QS.query if there is a join? (e.g. https://gist.github.com/10345520)
  • Code needs to be compatible with PY3, use range instead of xrange, or import it from six if you really want xrange.
  • The changes inside the if, do they do anything differently or is it just a cleanup? If it's just a cleanup, only fix the if condition, and not the implementation. Because right now it's pretty hard to see where the problem was from.

Feel free to open a PR on github.

Edit: Added a couple more things to the patch review.

Version 1, edited 10 years ago by loic84 (previous) (next) (diff)

comment:12 Changed 10 years ago by valtron2000@…

I'll submit a PR soon; thanks for the suggestions.

The fix is a one-liner in the if, and the rest of the changes were just cleanup.

comment:14 Changed 10 years ago by Loic Bistuer <loic.bistuer@…>

Resolution: fixed
Status: newclosed

In d3b71b976dee4b02908235b8007db254a9795268:

Fixed #21760 -- prefetch_related used an inefficient query for reverse FK.

Regression introduced by commit 9777442. Refs #21410.

comment:15 Changed 10 years ago by Loic Bistuer <loic.bistuer@…>

In 6b3a8d27052bb5b242ab6c1c2fb0d6da5b49681d:

[1.7.x] Fixed #21760 -- prefetch_related used an inefficient query for reverse FK.

Regression introduced by commit 9777442. Refs #21410.

Backport of d3b71b976d from master

comment:16 Changed 10 years ago by Loic Bistuer <loic.bistuer@…>

In 1252b77824639fba398cf70586bdd75c42f86fdf:

[1.6.x] Fixed #21760 -- prefetch_related used an inefficient query for reverse FK.

Regression introduced by commit 9777442. Refs #21410.

Conflicts:

tests/prefetch_related/tests.py

Backport of d3b71b976d from master

Note: See TracTickets for help on using tickets.
Back to Top