Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#21760 closed Bug (fixed)

prefetch_related uses an inefficient query to get the related fk data

Reported by: valtron2000@… Owned by: nobody
Component: Database layer (models, ORM) Version: 1.6
Severity: Release blocker Keywords: prefetch_related ForeignKey
Cc: loic@…, marti@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Consider the following models and query:

class Author(models.Model):
        name = models.CharField(max_length = 20)

class Book(models.Model):
        name = models.CharField(max_length = 20)
        author = models.ForeignKey(Author, related_name = 'books')

Book.objects.all().prefetch_related('author')

The query I'd expect to be issued by the prefetch_related is:

SELECT * FROM author
WHERE id IN (... the author_ids of the books from the main query ...)

However, the query that actually get executed are:

SELECT a.* FROM author a INNER JOIN book b ON (a.id = b.author_id)
WHERE b.id IN (...  the ids of the books from the main query ...)

There are two problems with this approach:

  1. It does a join, which wouldn't be necessary if author_ids were used instead of book.ids
  2. When one uses prefetch_related in such a case (that is, to prefetch a single fk per object rather than a many to many) instead of select_related, it's usually because there are many primary objects (books) and few unique related objects (authors); select_related, despite getting all the data in one query, could end up returning much more data than is needed because the related objects are duplicated many times over, and will be slower overall. Thus, the approach currently used by prefetch_related ends up using the set of ids that will almost always be larger (the book ids, rather than the authors).

The reverse case (Author.objects.all().prefetch_related('books')) does the efficient, joinless query I'd expect -- namely:

SELECT * FROM book
WHERE author_id IN (... the ids of the authors from the main query ...)

I've attached a sample project with a test.

Attachments (3)

ex.zip (2.8 KB ) - added by valtron2000@… 10 years ago.
Project for book example with a test.
patch_21760.diff (2.6 KB ) - added by valtron2000@… 10 years ago.
patch_21760_v2.diff (2.4 KB ) - added by valtron2000@… 10 years ago.
Fixed patch

Download all attachments as: .zip

Change History (19)

by valtron2000@…, 10 years ago

Attachment: ex.zip added

Project for book example with a test.

comment:1 by valtron2000@…, 10 years ago

Summary: prefetch_related uses an inefficient query to get the related dataprefetch_related uses an inefficient query to get the related fk data

comment:2 by valtron2000@…, 10 years ago

Keywords: ForeignKey added

comment:3 by loic84, 10 years ago

Cc: loic@… added

comment:4 by Anssi Kääriäinen, 10 years ago

Triage Stage: UnreviewedAccepted

Seems OK to me.

by valtron2000@…, 10 years ago

Attachment: patch_21760.diff added

comment:5 by valtron2000@…, 10 years ago

Has patch: set

Added a patch & test; seems to do the trick and all tests pass.

comment:6 by valtron2000@…, 10 years ago

My bad, I only ran the prefetch_related tests. Patch breaks foreign_object.tests.MultiColumnFKTests.test_prefetch_foreignkey_forward_works.

by valtron2000@…, 10 years ago

Attachment: patch_21760_v2.diff added

Fixed patch

comment:7 by Marti Raudsepp, 10 years ago

Cc: marti@… added

comment:8 by Marti Raudsepp, 10 years ago

I am in the process of migrating an app from Django 1.5 to 1.6; this regression also affects us (it did not occur in Django 1.5). In one example this turns a query from 175 bytes into 66 kB, with a corresponding increase in query time, result set size and memory usage.

I can confirm that the attached patch fixes this issue when applied on top of Django 1.6.2. Can it be included in 1.6.3?

Last edited 10 years ago by Marti Raudsepp (previous) (diff)

comment:9 by Tim Graham, 10 years ago

Severity: NormalRelease blocker

Haven't verified the claim that it's a regression from 1.5, but marking it as a release blocker assuming that's the case. A note in the 1.6.3 release notes would also be helpful.

comment:10 by loic84, 10 years ago

Related to #21410.

comment:11 by loic84, 10 years ago

We discussed this issue with @akaariai on IRC.

Like we did for #21410, we'll take a practical approach for fixing the regression from Django 1.6 to master and go for the proposed fix. It's worth noting that the inefficient query will still be present for multicolumn relations, but these are not documented and one needs to use private APIs to make them, so that doesn't qualify as a regression.

The implementation of ReverseSingleRelatedObjectDescriptor.get_prefetch_queryset() really ought to be cleaned up at some point, especially when we land the patch for composite fields but it'll require a fair amount of refactoring so it only qualifies for master.

@valtron2000, would you mind updating the patch?

  • The "FIXME" comment would have to be updated to explain this new case with a reference to this ticket.
  • The current test is pretty fragile, could you rather check in QS.query if there is a join? (e.g. https://gist.github.com/10345520)
  • Code needs to be compatible with PY3, use range instead of xrange, or import it from six if you really want xrange.
  • The changes inside the if, do they do anything differently or is it just a cleanup? If it's just a cleanup, only fix the if condition, and not the implementation. Because right now it's pretty hard to see where the problem was from.

Feel free to open a PR on github.

Edit: Added a couple more things to the patch review.

Version 1, edited 10 years ago by loic84 (previous) (next) (diff)

comment:12 by valtron2000@…, 10 years ago

I'll submit a PR soon; thanks for the suggestions.

The fix is a one-liner in the if, and the rest of the changes were just cleanup.

comment:14 by Loic Bistuer <loic.bistuer@…>, 10 years ago

Resolution: fixed
Status: newclosed

In d3b71b976dee4b02908235b8007db254a9795268:

Fixed #21760 -- prefetch_related used an inefficient query for reverse FK.

Regression introduced by commit 9777442. Refs #21410.

comment:15 by Loic Bistuer <loic.bistuer@…>, 10 years ago

In 6b3a8d27052bb5b242ab6c1c2fb0d6da5b49681d:

[1.7.x] Fixed #21760 -- prefetch_related used an inefficient query for reverse FK.

Regression introduced by commit 9777442. Refs #21410.

Backport of d3b71b976d from master

comment:16 by Loic Bistuer <loic.bistuer@…>, 10 years ago

In 1252b77824639fba398cf70586bdd75c42f86fdf:

[1.6.x] Fixed #21760 -- prefetch_related used an inefficient query for reverse FK.

Regression introduced by commit 9777442. Refs #21410.

Conflicts:

tests/prefetch_related/tests.py

Backport of d3b71b976d from master

Note: See TracTickets for help on using tickets.
Back to Top