Opened 9 years ago
Closed 3 years ago
#26318 closed Bug (duplicate)
Unexpected / duplicated queries on nested Prefetch queryset with repeated model
Reported by: | Anthony Leontiev | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | queryset prefetch_related duplicate |
Cc: | Triage Stage: | Accepted | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
We discovered an issue in django.db.models.query.Prefetch
logic that results in duplicate queries made when the leaves of certain prefetch trees are accessed.
With these models:
from django.db import models class Publisher(models.Model): name = models.CharField(max_length=128) class Author(models.Model): name = models.CharField(max_length=128) publishers = models.ManyToManyField('Publisher', related_name='authors') class Book(models.Model): name = models.CharField(max_length=128) publisher = models.ForeignKey('Publisher', related_name='books')
The following test fails:
from django.db.models.query import Prefetch from django.test import TestCase from .models import Author, Book, Publisher def flatten(ls): return list([i for s in ls for i in s]) class PrefetchTestCase(TestCase): def setUp(self): publisher = Publisher.objects.create(name='Publisher0') Book.objects.create(name='Book0', publisher=publisher) author = Author.objects.create(name='Author0') author.publishers.add(publisher) def test_prefetch_nested(self): publishers = Publisher.objects.prefetch_related( Prefetch( 'books', Book.objects.all().prefetch_related( Prefetch( 'publisher', Publisher.objects.all().prefetch_related('authors') ) ) ) ) with self.assertNumQueries(4): publishers = list(publishers) with self.assertNumQueries(0): books = flatten([p.books.all() for p in publishers]) with self.assertNumQueries(0): publishers = [b.publisher for b in books] with self.assertNumQueries(0): authors = flatten([p.authors.all() for p in publishers])
For more details (comments, queries executed) and an analogous green test-case that uses the flat prefetch form to prefetch the same tree, see the attached test package.
To run the tests:
tar -zxvf prefetch-bug-test.tar.gz cd prefetch-bug-test make test
This issue seemed very similar to https://code.djangoproject.com/ticket/25546, but unfortunately the patch for that ticket (https://code.djangoproject.com/changeset/bdbe50a491ca41e7d4ebace47bfe8abe50a58211) did not fix this problem.
Tested in Django 1.8, 1.9, and master.
Attachments (2)
Change History (6)
by , 9 years ago
Attachment: | prefetch-bug-test.tar.gz added |
---|
comment:1 by , 9 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 9 years ago
I just ran into this as well, but without recursion. i.e. (with apologies for the implied structure)
users = User.objects.prefetch_related( Prefetch( 'person', queryset=Person.objects.prefetch_related( 'books', ), ), )
would generate 4 queries instead of three (because the books are duplicated).
To be clear, this is a trimmed down version of the actual query I'm trying to construct. I haven't tested it (and I don't know whether I'll be able to any time soon.) Someone should try tweaking the test case to not use recursion, is all I'm saying.
comment:3 by , 7 years ago
I've run into it as well. Consider model M1
, having many-to-many relationship to M3
via M2
. And M3
, having one-to-many relationship to M4
:
M1 <- M2 -> M3 <- M4
And the following statement, which is supposed to fetch all models:
m1s = M1.objects.all() \ .prefetch_related( Prefetch('m2s', queryset=M2.objects.prefetch_related('m3__m4s')) )
This statement results in two calls to prefetch_related_objects. One for M1
's, and the nested one for M2
's. The nested one does one prefetch lookup m3__m4s
. The outer one does two. One is m2s
, and the second one is m2s__m3__m4s
. Since inner lookups get passed to the outer prefetch_related_objects
calls.
Now then, reverse many-to-one descriptors (which are used to access M4
's from M3
's) don't have get_prefetch_queryset method. So models are cached in the instances.
The thing is in our case cache_name
is m4
, but get_prefetcher
(or basically prefetch_related_objects
) expects to see m4s there. As a result, both outer and inner prefetch_related_objects
calls retrieve M4
's from database.
It can be overcome by moving nested lookup to the upper level:
m1s = M1.objects.all().prefetch_related('m2s', 'm2s__m3__m4s')
Not sure if that is possible in each and every case.
I've attached nested_prefetch_create_app.sh script, that creates an app in nested_prefetch
dir, that simplifies reproducing the case. After running the script:
$ cd nested_prefetch/nested_prefetch $ . ../bin/activate $ ./manage.py migrate a1 zero && ./manage.py migrate a1 && python 1.py
by , 7 years ago
Attachment: | nested_prefetch_create_app.sh added |
---|
creates an app in nested_prefetch dir to easily reproduce the issue
comment:4 by , 3 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
Fixed by f5233dce309543c826224be9dfa9c9f4f855f73c. Duplicate of #32511.
Managed to reproduce against master.
I suppose the algorithm gets confused because the
Publisher
model has to be prefetched twice.