Opened 10 years ago
Closed 4 years ago
#26318 closed Bug (duplicate)
Unexpected / duplicated queries on nested Prefetch queryset with repeated model
| Reported by: | Anthony Leontiev | Owned by: | nobody |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | dev |
| Severity: | Normal | Keywords: | queryset prefetch_related duplicate |
| Cc: | Triage Stage: | Accepted | |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
We discovered an issue in django.db.models.query.Prefetch logic that results in duplicate queries made when the leaves of certain prefetch trees are accessed.
With these models:
from django.db import models
class Publisher(models.Model):
name = models.CharField(max_length=128)
class Author(models.Model):
name = models.CharField(max_length=128)
publishers = models.ManyToManyField('Publisher', related_name='authors')
class Book(models.Model):
name = models.CharField(max_length=128)
publisher = models.ForeignKey('Publisher', related_name='books')
The following test fails:
from django.db.models.query import Prefetch
from django.test import TestCase
from .models import Author, Book, Publisher
def flatten(ls):
return list([i for s in ls for i in s])
class PrefetchTestCase(TestCase):
def setUp(self):
publisher = Publisher.objects.create(name='Publisher0')
Book.objects.create(name='Book0', publisher=publisher)
author = Author.objects.create(name='Author0')
author.publishers.add(publisher)
def test_prefetch_nested(self):
publishers = Publisher.objects.prefetch_related(
Prefetch(
'books',
Book.objects.all().prefetch_related(
Prefetch(
'publisher',
Publisher.objects.all().prefetch_related('authors')
)
)
)
)
with self.assertNumQueries(4):
publishers = list(publishers)
with self.assertNumQueries(0):
books = flatten([p.books.all() for p in publishers])
with self.assertNumQueries(0):
publishers = [b.publisher for b in books]
with self.assertNumQueries(0):
authors = flatten([p.authors.all() for p in publishers])
For more details (comments, queries executed) and an analogous green test-case that uses the flat prefetch form to prefetch the same tree, see the attached test package.
To run the tests:
tar -zxvf prefetch-bug-test.tar.gz cd prefetch-bug-test make test
This issue seemed very similar to https://code.djangoproject.com/ticket/25546, but unfortunately the patch for that ticket (https://code.djangoproject.com/changeset/bdbe50a491ca41e7d4ebace47bfe8abe50a58211) did not fix this problem.
Tested in Django 1.8, 1.9, and master.
Attachments (2)
Change History (6)
by , 10 years ago
| Attachment: | prefetch-bug-test.tar.gz added |
|---|
comment:1 by , 10 years ago
| Triage Stage: | Unreviewed → Accepted |
|---|
comment:2 by , 10 years ago
I just ran into this as well, but without recursion. i.e. (with apologies for the implied structure)
users = User.objects.prefetch_related(
Prefetch(
'person',
queryset=Person.objects.prefetch_related(
'books',
),
),
)
would generate 4 queries instead of three (because the books are duplicated).
To be clear, this is a trimmed down version of the actual query I'm trying to construct. I haven't tested it (and I don't know whether I'll be able to any time soon.) Someone should try tweaking the test case to not use recursion, is all I'm saying.
comment:3 by , 8 years ago
I've run into it as well. Consider model M1, having many-to-many relationship to M3 via M2. And M3, having one-to-many relationship to M4:
M1 <- M2 -> M3 <- M4
And the following statement, which is supposed to fetch all models:
m1s = M1.objects.all() \ .prefetch_related( Prefetch('m2s', queryset=M2.objects.prefetch_related('m3__m4s')) )
This statement results in two calls to prefetch_related_objects. One for M1's, and the nested one for M2's. The nested one does one prefetch lookup m3__m4s. The outer one does two. One is m2s, and the second one is m2s__m3__m4s. Since inner lookups get passed to the outer prefetch_related_objects calls.
Now then, reverse many-to-one descriptors (which are used to access M4's from M3's) don't have get_prefetch_queryset method. So models are cached in the instances.
The thing is in our case cache_name is m4, but get_prefetcher (or basically prefetch_related_objects) expects to see m4s there. As a result, both outer and inner prefetch_related_objects calls retrieve M4's from database.
It can be overcome by moving nested lookup to the upper level:
m1s = M1.objects.all().prefetch_related('m2s', 'm2s__m3__m4s')
Not sure if that is possible in each and every case.
I've attached nested_prefetch_create_app.sh script, that creates an app in nested_prefetch dir, that simplifies reproducing the case. After running the script:
$ cd nested_prefetch/nested_prefetch $ . ../bin/activate $ ./manage.py migrate a1 zero && ./manage.py migrate a1 && python 1.py
by , 8 years ago
| Attachment: | nested_prefetch_create_app.sh added |
|---|
creates an app in nested_prefetch dir to easily reproduce the issue
comment:4 by , 4 years ago
| Resolution: | → duplicate |
|---|---|
| Status: | new → closed |
Fixed by f5233dce309543c826224be9dfa9c9f4f855f73c. Duplicate of #32511.
Managed to reproduce against master.
I suppose the algorithm gets confused because the
Publishermodel has to be prefetched twice.