Opened 6 years ago

Closed 3 months ago

#26318 closed Bug (duplicate)

Unexpected / duplicated queries on nested Prefetch queryset with repeated model

Reported by: Anthony Leontiev Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: queryset prefetch_related duplicate
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


We discovered an issue in django.db.models.query.Prefetch logic that results in duplicate queries made when the leaves of certain prefetch trees are accessed.

With these models:

from django.db import models

class Publisher(models.Model):
    name = models.CharField(max_length=128)

class Author(models.Model):
    name = models.CharField(max_length=128)
    publishers = models.ManyToManyField('Publisher', related_name='authors')

class Book(models.Model):
    name = models.CharField(max_length=128)
    publisher = models.ForeignKey('Publisher', related_name='books')

The following test fails:

from django.db.models.query import Prefetch
from django.test import TestCase
from .models import Author, Book, Publisher

def flatten(ls):
    return list([i for s in ls for i in s])

class PrefetchTestCase(TestCase):

    def setUp(self):
        publisher = Publisher.objects.create(name='Publisher0')
        Book.objects.create(name='Book0', publisher=publisher)
        author = Author.objects.create(name='Author0')

    def test_prefetch_nested(self):
        publishers = Publisher.objects.prefetch_related(
        with self.assertNumQueries(4):
            publishers = list(publishers)

        with self.assertNumQueries(0):
            books = flatten([p.books.all() for p in publishers])
        with self.assertNumQueries(0):
            publishers = [b.publisher for b in books]
        with self.assertNumQueries(0):
            authors = flatten([p.authors.all() for p in publishers])

For more details (comments, queries executed) and an analogous green test-case that uses the flat prefetch form to prefetch the same tree, see the attached test package.

To run the tests:

	tar -zxvf prefetch-bug-test.tar.gz
	cd prefetch-bug-test
	make test

This issue seemed very similar to, but unfortunately the patch for that ticket ( did not fix this problem.

Tested in Django 1.8, 1.9, and master.

Attachments (2)

prefetch-bug-test.tar.gz (3.9 KB) - added by Anthony Leontiev 6 years ago. (2.9 KB) - added by Yuri Kanivetsky 4 years ago.
creates an app in nested_prefetch dir to easily reproduce the issue

Download all attachments as: .zip

Change History (6)

Changed 6 years ago by Anthony Leontiev

Attachment: prefetch-bug-test.tar.gz added

comment:1 Changed 6 years ago by Simon Charette

Triage Stage: UnreviewedAccepted

Managed to reproduce against master.

I suppose the algorithm gets confused because the Publisher model has to be prefetched twice.

comment:2 Changed 6 years ago by Stephen Burrows

I just ran into this as well, but without recursion. i.e. (with apologies for the implied structure)

users = User.objects.prefetch_related(

would generate 4 queries instead of three (because the books are duplicated).

To be clear, this is a trimmed down version of the actual query I'm trying to construct. I haven't tested it (and I don't know whether I'll be able to any time soon.) Someone should try tweaking the test case to not use recursion, is all I'm saying.

comment:3 Changed 4 years ago by Yuri Kanivetsky

I've run into it as well. Consider model M1, having many-to-many relationship to M3 via M2. And M3, having one-to-many relationship to M4:

M1 <- M2 -> M3 <- M4

And the following statement, which is supposed to fetch all models:

m1s = M1.objects.all() \
        Prefetch('m2s', queryset=M2.objects.prefetch_related('m3__m4s'))

This statement results in two calls to prefetch_related_objects. One for M1's, and the nested one for M2's. The nested one does one prefetch lookup m3__m4s. The outer one does two. One is m2s, and the second one is m2s__m3__m4s. Since inner lookups get passed to the outer prefetch_related_objects calls.

Now then, reverse many-to-one descriptors (which are used to access M4's from M3's) don't have get_prefetch_queryset method. So models are cached in the instances.

The thing is in our case cache_name is m4, but get_prefetcher (or basically prefetch_related_objects) expects to see m4s there. As a result, both outer and inner prefetch_related_objects calls retrieve M4's from database.

It can be overcome by moving nested lookup to the upper level:

m1s = M1.objects.all().prefetch_related('m2s', 'm2s__m3__m4s')

Not sure if that is possible in each and every case.

I've attached script, that creates an app in nested_prefetch dir, that simplifies reproducing the case. After running the script:

$ cd nested_prefetch/nested_prefetch
$ . ../bin/activate
$ ./ migrate a1 zero && ./ migrate a1 && python
Last edited 4 years ago by Yuri Kanivetsky (previous) (diff)

Changed 4 years ago by Yuri Kanivetsky

creates an app in nested_prefetch dir to easily reproduce the issue

comment:4 Changed 3 months ago by Mariusz Felisiak

Resolution: duplicate
Status: newclosed
Note: See TracTickets for help on using tickets.
Back to Top