Opened 19 months ago

Last modified 17 months ago

#26318 new Bug

Unexpected / duplicated queries on nested Prefetch queryset with repeated model

Reported by: Anthony Leontiev Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: queryset prefetch_related duplicate
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

We discovered an issue in django.db.models.query.Prefetch logic that results in duplicate queries made when the leaves of certain prefetch trees are accessed.

With these models:

from django.db import models


class Publisher(models.Model):
    name = models.CharField(max_length=128)


class Author(models.Model):
    name = models.CharField(max_length=128)
    publishers = models.ManyToManyField('Publisher', related_name='authors')


class Book(models.Model):
    name = models.CharField(max_length=128)
    publisher = models.ForeignKey('Publisher', related_name='books')

The following test fails:

from django.db.models.query import Prefetch
from django.test import TestCase
from .models import Author, Book, Publisher


def flatten(ls):
    return list([i for s in ls for i in s])


class PrefetchTestCase(TestCase):

    def setUp(self):
        publisher = Publisher.objects.create(name='Publisher0')
        Book.objects.create(name='Book0', publisher=publisher)
        author = Author.objects.create(name='Author0')
        author.publishers.add(publisher)

    def test_prefetch_nested(self):
        publishers = Publisher.objects.prefetch_related(
            Prefetch(
                'books',
                Book.objects.all().prefetch_related(
                    Prefetch(
                        'publisher',
                        Publisher.objects.all().prefetch_related('authors')
                    )
                )
            )
        )
        with self.assertNumQueries(4):
            publishers = list(publishers)

        with self.assertNumQueries(0):
            books = flatten([p.books.all() for p in publishers])
        with self.assertNumQueries(0):
            publishers = [b.publisher for b in books]
        with self.assertNumQueries(0):
            authors = flatten([p.authors.all() for p in publishers])

For more details (comments, queries executed) and an analogous green test-case that uses the flat prefetch form to prefetch the same tree, see the attached test package.

To run the tests:

	tar -zxvf prefetch-bug-test.tar.gz
	cd prefetch-bug-test
	make test

This issue seemed very similar to https://code.djangoproject.com/ticket/25546, but unfortunately the patch for that ticket (https://code.djangoproject.com/changeset/bdbe50a491ca41e7d4ebace47bfe8abe50a58211) did not fix this problem.

Tested in Django 1.8, 1.9, and master.

Attachments (1)

prefetch-bug-test.tar.gz (3.9 KB) - added by Anthony Leontiev 19 months ago.

Download all attachments as: .zip

Change History (3)

Changed 19 months ago by Anthony Leontiev

Attachment: prefetch-bug-test.tar.gz added

comment:1 Changed 19 months ago by Simon Charette

Triage Stage: UnreviewedAccepted

Managed to reproduce against master.

I suppose the algorithm gets confused because the Publisher model has to be prefetched twice.

comment:2 Changed 17 months ago by Stephen Burrows

I just ran into this as well, but without recursion. i.e. (with apologies for the implied structure)

users = User.objects.prefetch_related(
    Prefetch(
        'person',
        queryset=Person.objects.prefetch_related(
            'books',
        ),
    ),
)

would generate 4 queries instead of three (because the books are duplicated).

To be clear, this is a trimmed down version of the actual query I'm trying to construct. I haven't tested it (and I don't know whether I'll be able to any time soon.) Someone should try tweaking the test case to not use recursion, is all I'm saying.

Note: See TracTickets for help on using tickets.
Back to Top