Opened 3 years ago

Last modified 7 months ago

#26318 new Bug

Unexpected / duplicated queries on nested Prefetch queryset with repeated model

Reported by: Anthony Leontiev Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: queryset prefetch_related duplicate
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

We discovered an issue in django.db.models.query.Prefetch logic that results in duplicate queries made when the leaves of certain prefetch trees are accessed.

With these models:

from django.db import models


class Publisher(models.Model):
    name = models.CharField(max_length=128)


class Author(models.Model):
    name = models.CharField(max_length=128)
    publishers = models.ManyToManyField('Publisher', related_name='authors')


class Book(models.Model):
    name = models.CharField(max_length=128)
    publisher = models.ForeignKey('Publisher', related_name='books')

The following test fails:

from django.db.models.query import Prefetch
from django.test import TestCase
from .models import Author, Book, Publisher


def flatten(ls):
    return list([i for s in ls for i in s])


class PrefetchTestCase(TestCase):

    def setUp(self):
        publisher = Publisher.objects.create(name='Publisher0')
        Book.objects.create(name='Book0', publisher=publisher)
        author = Author.objects.create(name='Author0')
        author.publishers.add(publisher)

    def test_prefetch_nested(self):
        publishers = Publisher.objects.prefetch_related(
            Prefetch(
                'books',
                Book.objects.all().prefetch_related(
                    Prefetch(
                        'publisher',
                        Publisher.objects.all().prefetch_related('authors')
                    )
                )
            )
        )
        with self.assertNumQueries(4):
            publishers = list(publishers)

        with self.assertNumQueries(0):
            books = flatten([p.books.all() for p in publishers])
        with self.assertNumQueries(0):
            publishers = [b.publisher for b in books]
        with self.assertNumQueries(0):
            authors = flatten([p.authors.all() for p in publishers])

For more details (comments, queries executed) and an analogous green test-case that uses the flat prefetch form to prefetch the same tree, see the attached test package.

To run the tests:

	tar -zxvf prefetch-bug-test.tar.gz
	cd prefetch-bug-test
	make test

This issue seemed very similar to https://code.djangoproject.com/ticket/25546, but unfortunately the patch for that ticket (https://code.djangoproject.com/changeset/bdbe50a491ca41e7d4ebace47bfe8abe50a58211) did not fix this problem.

Tested in Django 1.8, 1.9, and master.

Attachments (2)

prefetch-bug-test.tar.gz (3.9 KB) - added by Anthony Leontiev 3 years ago.
nested_prefetch_create_app.sh (2.9 KB) - added by Yuri Kanivetsky 7 months ago.
creates an app in nested_prefetch dir to easily reproduce the issue

Download all attachments as: .zip

Change History (5)

Changed 3 years ago by Anthony Leontiev

Attachment: prefetch-bug-test.tar.gz added

comment:1 Changed 3 years ago by Simon Charette

Triage Stage: UnreviewedAccepted

Managed to reproduce against master.

I suppose the algorithm gets confused because the Publisher model has to be prefetched twice.

comment:2 Changed 3 years ago by Stephen Burrows

I just ran into this as well, but without recursion. i.e. (with apologies for the implied structure)

users = User.objects.prefetch_related(
    Prefetch(
        'person',
        queryset=Person.objects.prefetch_related(
            'books',
        ),
    ),
)

would generate 4 queries instead of three (because the books are duplicated).

To be clear, this is a trimmed down version of the actual query I'm trying to construct. I haven't tested it (and I don't know whether I'll be able to any time soon.) Someone should try tweaking the test case to not use recursion, is all I'm saying.

comment:3 Changed 7 months ago by Yuri Kanivetsky

I've run into it as well. Consider model M1, having many-to-many relationship to M3 via M2. And M3, having one-to-many relationship to M4:

M1 <- M2 -> M3 <- M4

And the following statement, which is supposed to fetch all models:

m1s = M1.objects.all() \
    .prefetch_related(
        Prefetch('m2s', queryset=M2.objects.prefetch_related('m3__m4s'))
    )

This statement results in two calls to prefetch_related_objects. One for M1's, and the nested one for M2's. The nested one does one prefetch lookup m3__m4s. The outer one does two. One is m2s, and the second one is m2s__m3__m4s. Since inner lookups get passed to the outer prefetch_related_objects calls.

Now then, reverse many-to-one descriptors (which are used to access M4's from M3's) don't have get_prefetch_queryset method. So models are cached in the instances.

The thing is in our case cache_name is m4, but get_prefetcher (or basically prefetch_related_objects) expects to see m4s there. As a result, both outer and inner prefetch_related_objects calls retrieve M4's from database.

It can be overcome by moving nested lookup to the upper level:

m1s = M1.objects.all().prefetch_related('m2s', 'm2s__m3__m4s')

Not sure if that is possible in each and every case.

I've attached nested_prefetch_create_app.sh script, that creates an app in nested_prefetch dir, that simplifies reproducing the case. After running the script:

$ cd nested_prefetch/nested_prefetch
$ . ../bin/activate
$ ./manage.py migrate a1 zero && ./manage.py migrate a1 && python 1.py
Last edited 7 months ago by Yuri Kanivetsky (previous) (diff)

Changed 7 months ago by Yuri Kanivetsky

creates an app in nested_prefetch dir to easily reproduce the issue

Note: See TracTickets for help on using tickets.
Back to Top