Opened 11 years ago

Closed 5 years ago

#22014 closed Bug (needsinfo)

`prefetch_related` recursion protection does not cover all cases

Reported by: StillNewb Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: prefetch
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This issue related to #17014

Commit (see line 1655):
https://github.com/django/django/commit/64da8eec3077eaaba5cfa62775523da772f4de44#diff-5b0dda5eb9a242c15879dc9cd2121379R1655

Summary:

prefetch_related collects additional prefetch lookups made by querysets that are created during prefetching, and handle them to in addition to lookups defined by user.
Sometimes it can casue infinite recursion, so for preventing that, there needed some mechanism that ensures that prefetches that was already done, would not be performed again.

Problems in the code:

Now that mechanism stores descriptors of relational fields and checks against them every iteration.
Python descriptor is shared against instances of classes. So descriptor represents relational field of model, not field of instance.
For same relation and different sets of instances there are different prefetch queries, so descriptors cant be used as identifiers for facts that lookup was already prefetched.
And I also would add that passing descriptors around is very much unpythonic and not healthy practice in general. Descriptors are advanced properties, they must serve for same purpose as properties - hiding some logic in attribute assigment/deletion/retrieving.

Reason here - is to identify lookups while traversing data relationships and never repeat them.
In code this reason is not well exporessed:

  • There is done_queries and followed_descriptors - two places for same thing.
  • Check is against descriptors, but in done_queries lookup paths is being added.
  • Check against lookup type (auto lookup or normal) is redundant, there is no difference between auto-lookups and normal lookups in matter of which allowed to spam, they must be checked for which was already done uniformly.

Specific problem is in the check condition for adding in done_lookups - not (lookup in auto_lookups and descriptor in followed_descriptors) - intention here was "if this is not autolookup, descriptor for which is already seen, and presumably lookup path for it was added to done_queries - not adding it to done_queries. So autolookup-spam will be stopped".
But what if descriptor for field, throug lookup-spam assumed to flow, was already added while performing different lookup with different data/parameters? If that lookup will be autolookup - it will never be added to done_queries and would be allowed to be performed infinity number of times.

There is too much unneeded complexity in that code.

ps.
Ive made the patch with two tests.

Attachments (1)

prefetch_recursion_protection_fail.diff (4.9 KB ) - added by StillNewb 11 years ago.

Download all attachments as: .zip

Change History (6)

comment:1 by Baptiste Mispelon, 11 years ago

Resolution: needsinfo
Status: newclosed

Hi,

Correct me if I'm wrong, but from what I can tell, the test you provide doesn't demonstrate an infinite loop.

There does appear to be some repeated queries but I'm not familiar enough with the ORM internals to say whether it's actually a bug or a performance issue that we could improve.

I'll mark this ticket as "needs info". Can you reopen it with a testcase that shows the infinite recursion?

Thanks.

comment:2 by Pashkin, 11 years ago

Resolution: needsinfo
Status: closednew

Yes, you right, in the first patch I've demostrated one "step" of infinite recursion, now I wrote third test which really reproduces infinite recursion.
Here is the commit:
https://github.com/AndrewPashkin/django/commit/87d60b4958dc9c63c1dd970930e8c20d03e5eb36

Last edited 11 years ago by Pashkin (previous) (diff)

comment:3 by Tim Graham, 10 years ago

Component: UncategorizedDatabase layer (models, ORM)
Triage Stage: UnreviewedAccepted

Confirmed the provided tests do fail.

comment:4 by anonymous, 10 years ago

I actually have fixed this bug in this app:
https://bitbucket.org/andrew_pashkin/django-deep-prefetch/

It adds feature for prefetching sequential GFK lookups and fixes this bug. Code from it can be ported to Django.

comment:5 by Baptiste Mispelon, 5 years ago

Resolution: needsinfo
Status: newclosed

Using a modified version of the prefetch_recursion_protection_fail.diff​ patch attached at the top of the ticket, I was able to run the provided test against the latest master and the problem seems to have been fixed.

Using git bisect, I found that the commit that fixed the attached failing test was bdbe50a491ca41e7d4ebace47bfe8abe50a58211.

Sadly, link from comment:2 is now dead so I can't reproduce any infinite recursion error so I'm going to draw the same conclusion as I did in comment:1:

I'm going to close this ticket as needsinfo. Please reopen if you can provide some steps to reproduce the issue (ideally in the form of a unit test against a current version of Django).

Thanks.

For reference, here's the modified testcase I used:

class RecursionProtectionTests(TestCase):
    def setUp(self):
        self.book1 = Book.objects.create()
        self.book2 = Book.objects.create()
        self.author1 = Author.objects.create(first_book=self.book1, name='Author #1')
        self.author2 = Author.objects.create(first_book=self.book1, name='Author #2')
        self.publisher = Publisher.objects.create()
        self.author1.books.add(self.book1)
        self.author2.books.add(self.book2)
        self.book1.publishers.add(self.publisher)
        self.publisher.authors.add(self.author1, self.author2)

    def walk(self, authors):
        fetched_books = []
        for author in authors:
            for book in author.books.all():
                for publisher in book.publishers.all():
                    for published_author in publisher.authors.all():
                        for fetched_book in published_author.books.all():
                            fetched_books.append(fetched_book)
        return fetched_books

    @override_settings(DEBUG=True)
    def test_descriptors_storing_protection1(self):
        qs1 = Author.objects.filter(pk=self.author1.pk).prefetch_related(Prefetch('books', Book.objects.prefetch_related('publishers__authors__books')), 'books__publishers__authors__books')
        qs2 = Author.objects.filter(pk=self.author1.pk).prefetch_related(Prefetch('books', Book.objects.prefetch_related('publishers__authors__books')), 'books__publishers__authors__books', 'books__publishers__authors__books')


        def count_queries(qs):
            offset = len(connection.queries)
            results = self.walk(qs)
            return results, len(connection.queries[offset:])

        results1, queries1 = count_queries(qs1)
        results2, queries2 = count_queries(qs2)

        self.assertEqual(len(results1), 2)
        self.assertEqual(len(results2), 2)

        self.assertEqual(queries1, queries2)


    @override_settings(DEBUG=True)
    def test_only_unique_queries(self):
        qs = Author.objects.filter(pk=self.author1.pk).prefetch_related(Prefetch('books', Book.objects.prefetch_related('publishers__authors__books')), 'books__publishers__authors__books')
        offset = len(connection.queries)
        self.walk(qs)
        queries = [q['sql'] for q in connection.queries[offset:]]
        self.assertEqual(len(queries), len(set(queries)))
Note: See TracTickets for help on using tickets.
Back to Top