Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#17014 closed Bug (fixed)

prefetch_related infinite recursion by default managers

Reported by: Anssi Kääriäinen Owned by: nobody
Component: Database layer (models, ORM) Version: 1.3
Severity: Normal Keywords:
Cc: anssi.kaariainen@… Triage Stage: Accepted
Has patch: no Needs documentation: yes
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


I haven't tested these two cases with the trunk version, so I don't know if these mentioned problems work with prefetch_related or not. At least both of these need tests, though.

Recursive default queryset prefetch_related calls need test. The simplest case is:

class PeopleDefManager(models.Manager):
    def get_query_set(self):
        return super(PeopleDefmanager, self).get_query_set().prefetch_related('reverse_best_friend')

class People(models.Model):
    name = models.TextField()
    best_friend = models.ForeignKey('self', related_name='reverse_best_friend')
    objects = PeopleDefManager()

I did some work for #17000, and my version for that ticket had a very serious problem. The problem was indefinite recursion which ended up eating all the memory and going to OOM killer. That is a pretty bad failure condition... I ended up tracking which prefetches had been done and bailing out if recursion was detected. And having a hard limit of doing at most 200 prefetch_one_level calls in one queryset. The limit could be much lower while still allowing for infinite recursion. I haven't tested if the trunk version suffers from this. At least tests needed in any case.

Another condition that needs testing is:

                 / \
       through  /   \    through
       b_set   /     \   c_set
              B       C
               \     /
  through d_set \   /    through d_set
                 \ /

It is important that both B and C have a field with same name to D. I don't know if this works or not, but this also needs tests at least.

Attachments (1)

prefetch_recursion_trunk.diff (4.6 KB) - added by Anssi Kääriäinen 7 years ago.

Download all attachments as: .zip

Change History (8)

comment:1 Changed 7 years ago by Anssi Kääriäinen

Cc: anssi.kaariainen@… added
Needs documentation: set
Summary: prefetch_related tests needed: recursive default prefetch_related structures, diamond structuresprefetch_related infinite recursion by default managers

Ok, I have confirmed this against both #17003 patch and trunk. Recursion will eat all your memory and OOM killer will get into action (after some trashing). The attached test cases show the problem. I think the patch will apply to #17003, too.

I will work on a patch against trunk next. My idea is to track added prefetches by key (model, to_attr) where to_attr will be currently the name of the related manager we are caching the results in. If we are adding a prefetch related lookup into the query and the lookup came from default manager, we will bail out. This way it is possible to traverse the same path intentionally, but default manager recursion will be avoided. That is, we can define that we want People -> friends -> friends, but if we have default prefetch for friends, we will bail out because lookup People -> friends has been added to the query already. I will try to do one against #17003, too.

Changed 7 years ago by Anssi Kääriäinen

comment:2 Changed 7 years ago by Luke Plant

Triage Stage: UnreviewedAccepted

I've got a patch for this ticket which takes essentially the same approach you describe. It works for the tests your provided (though there were a bunch of errors in that diff I had to correct).

I'm basically testing whether we've seen the descriptor object before, since that uniquely defines the relationship. It depends on my patch for #17003, though, so I will not post here.

Although the test cases are very useful, I'm debating not including tests for this in Django itself, since the failure condition is bad and could effectively take down a developers machine or a build bot, and unit tests are a bad way to check for termination problems.

comment:3 Changed 7 years ago by Luke Plant

Resolution: fixed
Status: newclosed

In [16940]:

Fixed #17014 - added protection against infinite recursion.

Thanks to akaariai for the report and tests.

No tests have been added, since unittests for termination are basically
impossible, and the failure condition will take down the developer's machine
in this case. It has been tested against the cases in #17014.

comment:4 Changed 7 years ago by Anssi Kääriäinen

Two things about the patch:

  1. I really think you should add a MAX_PREFETCH_CALLS to db/models/, and check len(done_lookups) against that. This is just for safety.
  2. This could be tested with a default manager like:
class DefManager:
    num_calls = 0
    def get_query_set(self):
        if DefManager.num_calls > 100:
            raise Exception('Recursion detected')
        DefManager.num_calls += 1
        return ...

This should make it safe to test this. Also, I think having a production machine taken down is worse than having a developers machine taken down.

comment:5 Changed 7 years ago by Carl Meyer

I agree with akaariai that there would be ways to test this failure condition without actually exposing the test suite to infinite recursion, and that tests should be added for this commit.

comment:6 Changed 7 years ago by Luke Plant

I agree that having a production machine taken down is bad - what I'm saying is that a unit test for this failure might be virtually useless in stopping that happen, since you would hope that developers would actually run their code before deploying, and it could come with the risk of taking down a developer's machine. There are many things for which automated tests and our testing infrastructure are just not appropriate, and the Halting Problem tells us that is probably one of them. The only way that you can have assurance of termination is by analysis.

However, akaariai's solution looks helpful - at least it will provide test coverage of our recursion detection. I will add a test based on that idea, thanks.

I'm not so convinced about MAX_PREFETCH_CALLS at this point in time. If there is a case that we are not covering with the current solution, we should think about what that is and try to fix it, not paper over it. If there is definitely a class of cases we cannot fix, we will need a fallback, but we should go through the thought process. There is also the problem of setting the value - 100 could be way to high to stop some prefetch_related explosions, depending on the size of the initial query set and the number of related objects. Is there a better heuristic to apply?

comment:7 Changed 7 years ago by Anssi Kääriäinen

Testing doesn't necessarily help the application developer. Typical error condition for this failure is when there is some tree / DAG structure and it gets corrupted for some reason.

One option is to define much smaller MAX_PREFETCH_CALLS (say 10) but make it overriddable by a kwarg to prefetch_related. This way you can get better protection and still you can go crazy with the prefetch if you for some reason need it. It would be cool if you could define .prefetch_related(R('subprojects', recurse=True)) and get all the subprojects of the project efficiently. And for this a max depth would be a good thing to have anyways.

But I don't know if we should worry more about this (I know, I started this...). The best guard is to write all the tests we can think of for this and then trust that our code does the right thing. The failure mode for .delete() seems to be pretty similar, too, so we are not alone here. And WITH RECURSIVE SQL queries are similar, too (although many databases do have maximum depth of recursion as a default setting).

So, I am happy if we have tests for recursion errors but don't do anything else. I am even more happy if we come up with some acceptable heuristic.

Note: See TracTickets for help on using tickets.
Back to Top