Django

Code

Ticket #7128 (closed: fixed)

Opened 7 months ago

Last modified 5 months ago

Infinite Recursion in new QuerySet Refactor when using a post_init signal

Reported by: bo <bo.blanton@gmail.com> Assigned to: mtredinnick
Milestone: 1.0 Component: Core framework
Version: SVN Keywords: qsrf-cleanup QuerySet Refactor Infinite Recursion
Cc: Triage Stage: Design decision needed
Has patch: 0 Needs documentation: 0
Needs tests: 0 Patch needs improvement: 0

Description

The new Queryset refactor uses deepcopy for the where element of Query.clone()

in django.db.models.sql.Query.py line 169(ish)
        obj.where = deepcopy(self.where)

One gets an object that has a post_init signal (or post_save). (best shown be example)

class MyObj(models.Model):
	cola = models.Char...
	colb = models.Char...
	..
	attr_a = None # none DB attribute
	attr_b = None # none DB attribute

class AnotherObj(models.Model):
	my_obj = models.ForeignKey(MyObj)
	cola = models.Char...
	colb = models.Char...

class YetAnotherObj(models.Model):
	my_obj = models.ForeignKey(MyObj)
	cola = models.Char...
	colb = models.Char...
	

def after_my_obj_init(sender, instance, signal, *args, **kwargs):
	# try to fill in the instances 'other attributes' with some preset
	# querySets for use as properties later
	
	#the deepcopy in the of the Querset clone
	# will cause an intinite HERE (clone on the order_by)
	instance.attr_a = AnotherObj.objects.filter(
		my_obj = instance, 
		cola__contains = "Some Text"
	).order_by('cola')
	
	#No Infinite Recusrion as there is no Clone() called
	instance.attr_b = AnotherObj.objects.filter(
		my_obj = instance, 
		cola__contains = "Some Other Text"
	)
	
dispatcher.connect(after_my_obj_init, signal=signals.post_init, sender=MyObj)

The above signal will now result in an infinite recursion .. as apparently it keeps trying to call the post_init function on 'instance' as it marches done the Where Tree.

There is a solution, but becuase it can also happen Outside the initsignal on other QuerySets?, and seems rather sporatic sometimes for instance anytime clone is used it can crash.

an_obj = MyObj.objects.get(pk = 1)

# No infinite recursion
my_stuf = YetAnotherObj.objects.filter(cola = "something", my_obj = an_obj)
my_stuf = YetAnotherObj.objects.filter(cola = "something")

#BOOM (due to the clone())
my_stuf = my_stuf.filter(my_obj = an_obj)

The work around is to use the Primary Key

#in the signal
def after_my_obj_init(sender, instance, signal, *args, **kwargs):
	# try to fill in the instances 'other attributes' with some preset
	# querySets for use as properties later
	
	#the deepcopy in the of the Querset clone
	# will cause an intinite HERE (clone on the order_by)
	instance.attr_a = AnotherObj.objects.filter(
		my_obj__pk = instance.pk, 
		cola__contains = "Some Text"
	).order_by('cola')
	
	#No Infinite Recusrion as there is no Clone() called
	instance.attr_b = AnotherObj.objects.filter(
		my_obj__pk = instance.pk, 
		cola__contains = "Some Other Text"
	)

#in the other qset
my_stuf = my_stuf.filter(my_obj__pk = an_obj.pk)

As you can probably guess, this behavior is rather scary. Changing the "deepcopy" to a normal "copy" did seem to help the issue, but i imagine that will break other tests.

Attachments

Change History

06/10/08 10:58:01 changed by gav

  • keywords changed from QuerySet Refactor Infinite Recursion to qsrf-cleanup QuerySet Refactor Infinite Recursion.
  • needs_better_patch changed.
  • needs_tests changed.
  • needs_docs changed.

06/14/08 08:01:47 changed by Simon Greenhill

  • stage changed from Unreviewed to Accepted.

06/16/08 12:10:45 changed by jacob

  • milestone set to 1.0.

06/26/08 07:15:52 changed by mtredinnick

  • stage changed from Accepted to Design decision needed.

The deepcopy() call is absolutely needed, since each !Queryset clone must be independent, which means their where-clauses must not share references to internal data structures.

I'll play a bit with the __deepcopy__() implementation to try and not descend into certain types of objects, but if that slows things down too much (cloning !Querysets is already very expensive), this is going to be a case of hard cheese. The workaround is to set a flag on instance in you signal handler to say you are currently initialising it and check that flag before doing anything else (if the flag is set, just return). That way, you will only do the real work once and all the other calls will just return. It's really a problem with trying to use circular references to objects and since this is a pretty unusual sort of situation, having to do a little bit of planning ahead to avoid it won't be the end of the world. So if the attempt to fix it at the root level is too costly, I'm not going to spend too much effort on making that better, since there is a workaround.

06/26/08 07:28:26 changed by mtredinnick

  • owner changed from nobody to mtredinnick.

06/26/08 09:57:25 changed by bo <bo.blanton@gmail.com>

Most of these issues i'm finding tends to result from old (like 10 years old) legacy things and 'functions' that we are attempting to port into django .. I'm sure that by writing the app in django from the beginning, none of these things would appear, as our models would be a bit more 'robust', coherent, etc

But yes there certainly are workarounds

namely making these attribute setters properties of the Model Class, However, the purpose of the post_init was to 'add extra' attributes to other models in other applications

# in app1/models.py
class MyObj(models.Model):
	cola = models.Char...
	colb = models.Char...
	..
	attr_a = None # none DB attribute
	attr_b = None # none DB attribute

----------------
#in app2/models.py

class YetAnotherObj(models.Model):
	my_obj = models.ForeignKey(MyObj)
	cola = models.Char...
	colb = models.Char...

## define a post_init on MyObj here
def post_init_my_obj(sender, instance, signal, *args, **kwargs):
    instance.attr_a = {Some Query Set depending on YetAnotherObj}

dispatcher.connect(post_init_my_obj, signal=signals.post_init, sender=MyObj)

I imagine what you mean as the base work around is this

## define a post_init on MyObj here
def post_init_my_obj(sender, instance, signal, *args, **kwargs):
    if not hasattr(instance, '_set_attr_a'):
            instance.attr_a = {Some Query Set depending on YetAnotherObj}
            instance._set_attr_a = True

dispatcher.connect(post_init_my_obj, signal=signals.post_init, sender=MyObj)

I think i tried this at some point, but after a few cascades of the post_init (like add a YetAnotherObj? post_init, and more and more) the ordering, attributes and some things never got fully set. They would be in some instances, but not other depending on the entry point on the post_init and which app it came from.

So the real issues is that in the application splits "app2" may or may not exist/be loaded, etc. So the attributes set by app2 would be None .. so yes the work around, all be it ugly (as you need to know that app2 exists, at least in name), is this

# in app1/models.py
class MyObj(models.Model):
	cola = models.Char...
	colb = models.Char...
	..
        @property
        def attr_a(self):
              try:
                  from app2.models.import YetAnotherObj
                  return {Some query using YetAnotherObj}
              except ImportError:
                  return None

        @property
        def attr_b(self):
              try:
                     from app2.models.import YetAnotherObj
                     return {Some query using YetAnotherObj}
              except:
                     return None

06/26/08 22:20:16 changed by mtredinnick

I'm about to close this as part of a commit that I think fixes the root problem. We'll no longer be storing any reference to models or fields in the Where object in the normal course of things (people creating their own where-nodes might do so, but that's not something I can control). I think this will fix the cloning problems you're seeing.

Reopen if there's still a case that fails, but I think this is about the best we can hope for as a compromise between speed and robustness (it should actually now be completely robust, unless I've missed something subtle).

06/26/08 22:28:53 changed by mtredinnick

  • status changed from new to closed.
  • resolution set to fixed.

Fixed in [7773].


Add/Change #7128 (Infinite Recursion in new QuerySet Refactor when using a post_init signal)




Change Properties
Action