Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#18532 closed New feature (wontfix)

Mark when instance is loaded from database for pre-/post-init receivers.

Reported by: Vlastimil Zíma Owned by: nobody
Component: Database layer (models, ORM) Version: 1.4
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

There is no way to find out if instance exists in database until initializing is complete. There is attribute Model._state.adding, but it is set after instance is initialized.

from django.db import models
from django.db.models import signals

class Parent(models.Model):
    pass

def callback(instance, **kwargs)
    print instance._state.adding

signals.post_init.connect(callback, sender=Parent)

p = Parent.objects.get(pk=1)
>>> True
p._state.adding
>>> False

Such flag can come handy to avoid subsequent SELECT queries to find out whether instance already exists in database.

Change History (5)

comment:1 by Anssi Kääriäinen, 12 years ago

Resolution: wontfix
Status: newclosed
Type: UncategorizedNew feature

Could you provide an use-case? I have been wondering if the pre/post_init signals are needed at all (instead of pre/post init hooks). So, I am interested in how the pre/post_init signals are used more generally.

This will not be easily done. The pre_init/post_init signals are sent from model.__init__ (and they have to be sent from there). The init API is part of public API. So, you can't add a "from_db" flag to __init__ to pass it to the signals.

If you can find a nice way to pass that information to the signals, then I don't see any harm in adding the information to the signals. However, I am closing this wontwix as I don't see any easy way to pass the information to the signals. If I am mistaken and there is a way to achieve this, then please reopen.

in reply to:  1 comment:2 by Vlastimil Zíma, 12 years ago

Replying to akaariai:

Could you provide an use-case? I have been wondering if the pre/post_init signals are needed at all (instead of pre/post init hooks). So, I am interested in how the pre/post_init signals are used more generally.

I tried to use signals post-save and post-init to create instances of related models (as model inheritance has very limited usage) e.g.

class Parent(models.Model):
    pass

class Child(models.Model):
    parent = models.OneToOneField(Parent, primary_key=True)

def post_save_callback(instance, created, raw, **kwargs)
    if instance.pk and created and not raw and not hasattr(instance, 'child'):
        Child.objects.get_or_create(parent=instance)

def post_init_callback(instance, created, raw, **kwargs)
    if instance.pk and not hasattr(instance, 'child'):
        Child.objects.get_or_create(parent=instance)

signals.post_save.connect(post_save_callback, sender=Parent)
# This one is required because save does not have to be run under transaction management and thus database does not have to end up in expected state.
signals.post_init.connect(post_init_callback, sender=Parent)

The get_or_create call in post-init always performs SELECT query, even if its is triggered as a result of database query itself.

Anyway, I do think there is more usage for flag which says that instance was loaded from database (like _state.adding), but is always set correctly.

This will not be easily done. The pre_init/post_init signals are sent from model.__init__ (and they have to be sent from there). The init API is part of public API. So, you can't add a "from_db" flag to __init__ to pass it to the signals.

If you can find a nice way to pass that information to the signals, then I don't see any harm in adding the information to the signals. However, I am closing this wontwix as I don't see any easy way to pass the information to the signals. If I am mistaken and there is a way to achieve this, then please reopen.

I do not think it is necessary to pass it as an argument to pre-/post-init signals, if instance._state.adding would be set properly.

Well, one way (not sure how nice it is) is to change constructor (I hope it is not part of public API):

class Model(object):
    __metaclass__ = ModelBase
    def __new__(cls, from_db=False):
        self = super(Model, cls).__new__(cls)
        self._state = ModelState()
        # Here could be also set `db` attribute of ModelState instance, which would made it real internal state, which is not tampered from outside of the instance. It would actually be more nice than now :)
        if from_db:
            self._state.adding = False
        return self

    @classmethod
    def from_db(cls, *args, **kwargs):
        """Nice initialize method to call from QuerySets"""
        self = cls.__new__(from_db=True)
        self.__init__(*args, **kwargs)
        return self

And second possible way I found is to set context like for transactions (this will be more tricky and IMHO less nice):

#In QuerySet:
    enter_from_db()
    try:
        model(*args, **kwargs)
    finally:
        leave_from_db()

#In Model.__init__:
    if is_from_db():
       self._state.adding = False
       # Or just pass the flag to .*-init signals.

I think there is a nice way to achieve this, but I let you to decide if the ticket should be reopened or similar one should be created.

comment:3 by Anssi Kääriäinen, 12 years ago

Does the first method actually work - it seems the __init__ would be called twice, and with the from_db argument first time. From Python docs:

If __new__() returns an instance of cls, then the new instance’s __init__()
method will be invoked like __init__(self[, ...]), where self is the new
instance and the remaining arguments are the same as were passed to
__new__().

The second method needs to use threadlocals and is thus not nice. In addition, in your use case you are initialising the child model from inside model.__init__ - so wouldn't the child model's __init__ see the model as coming from the DB even if it isn't in the DB.

My idea is this:

  1. Introduce Model._state_class, by default:
    class Model:
        _state_class = ModelState
    
  2. In model.init do self._state = self._state_class() instead of ModelState()
  3. When retrieving objects from the db (or after .save() or .delete()) do instance._state.set_state(instance, using, ...)

This would allow your use case by custom ModelState class. This would also allow doing nifty things in 3rd party extensions like tracking "dirty" status of model fields (save only changed fields, or skip save completely if no changes).

I already did this once, the performance cost is around 5-10% to model.__init__. I am willing to pay that price.

in reply to:  3 comment:4 by Vlastimil Zíma, 12 years ago

Replying to akaariai:

Does the first method actually work - it seems the __init__ would be called twice, and with the from_db argument first time. From Python docs:

If __new__() returns an instance of cls, then the new instance’s __init__()
method will be invoked like __init__(self[, ...]), where self is the new
instance and the remaining arguments are the same as were passed to
__new__().

I tried this in shell and it works as I expect (python 2.6.7 and 2.7.3rc2). It is weird that it does not work as documented.

class FooType(type):
    def __new__(cls, name, bases, attrs):
        print "FooType.__new__"
        return super(FooType, cls).__new__(cls, name, bases, attrs)

class Foo(object):
    __metaclass__ = FooType
    def __new__(cls, from_db=False):
        print "Foo.__new__", from_db
        return super(Foo, cls).__new__(cls)
    def __init__(self, *args, **kwargs):
        print "Foo.__init__", args, kwargs
        return super(Foo, self).__init__(*args, **kwargs)
    @classmethod
    def from_db(cls, *args, **kwargs):
        self = cls.__new__(cls, from_db=True)
        self.__init__(*args, **kwargs)
        return self
#>>> FooType.__new__
Foo()
#>>> Foo.__new__ False
#>>> Foo.__init__ () {}
#>>> <__main__.Foo object at 0xa365e2c>
Foo.from_db()
#>>> Foo.__new__ True
#>>> Foo.__init__ () {}
#>>> <__main__.Foo object at 0xa365e4c>

The second method needs to use threadlocals and is thus not nice. In addition, in your use case you are initialising the child model from inside model.__init__ - so wouldn't the child model's __init__ see the model as coming from the DB even if it isn't in the DB.

Good point. I agree that this is ugly.

My idea is this:

  1. Introduce Model._state_class, by default:
    class Model:
        _state_class = ModelState
    
  2. In model.init do self._state = self._state_class() instead of ModelState()
  3. When retrieving objects from the db (or after .save() or .delete()) do instance._state.set_state(instance, using, ...)

This will be definitely helpful, but I do not see how it helps my situation. My main objection is that you do not know the object's state during Model.__init__. This will not be changed unless you set the object's state in constructor because you can not pass state as a argument of __init__ itself.

I suggest this combined solution:

class Model:
    _state_class = ModelState
    def __new__(cls, db='default', adding=True, *args, **kwargs):
        """
        This allows passing state arguments into constructor.
        """
        self = super(Model, cls).__new__(cls)
        self._state = self._state_class(db, adding, *args, **kwargs)
        return self

    @classmethod
    def construct(cls, init_args, init_kwargs, db, adding, *args, **kwargs):
        """
        Nice wrapper for constructor.
        Arguments for `__init__` can not be passed in *args and **kwargs to avoid possible conflicts in argument names.
        """
        self = cls.__new__(cls, db, adding, *args, **kwargs)
        self.__init__(*init_args, **init_kwargs)
        return self

# In QuerySet:
model.construct(init_args, init_kwargs, db=db, adding=False)

comment:5 by Anssi Kääriäinen, 12 years ago

I think we need something more to use this than just the "from_db" flag. From the above it looks you are onto something. I wonder if deferred model (.only()/.defer()) loading could use the construct() method, too. If so, this could result in a solution that is overall cleaner than the current solution, and as an addition you would get your feature request :)

I did some performance testing (using https://github.com/akaariai/django/tree/model_new) and it seems the contruct() method is 10% slower in synthetic benchmark (1000x init model with three fields), and around 5% slower for DB fetch (1000x model with three fields).

I wonder if other core devs have any advice for this case. I am total newbie when it comes to __new__ trickery and I wonder if something evil is waiting at the end of that path.

This all being said I am still slightly opposed to this idea based on the work done in the github branch introduced above. But, I do hope this way of doing things allows for more cleanup and new interesting features. Investigating this path further seems like a good idea. Doing this all just for having _state.adding correctly set in the init signals isn't worth all this.

Maybe the correct approach is to introduce the construct() method without adding model.__new__. Then check if we want to do further additions. This would of course not solve your use case (unless you can override the contruct method), but I think this would lead to cleaner code overall.

Note: See TracTickets for help on using tickets.
Back to Top