Code

Opened 22 months ago

Closed 22 months ago

Last modified 22 months ago

#18532 closed New feature (wontfix)

Mark when instance is loaded from database for pre-/post-init receivers.

Reported by: vzima Owned by: nobody
Component: Database layer (models, ORM) Version: 1.4
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

There is no way to find out if instance exists in database until initializing is complete. There is attribute Model._state.adding, but it is set after instance is initialized.

from django.db import models
from django.db.models import signals

class Parent(models.Model):
    pass

def callback(instance, **kwargs)
    print instance._state.adding

signals.post_init.connect(callback, sender=Parent)

p = Parent.objects.get(pk=1)
>>> True
p._state.adding
>>> False

Such flag can come handy to avoid subsequent SELECT queries to find out whether instance already exists in database.

Attachments (0)

Change History (5)

comment:1 follow-up: Changed 22 months ago by akaariai

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Resolution set to wontfix
  • Status changed from new to closed
  • Type changed from Uncategorized to New feature

Could you provide an use-case? I have been wondering if the pre/post_init signals are needed at all (instead of pre/post init hooks). So, I am interested in how the pre/post_init signals are used more generally.

This will not be easily done. The pre_init/post_init signals are sent from model.__init__ (and they have to be sent from there). The init API is part of public API. So, you can't add a "from_db" flag to __init__ to pass it to the signals.

If you can find a nice way to pass that information to the signals, then I don't see any harm in adding the information to the signals. However, I am closing this wontwix as I don't see any easy way to pass the information to the signals. If I am mistaken and there is a way to achieve this, then please reopen.

comment:2 in reply to: ↑ 1 Changed 22 months ago by vzima

Replying to akaariai:

Could you provide an use-case? I have been wondering if the pre/post_init signals are needed at all (instead of pre/post init hooks). So, I am interested in how the pre/post_init signals are used more generally.

I tried to use signals post-save and post-init to create instances of related models (as model inheritance has very limited usage) e.g.

class Parent(models.Model):
    pass

class Child(models.Model):
    parent = models.OneToOneField(Parent, primary_key=True)

def post_save_callback(instance, created, raw, **kwargs)
    if instance.pk and created and not raw and not hasattr(instance, 'child'):
        Child.objects.get_or_create(parent=instance)

def post_init_callback(instance, created, raw, **kwargs)
    if instance.pk and not hasattr(instance, 'child'):
        Child.objects.get_or_create(parent=instance)

signals.post_save.connect(post_save_callback, sender=Parent)
# This one is required because save does not have to be run under transaction management and thus database does not have to end up in expected state.
signals.post_init.connect(post_init_callback, sender=Parent)

The get_or_create call in post-init always performs SELECT query, even if its is triggered as a result of database query itself.

Anyway, I do think there is more usage for flag which says that instance was loaded from database (like _state.adding), but is always set correctly.

This will not be easily done. The pre_init/post_init signals are sent from model.__init__ (and they have to be sent from there). The init API is part of public API. So, you can't add a "from_db" flag to __init__ to pass it to the signals.

If you can find a nice way to pass that information to the signals, then I don't see any harm in adding the information to the signals. However, I am closing this wontwix as I don't see any easy way to pass the information to the signals. If I am mistaken and there is a way to achieve this, then please reopen.

I do not think it is necessary to pass it as an argument to pre-/post-init signals, if instance._state.adding would be set properly.

Well, one way (not sure how nice it is) is to change constructor (I hope it is not part of public API):

class Model(object):
    __metaclass__ = ModelBase
    def __new__(cls, from_db=False):
        self = super(Model, cls).__new__(cls)
        self._state = ModelState()
        # Here could be also set `db` attribute of ModelState instance, which would made it real internal state, which is not tampered from outside of the instance. It would actually be more nice than now :)
        if from_db:
            self._state.adding = False
        return self

    @classmethod
    def from_db(cls, *args, **kwargs):
        """Nice initialize method to call from QuerySets"""
        self = cls.__new__(from_db=True)
        self.__init__(*args, **kwargs)
        return self

And second possible way I found is to set context like for transactions (this will be more tricky and IMHO less nice):

#In QuerySet:
    enter_from_db()
    try:
        model(*args, **kwargs)
    finally:
        leave_from_db()

#In Model.__init__:
    if is_from_db():
       self._state.adding = False
       # Or just pass the flag to .*-init signals.

I think there is a nice way to achieve this, but I let you to decide if the ticket should be reopened or similar one should be created.

comment:3 follow-up: Changed 22 months ago by akaariai

Does the first method actually work - it seems the __init__ would be called twice, and with the from_db argument first time. From Python docs:

If __new__() returns an instance of cls, then the new instance’s __init__()
method will be invoked like __init__(self[, ...]), where self is the new
instance and the remaining arguments are the same as were passed to
__new__().

The second method needs to use threadlocals and is thus not nice. In addition, in your use case you are initialising the child model from inside model.__init__ - so wouldn't the child model's __init__ see the model as coming from the DB even if it isn't in the DB.

My idea is this:

  1. Introduce Model._state_class, by default:
    class Model:
        _state_class = ModelState
    
  2. In model.init do self._state = self._state_class() instead of ModelState()
  3. When retrieving objects from the db (or after .save() or .delete()) do instance._state.set_state(instance, using, ...)

This would allow your use case by custom ModelState class. This would also allow doing nifty things in 3rd party extensions like tracking "dirty" status of model fields (save only changed fields, or skip save completely if no changes).

I already did this once, the performance cost is around 5-10% to model.__init__. I am willing to pay that price.

comment:4 in reply to: ↑ 3 Changed 22 months ago by vzima

Replying to akaariai:

Does the first method actually work - it seems the __init__ would be called twice, and with the from_db argument first time. From Python docs:

If __new__() returns an instance of cls, then the new instance’s __init__()
method will be invoked like __init__(self[, ...]), where self is the new
instance and the remaining arguments are the same as were passed to
__new__().

I tried this in shell and it works as I expect (python 2.6.7 and 2.7.3rc2). It is weird that it does not work as documented.

class FooType(type):
    def __new__(cls, name, bases, attrs):
        print "FooType.__new__"
        return super(FooType, cls).__new__(cls, name, bases, attrs)

class Foo(object):
    __metaclass__ = FooType
    def __new__(cls, from_db=False):
        print "Foo.__new__", from_db
        return super(Foo, cls).__new__(cls)
    def __init__(self, *args, **kwargs):
        print "Foo.__init__", args, kwargs
        return super(Foo, self).__init__(*args, **kwargs)
    @classmethod
    def from_db(cls, *args, **kwargs):
        self = cls.__new__(cls, from_db=True)
        self.__init__(*args, **kwargs)
        return self
#>>> FooType.__new__
Foo()
#>>> Foo.__new__ False
#>>> Foo.__init__ () {}
#>>> <__main__.Foo object at 0xa365e2c>
Foo.from_db()
#>>> Foo.__new__ True
#>>> Foo.__init__ () {}
#>>> <__main__.Foo object at 0xa365e4c>

The second method needs to use threadlocals and is thus not nice. In addition, in your use case you are initialising the child model from inside model.__init__ - so wouldn't the child model's __init__ see the model as coming from the DB even if it isn't in the DB.

Good point. I agree that this is ugly.

My idea is this:

  1. Introduce Model._state_class, by default:
    class Model:
        _state_class = ModelState
    
  2. In model.init do self._state = self._state_class() instead of ModelState()
  3. When retrieving objects from the db (or after .save() or .delete()) do instance._state.set_state(instance, using, ...)

This will be definitely helpful, but I do not see how it helps my situation. My main objection is that you do not know the object's state during Model.__init__. This will not be changed unless you set the object's state in constructor because you can not pass state as a argument of __init__ itself.

I suggest this combined solution:

class Model:
    _state_class = ModelState
    def __new__(cls, db='default', adding=True, *args, **kwargs):
        """
        This allows passing state arguments into constructor.
        """
        self = super(Model, cls).__new__(cls)
        self._state = self._state_class(db, adding, *args, **kwargs)
        return self

    @classmethod
    def construct(cls, init_args, init_kwargs, db, adding, *args, **kwargs):
        """
        Nice wrapper for constructor.
        Arguments for `__init__` can not be passed in *args and **kwargs to avoid possible conflicts in argument names.
        """
        self = cls.__new__(cls, db, adding, *args, **kwargs)
        self.__init__(*init_args, **init_kwargs)
        return self

# In QuerySet:
model.construct(init_args, init_kwargs, db=db, adding=False)

comment:5 Changed 22 months ago by akaariai

I think we need something more to use this than just the "from_db" flag. From the above it looks you are onto something. I wonder if deferred model (.only()/.defer()) loading could use the construct() method, too. If so, this could result in a solution that is overall cleaner than the current solution, and as an addition you would get your feature request :)

I did some performance testing (using https://github.com/akaariai/django/tree/model_new) and it seems the contruct() method is 10% slower in synthetic benchmark (1000x init model with three fields), and around 5% slower for DB fetch (1000x model with three fields).

I wonder if other core devs have any advice for this case. I am total newbie when it comes to __new__ trickery and I wonder if something evil is waiting at the end of that path.

This all being said I am still slightly opposed to this idea based on the work done in the github branch introduced above. But, I do hope this way of doing things allows for more cleanup and new interesting features. Investigating this path further seems like a good idea. Doing this all just for having _state.adding correctly set in the init signals isn't worth all this.

Maybe the correct approach is to introduce the construct() method without adding model.__new__. Then check if we want to do further additions. This would of course not solve your use case (unless you can override the contruct method), but I think this would lead to cleaner code overall.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.