Model Creation and Initialisation

When some (user) code creates a subclass of models.Model, a lot goes on behind the scenes in order to present a reasonably normal looking Python class to the user, whilst providing the necessary database support when it comes time to save or load instances of the model. This document describes what goes on from the time a model is imported until an instance is created by the user.

A couple of preliminaries: Firstly, if you just want to develop systems with Django, this information is not required at all. It is here for people who might want to poke around the internals of Django or fix any bugs that they may find.

Secondly, by the nature of the beast, class creation and initialisation delves a bit into the depths of Python's operation. A familiarity with the difference between a class and an instance, for example, will be helpful here. Section 3.2 of the Python Language Reference might be of assistance.

Unless otherwise noted, all of the files referred to here are in the source code tree in the django/db/models/ directory. For reference purposes, imagine that we have a models.py file containing the following model.

from django.db import models

class Example(models.Model):
    name = models.CharField(maxlength = 50)
    static = "I am a normal static string attribute"

    def __str__(self):
        return self.name

Importing A Model

From the moment a file containing a subclass of Model is imported, Django is involved.

While executing the file during import, the Python interpreter needs to create Example, our subclass. The Model class (see base.py) has a __metaclass__ attribute that defines ModelBase (also in base.py) as the class to use for creating new classes. So ModelBase.__new__ is called to create this new Example class. It is important to realise that we are creating the class object here, not an instance of it. In other words, Python is creating the thing that will eventually be bound to the Example name in our current namespace.

Metaprogramming -- overriding the normal class creation process -- is not very commonly used, so a quick description of what happens might be in order. ModelBase.__new__, like the __new__ method of any metaclass, is responsible for setting up any class-specific features. The more well-known __init__ method, on the other hand, is responsible for setting up instance-specific features. __new__ is passed the name of the new class as a string (Example in our case), a tuple of the base classes named in the class statement (just Model here) -- as actual class objects, so we can look inside them, examine types and so forth -- and a dictionary of attributes that need to be installed. This attribute dictionary holds the attributes and methods defined in the body of the new class (Example): the new field objects we are creating, the static string and the __str__ method. Utility methods such as save() and delete() are not in the dictionary; they live on Model and reach Example through normal inheritance.

The __new__ method is required to return a class object that can then be instantiated (by calling Example() in our case). If you are interested in seeing more examples of this, the Python Cookbook from O'Reilly has a whole chapter on metaprogramming.
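As a rough illustration of the arguments described above, here is a minimal sketch of a metaclass that records what its __new__ receives. It is not Django code: TracingMeta, Base and the seen attribute are made-up names, and it uses the Python 3 metaclass= spelling rather than the __metaclass__ attribute.

```python
class TracingMeta(type):
    """Toy stand-in for ModelBase: records what __new__ is given."""

    def __new__(mcs, name, bases, attrs):
        # name  -> the class name as a string
        # bases -> tuple of the base class objects listed in the class statement
        # attrs -> dict of the names defined in the class body
        new_class = super().__new__(mcs, name, bases, attrs)
        new_class.seen = {
            "name": name,
            "bases": bases,
            # Hide Python's own bookkeeping entries such as __module__.
            "attrs": sorted(k for k in attrs if not k.startswith("__")),
        }
        # __new__ must hand back the class object that will be bound to the name.
        return new_class


class Base(metaclass=TracingMeta):
    def save(self):
        return "saved"


class Example(Base):
    static = "I am a normal static string attribute"


print(Example.seen["name"])   # Example
print(Example.seen["attrs"])  # ['static']
```

In this sketch save() does not show up in the dictionary passed for Example, yet Example().save() still works because it is inherited from Base.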

Installing The Attributes

Now things start to get interesting. A new class object with the Example name is created in the right module namespace. A _meta attribute is added to hold all of the field validation, retrieval and saving machinery. This is an Options class from options.py. Initially, it takes values from the inner Meta class, if any, on the new model, using defaults if Meta is not declared (as in our example).

Each attribute is then added to the new class object. This is done by calling Model.add_to_class with the attribute name and object. This object could be something like a normal unbound method, or a string, or a property, or -- of most relevance here -- a Field subclass that we want to do something with. The add_to_class() method checks to see if the object has a contribute_to_class method that can be called. If it doesn't, this is just a normal attribute and it is added to the new class object with the given name. If we do have a contribute_to_class method on the new attribute object, this method is called and given a reference to the new class along with the name of the new attribute.

Putting this in the context of our example, the static and __str__ attributes would be added normally to the class as a string object and unbound method, respectively. When the name attribute is considered, add_to_class would be called with the string name and an instance of the models.CharField class (which is defined in fields/__init__.py). Note that we are passed an instance of CharField, not the class itself. The object we are passed in add_to_class has already been created, so it knows that it has a maxlength of 50 in our case, that the default verbose name is not being overridden, and so on. CharField instances have a contribute_to_class method, so that will be called and passed the new Example class object and the string name (the attribute name that we are creating).

When Field.contribute_to_class (or one of the similar methods in a subclass of Field) is called, it does not add the new attribute to the class we are creating (Example). Instead it adds itself to the Example._meta object, ending up in the Example._meta.fields list. If you are interested in the details, you can read the code in fields/__init__.py. I am glossing over the case of relation fields like ForeignKey and ManyToManyField here, but the principle is the same for all fields. The important thing to realise is that they are not added as attributes on the main class, but, rather, they are stored in the _meta attribute and will be called upon at save or load or delete time.

There are no attributes on the final class object for the model fields (the things that are derived from Field, that is). These attributes are only added in when __init__() is called, which is discussed below.
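The dispatch described in this section can be sketched in a few lines of plain Python. This is a simplified model under stated assumptions, not the code from base.py: Options and Field here are toy stand-ins, add_to_class is placed on the metaclass for brevity, and the Python 3 metaclass= syntax is used.

```python
class Options:
    """Toy stand-in for the Options class: it only holds a field list."""

    def __init__(self):
        self.fields = []


class Field:
    """Toy stand-in for a model field."""

    def __init__(self, maxlength=None):
        self.maxlength = maxlength
        self.name = None

    def contribute_to_class(self, cls, name):
        # Register on cls._meta instead of becoming a class attribute.
        self.name = name
        cls._meta.fields.append(self)


class ModelBase(type):
    def __new__(mcs, name, bases, attrs):
        # Create a nearly empty class first, keeping only Python's bookkeeping.
        bookkeeping = {k: attrs[k] for k in ("__module__", "__qualname__") if k in attrs}
        new_class = super().__new__(mcs, name, bases, bookkeeping)
        new_class._meta = Options()
        # Install everything else one attribute at a time.
        for attr_name, value in attrs.items():
            if attr_name not in bookkeeping:
                new_class.add_to_class(attr_name, value)
        return new_class

    def add_to_class(cls, name, value):
        if hasattr(value, "contribute_to_class"):
            value.contribute_to_class(cls, name)  # fields take this branch
        else:
            setattr(cls, name, value)             # ordinary attributes and methods


class Model(metaclass=ModelBase):
    pass


class Example(Model):
    name = Field(maxlength=50)
    static = "I am a normal static string attribute"

    def __str__(self):
        return self.name


print(hasattr(Example, "name"))               # False
print([f.name for f in Example._meta.fields])  # ['name']
```

The point of the sketch is the last two lines: the field is reachable through _meta.fields, while the class object itself has no name attribute.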

Preparing The Class

Once all of the attributes have been added, the class creation is just about finished. The Model._prepare method is called. This sets up a few model methods that might be required depending on other options you have selected and adds a primary key field if one has not been explicitly declared. Finally, the class_prepared signal is emitted and any registered handlers run.

The main beneficiary of receiving class_prepared at the moment is manipulators.py. It catches this signal and uses it to add default add- and change-manipulators to the class.
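To make the preparation step concrete, here is a small sketch of the idea: add a primary key if none was declared, then notify listeners. Everything in it is hypothetical (the prepare function, the handler list standing in for the class_prepared signal, and the id name given to the automatic key); it only mirrors the sequence described above.

```python
class_prepared_handlers = []  # hypothetical stand-in for the signal's receivers


class Field:
    def __init__(self, name, primary_key=False):
        self.name = name
        self.primary_key = primary_key


def prepare(cls):
    """Rough analogue of the preparation step for a class with a `fields` list."""
    if not any(f.primary_key for f in cls.fields):
        # No explicit primary key was declared, so add an automatic one.
        cls.fields.insert(0, Field("id", primary_key=True))
    # "Emit" the signal by calling every registered handler.
    for handler in class_prepared_handlers:
        handler(sender=cls)


prepared = []
class_prepared_handlers.append(lambda sender: prepared.append(sender.__name__))


class Example:
    fields = [Field("name")]


prepare(Example)
print([f.name for f in Example.fields])  # ['id', 'name']
print(prepared)                          # ['Example']
```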

Registering The Class

Once Django has created the new class object, a copy is saved in an in-memory cache (a Python dictionary) so that it can be retrieved by other classes if needed. This is required so that things like reverse lookups for related fields can be performed (see Options.get_all_related_objects() for example).

The registration itself is done right at the end of the ModelBase.__new__ method (recall that we still have not completely finished this method and have not returned from it yet). The register_models() function in loading.py is called and is passed the name of the Django application this model belongs to and the new class object we are creating via __new__. However, it is not quite as simple as saving a reference to this class...

Files can be imported into Python in many different ways. We have

from application.models import Example

or

from application import models

myExample = models.Example()

or even

from project.application import models

Each of these imports is really importing exactly the same model. However, due to the way importing is implemented in Python, the last case in particular is not usually identified as the same import as the first two cases. In the last case, the models module has a name of project.application.models, whilst in the first two cases it is called application.models.
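The effect can be reproduced without Django. The following sketch builds a throwaway project/application/models.py in a temporary directory (the layout is an assumption made for the demonstration) and imports the same file under both names.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway layout: <root>/project/application/models.py
root = tempfile.mkdtemp()
project_dir = os.path.join(root, "project")
app_dir = os.path.join(project_dir, "application")
os.makedirs(app_dir)
for package_dir in (project_dir, app_dir):
    open(os.path.join(package_dir, "__init__.py"), "w").close()
with open(os.path.join(app_dir, "models.py"), "w") as fh:
    fh.write("class Example(object):\n    pass\n")

# Make both the directory holding project/ and project/ itself importable.
sys.path[:0] = [root, project_dir]
importlib.invalidate_caches()

short_mod = importlib.import_module("application.models")
long_mod = importlib.import_module("project.application.models")

print(short_mod.__name__)                     # application.models
print(long_mod.__name__)                      # project.application.models
print(short_mod is long_mod)                  # False
print(short_mod.Example is long_mod.Example)  # False
```

One source file has produced two module objects and therefore two distinct Example classes, which is the situation the registration code has to guard against.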

This might not be a big problem if we always used consistent import statements. However, it is convenient to be able to leave off the project name inside an application, so that we can move the application between projects. Also, although we might import something from a particular path, if Django imports it as part of working out all the installed models (which it has to do to work out reverse related fields, again), it might be imported with a slightly different name. This "slightly different name" is used to derive the application name and so, if we are not careful, we might end up registering the same model under two or more different application names, such as registering Example under application and project.application, even though it is the same Example class. This leads to problems down the line: if the reverse relations are computed as referring to the project.application Example class and we happen to have a copy of the application Example class in our code or shell, things do not work.

If the previous paragraph seemed a bit confusing, just bear in mind that we only want to register each model exactly once, regardless of its import path, providing we can tell that it is the same model.

We can work out whether a model is the same as one we already registered by looking at the source filename (which you can retrieve via Python's introspection facilities: see register_models() for the details, if you care). So register_models() is careful not to register a model more than once. If it is called a second time for the same model, it just does nothing.

So we have created the class object and registered it. There is one last subtlety to take care of: if we are creating this model for the second time for some reason -- for example, due to a second import via a slightly different path -- we do not want to return our new object to the user. This will lead to the problem described above of one class object being used to compute things like related fields and another object being given to the user to work with; the latter will not have all the right information. The effects of making this mistake (and the difficulties in diagnosing it) are well illustrated in ticket #1796. Instead of always returning the new object we have created, we return whatever object is registered for this model by looking it up via get_model() in loading.py. If this is the first time this class object has been created, we end up returning the one we just created. If it is the second time, then the call to register_models() threw away our newly created object in favour of the first one. So we return that first one.

In this way, by being very careful at model construction, we only ever end up with one class object for each model. Since every Model sub-class must call ModelBase.__new__ when it is first created, we will always be able to catch the duplicates (unless there is a bug in the duplicate detection code in register_models() :-) ).

After all this work, Python can then take the new object and bind it to the name of the class -- Example for us. There it sits quietly until somebody wants to create an instance of the Example class.

Creating A Model Instance

When a piece of Python code runs something like

e1 = Example(name = 'Fred')

the Example.__init__() method is called. As mentioned above, this sets up the instance-specific features, and the call as a whole hands back a new instance of the Example class.

Fortunately, after the hard work of understanding how the Example class was created, understanding the __init__() method is much easier. We really only need to consider the Model.__init__ method here, since it is assumed that any subclass with its own __init__() will eventually call the superclass __init__() method as well. We can boil the whole process down to a few reasonably simple steps.

  1. Emit the pre_init signal. This is caught by any code that might want to attach to the new instance early on. Currently the GenericForeignKey class uses this signal to do some setting up (see GenericForeignKey.instance_pre_init() in fields/generic.py).
  2. Run through all of the fields in the model and create attributes for each one (recall that the class object does not have these attributes as the field instances have all been put into the _meta attribute):
    1. Because assigning to a many-to-many relation involves a secondary database table (the join table), initialising these fields requires special handling. So if a keyword argument of that name is passed in, it is assigned to the corresponding instance attribute after the necessary processing.
    2. For normal field attributes, an attribute on the new instance is created that contains either the value passed into __init__() or the default value for that field.
  3. For any keyword arguments in the __init__() that remain unprocessed, check to see if there is a property on the class with the same name that can be set and, if so, call that with the passed in value.
  4. If any keyword arguments remain unprocessed, raise an AttributeError exception, because the class cannot handle them and the programmer has made a mistake.
  5. Run through any positional arguments that have been passed in and try to assign them to field attributes in the order that the fields appear in the class declaration.
  6. Emit the post_init signal. Nothing in Django's core uses this signal, but it can be useful for code built on top of Django that would like to hook into particular class instances prior to the creator receiving the instance back again.

At the end of these six steps, we have a normal Python instance with an attribute for each field (along with all the normal non-field attributes and methods). Each of these field attributes just holds a normal Python type, rather than being any kind of special class (the exceptions here are relation fields, again, such as many-to-many relations). Other methods in the class can work with these attributes normally, as well as access the Field subclasses that control them via the _meta attribute on the instance.
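The six steps can be condensed into a short sketch. It is a simplification under several assumptions: Field and Options are toy classes, the two signals are replaced by entries appended to a list, many-to-many handling is left out, and the shout property exists only to exercise step 3.

```python
class Field:
    def __init__(self, name, default=None):
        self.name = name
        self.default = default


class Options:
    def __init__(self, fields):
        self.fields = fields


class Example:
    _meta = Options([Field("name", default="")])
    events = []  # stands in for the pre_init / post_init signals

    @property
    def shout(self):
        return self.name.upper()

    @shout.setter
    def shout(self, value):
        self.name = value.lower()

    def __init__(self, *args, **kwargs):
        self.events.append("pre_init")                     # step 1
        for field in self._meta.fields:                    # step 2
            setattr(self, field.name, kwargs.pop(field.name, field.default))
        for key in list(kwargs):                           # step 3
            if isinstance(getattr(type(self), key, None), property):
                setattr(self, key, kwargs.pop(key))
        if kwargs:                                         # step 4
            raise AttributeError("cannot handle keyword %r" % sorted(kwargs)[0])
        for field, value in zip(self._meta.fields, args):  # step 5
            setattr(self, field.name, value)
        self.events.append("post_init")                    # step 6


e1 = Example(name="Fred")
print(e1.name)                   # Fred
print(hasattr(Example, "name"))  # False
```

As in the real thing, the name attribute exists only on instances; the class keeps its field description in _meta.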


Malcolm Tredinnick

June, 2006