= Model Creation and Initialisation =

When some (user) code creates a subclass of {{{models.Model}}}, a lot goes on behind the scenes in order to present a reasonably normal-looking Python class to the user, whilst providing the necessary database support when it comes time to save or load instances of the model. This document describes what goes on from the time a model is imported until an instance is created by the user.

A couple of preliminaries: Firstly, if you just want to develop systems with Django, this information is not required at all. It is here for people who might want to poke around the internals of Django or fix any bugs they may find.

Secondly, by the nature of the beast, class creation and initialisation delves a bit into the depths of Python's operation. A familiarity with the difference between a class and an instance, for example, will be helpful here. Section 3.2 of the [http://docs.python.org/ref/types.html Python Language Reference] might be of assistance.

Unless otherwise noted, all of the files referred to here are in the source code tree in the {{{django/db/models/}}} directory.

For reference purposes, imagine that we have a {{{models.py}}} file containing the following model.

{{{
#!python
from django.db import models

class Example(models.Model):
    name = models.CharField(maxlength=50)
    static = "I am a normal static string attribute"

    def __str__(self):
        # The field value lives on the instance, so it is read via self.
        return self.name
}}}

== Importing A Model ==

From the moment a file containing a subclass of {{{Model}}} is imported, Django is involved. In the process of parsing the file during import, the Python interpreter needs to create {{{Example}}}, our subclass. The {{{Model}}} class (see {{{base.py}}}) has a {{{__metaclass__}}} attribute that defines {{{ModelBase}}} (also in {{{base.py}}}) as the class to use for creating new classes. So {{{ModelBase.__new__}}} is called to create this new {{{Example}}} class. It is important to realise that we are creating the ''class object'' here, not an instance of it.
In other words, Python is creating the thing that will eventually be bound to the ''Example'' name in our current namespace.

Metaprogramming -- overriding the normal class creation process -- is not very commonly used, so a quick description of what happens might be in order. {{{ModelBase.__new__}}}, like the {{{__new__}}} method of any metaclass, is responsible for setting up any class-specific features. The more well-known {{{__init__}}} method, on the other hand, is responsible for setting up instance-specific features.

{{{__new__}}} is passed the name of the new class as a string (''Example'' in our case), the base classes (just {{{Model}}} here, which itself derives from {{{object}}}) -- as actual class objects, so we can look inside them, examine types and so forth -- and a dictionary of attributes that need to be installed. This attribute dictionary holds everything defined in the body of the new class (''Example''): the new field objects we are creating, the ''static'' string and the ''!__str!__'' method. Utility methods from the {{{Model}}} class like {{{save()}}} and {{{delete()}}} are not in this dictionary; they reach {{{Example}}} through normal inheritance from the base classes.

The {{{__new__}}} method is required to return a class object that can then be instantiated (by calling {{{Example()}}} in our case). If you are interested in seeing more examples of this, the [http://www.oreilly.com/catalog/pythoncook/ Python Cookbook] from O'Reilly has a whole chapter on metaprogramming.

=== Installing The Attributes ===

Now things start to get interesting. A new class object with the ''Example'' name is created in the right module namespace. A {{{_meta}}} attribute is added to hold all of the field validation, retrieval and saving machinery. This is an instance of the {{{Options}}} class from {{{options.py}}}. Initially, it takes values from the inner {{{Meta}}} class, if any, on the new model, using defaults if {{{Meta}}} is not declared (as in our example).

Each attribute is then added to the new class object.
This is done by calling {{{Model.add_to_class}}} with the attribute name and object. This object could be something like a normal unbound method, or a string, or a property, or -- of most relevance here -- an instance of a {{{Field}}} subclass that we want to do something with. The {{{add_to_class()}}} method checks to see if the object has a {{{contribute_to_class}}} method that can be called. If it doesn't, this is just a normal attribute and it is added to the new class object with the given name. If we do have a {{{contribute_to_class}}} method on the new attribute object, this method is called and given a reference to the new class along with the name of the new attribute.

Putting this in the context of our example, the ''static'' and ''!__str!__'' attributes would be added normally to the class as a string object and unbound method, respectively. When the ''name'' attribute is considered, {{{add_to_class}}} would be called with the string ''name'' and an instance of the {{{models.CharField}}} class (which is defined in {{{fields/__init__.py}}}). Note that we are passed an ''instance'' of {{{CharField}}}, not the class itself. The object we are passed in {{{add_to_class}}} has already been created, so it already knows that it has a ''maxlength'' of 50, that the default verbose name is not being overridden, and so on. {{{CharField}}} instances have a {{{contribute_to_class}}} method, so that will be called and passed the new {{{Example}}} class object and the string ''name'' (the attribute name that we are creating).

When {{{Field.contribute_to_class}}} (or one of the similar methods in a subclass of {{{Field}}}) is called, it does not add the new attribute to the class we are creating (''Example''). Instead it adds itself to the {{{Example._meta}}} object, ending up in the {{{Example._meta.fields}}} list.
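
As a rough illustration of the dispatch described above, the following self-contained sketch mimics the idea in Python 3 syntax (the code discussed in this document uses the {{{__metaclass__}}} attribute instead). The {{{Options}}}, {{{Field}}} and {{{ModelBase}}} classes here are simplified stand-ins written for this example, not Django's actual implementation, and putting {{{add_to_class}}} on the metaclass is an assumption made to keep the sketch short.

```python
class Options:
    """Hypothetical stand-in for the object stored in _meta."""

    def __init__(self):
        self.fields = []


class Field:
    """Toy field: registers itself in _meta instead of becoming a class attribute."""

    def __init__(self, maxlength=None):
        self.maxlength = maxlength
        self.name = None

    def contribute_to_class(self, cls, name):
        self.name = name
        cls._meta.fields.append(self)


class ModelBase(type):
    def __new__(mcs, name, bases, attrs):
        # Simplification: dunder entries (__module__, __str__, ...) go straight
        # to type.__new__; everything else is routed through add_to_class.
        dunders = {k: v for k, v in attrs.items() if k.startswith("__")}
        new_class = super().__new__(mcs, name, bases, dunders)
        new_class._meta = Options()
        for attr_name, value in attrs.items():
            if attr_name not in dunders:
                new_class.add_to_class(attr_name, value)
        return new_class

    def add_to_class(cls, name, value):
        # Objects that know how to install themselves get to do so;
        # anything else becomes a plain class attribute.
        if hasattr(value, "contribute_to_class"):
            value.contribute_to_class(cls, name)
        else:
            setattr(cls, name, value)


class Model(metaclass=ModelBase):
    pass


class Example(Model):
    name = Field(maxlength=50)
    static = "I am a normal static string attribute"


field_names = [f.name for f in Example._meta.fields]
```

In this sketch {{{Example.static}}} is an ordinary class attribute, while the ''name'' field is only reachable through {{{Example._meta.fields}}}.
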
If you are interested in the details you can read the code in {{{fields/__init__.py}}}; I am glossing over the case of relation fields like {{{ForeignKey}}} and {{{ManyToManyField}}} here. But the principle is the same for all fields. The important thing to realise is that they are not added as attributes on the main class, but, rather, they are stored in the {{{_meta}}} attribute and will be called upon at save or load or delete time.

There are no attributes on the final class object for the model fields (the things that are derived from {{{Field}}}, that is). These attributes are only added in when {{{__init__()}}} is called, which is discussed below.

=== Preparing The Class ===

Once all of the attributes have been added, the class creation is just about finished. The {{{Model._prepare}}} method is called. This sets up a few model methods that might be required depending on other options you have selected and adds a primary key field if one has not been explicitly declared. Finally, the ''class_prepared'' signal is emitted and any registered handlers run.

The main beneficiary of receiving ''class_prepared'' at the moment is {{{manipulators.py}}}. It catches this signal and uses it to add default add- and change-manipulators to the class.

=== Registering The Class ===

Once Django has created the new class object, a reference to it is saved in an in-memory cache (a Python dictionary) so that it can be retrieved by other classes if needed. This is required so that things like ''reverse lookups'' for related fields can be performed (see {{{Options.get_all_related_objects()}}} for example).

The registration itself is done right at the end of the {{{ModelBase.__new__}}} method (recall that we still have not completely finished this method and have not returned from it yet). The {{{register_models()}}} function in {{{loading.py}}} is called and is passed the name of the Django application this model belongs to and the new class object we are creating via {{{__new__}}}.
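
A cache of this kind can be pictured as a nested dictionary. The sketch below is a toy version written for illustration only: the {{{_registry}}} name, the function bodies and the choice of the lower-cased class name as the key are all assumptions, not a copy of Django's code.

```python
# app_label -> {lower-cased model name -> class object}
_registry = {}


def register_models(app_label, *models):
    """Toy registration: remember each class under its application label."""
    app_models = _registry.setdefault(app_label, {})
    for model in models:
        # setdefault keeps whichever class was registered first under a name.
        app_models.setdefault(model.__name__.lower(), model)


def get_model(app_label, model_name):
    """Return the registered class, or None if nothing is registered."""
    return _registry.get(app_label, {}).get(model_name.lower())


class Example:
    pass


register_models("application", Example)

# A second class object with the same name does not displace the first one.
Duplicate = type("Example", (), {})
register_models("application", Duplicate)

registered = get_model("application", "Example")
```
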
However, it is not quite as simple as saving a reference to this class... Files can be imported into Python in many different ways. We have

{{{
#!python
from application.models import Example
}}}

or

{{{
#!python
from application import models
myExample = models.Example()
}}}

or even

{{{
#!python
from project.application import models
}}}

Each of these imports is really importing exactly the same model. However, due to the way importing is implemented in Python, the last case in particular is not usually identified as the same import as the first two cases. In the last case, the {{{models}}} module has a name of {{{project.application.models}}}, whilst in the first two cases it is called {{{application.models}}}.

This might not be a big problem if we always used consistent import statements. However, it is convenient to be able to leave off the project name inside an application, so that we can move the application between projects. Also, although we might import something from a particular path, if Django imports it as part of working out all the installed models (which it has to do to work out reverse related fields, again), it might be imported with a slightly different name. This "slightly different name" is used to derive the application name and so, if we are not careful, we might end up registering the same model under two or more different application names: such as registering {{{Example}}} under ''application'' and ''project.application'', even though it is the same {{{Example}}} class. And this leads to problems down the line, because if the reverse relations are computed as referring to the ''project.application'' {{{Example}}} class and we happen to have a copy of the ''application'' {{{Example}}} class in our code or shell, things do not work.

If the previous paragraph seemed a bit confusing, just bear in mind that we only want to register each model exactly once, regardless of its import path, providing we can tell that it is the same model.
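
The underlying Python behaviour can be reproduced without Django. The sketch below writes a throwaway {{{models.py}}} to a temporary directory and loads it twice under the two module names used above; the {{{load_as}}} helper and the plain {{{Example}}} class are made up for this illustration. It is only meant to show that one source file can yield two distinct class objects.

```python
import importlib.util
import os
import tempfile

SOURCE = "class Example:\n    pass\n"


def load_as(module_name, path):
    """Execute the file at `path` as a module called `module_name`."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "models.py")
    with open(path, "w") as handle:
        handle.write(SOURCE)

    short_mod = load_as("application.models", path)
    long_mod = load_as("project.application.models", path)

# Same source file, but two separate class objects with different __module__ values.
same_file = short_mod.__file__ == long_mod.__file__
same_class = short_mod.Example is long_mod.Example
module_names = (short_mod.Example.__module__, long_mod.Example.__module__)
```

Here {{{same_file}}} is true while {{{same_class}}} is false, which is exactly the situation the registration code has to guard against.
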
We can work out whether a model is the same as one we already registered by looking at the source filename (which you can retrieve via Python's introspection facilities: see {{{register_models()}}} for the details, if you care). So {{{register_models()}}} is careful not to register a model more than once. If it is called a second time for the same model, it just does nothing.

So we have created the class object and registered it. There is one last subtlety to take care of: if we are creating this model for the second time for some reason -- for example, due to a second import via a slightly different path -- we do not want to return our new object to the user. Doing so would lead to the problem described above of one class object being used to compute things like related fields and another object being given to the user to work with; the latter will not have all the right information. The effects of making this mistake (and the difficulties in diagnosing it) are well illustrated in ticket #1796.

Instead of always returning the new object we have created, we return whatever object is registered for this model by looking it up via {{{get_model()}}} in {{{loading.py}}}. If this is the first time this class object has been created, we end up returning the one we just created. If it is the second time, then the call to {{{register_models()}}} threw away our newly created object in favour of the first one. So we return that first one.

In this way, by being very careful at model construction, we only ever end up with one class object for each model. Since every {{{Model}}} subclass must call {{{ModelBase.__new__}}} when it is first created, we will always be able to catch the duplicates (unless there is a bug in the duplicate detection code in {{{register_models()}}} :-) ).

After all this work, Python can then take the new object and bind it to the name of the class -- ''Example'' for us.
There it sits quietly until somebody wants to create an instance of the {{{Example}}} class.

== Creating A Model Instance ==

When a piece of Python code runs something like

{{{
#!python
e1 = Example(name='Fred')
}}}

the {{{Example.__init__()}}} method is called. As mentioned above, this sets up the ''instance-specific'' features of the class, and the call gives back an instance of the class. Fortunately, after the hard work of understanding how the {{{Example}}} class was created, understanding the {{{__init__()}}} method is much easier.

We really only need to consider the {{{Model.__init__}}} method here, since it is assumed that any subclass with its own {{{__init__()}}} will eventually call the superclass {{{__init__()}}} method as well. We can boil the whole process down to a few reasonably simple steps.

 1. Emit the ''pre_init'' signal. This is caught by any code that might want to attach to the new instance early on. Currently the {{{GenericForeignKey}}} class uses this signal to do some setting up (see {{{GenericForeignKey.instance_pre_init()}}} in {{{fields/generic.py}}}).
 2. Run through all of the fields in the model and create attributes for each one (recall that the class object does not have these attributes as the field instances have all been put into the {{{_meta}}} attribute):
   1. Because assigning to a many-to-many relation involves a secondary database table (the join table), initialising these fields requires special handling. So if a keyword argument of that name is passed in, it is assigned to the corresponding instance attribute after the necessary processing.
   2. For normal field attributes, an attribute on the new instance is created that contains either the value passed into {{{__init__()}}} or the default value for that field.
 3. For any keyword arguments to {{{__init__()}}} that remain unprocessed, check to see if there is a property on the class with the same name that can be set and, if so, set it to the passed-in value.
 4. If any keyword arguments remain unprocessed, raise an !AttributeError exception, because the class cannot handle them and the programmer has made a mistake.
 5. Run through any positional arguments that have been passed in and try to assign them to field attributes in the order that the fields appear in the class declaration.
 6. Emit the ''post_init'' signal. Nothing in Django's core uses this signal, but it can be useful for code built on top of Django that would like to hook into particular class instances prior to the creator receiving the instance back again.

At the end of these six steps, we have a normal Python class instance with an attribute for each field (along with all the normal non-field attributes and methods). Each of these field attributes just holds a normal Python type, rather than being any kind of special class (the exception here is relation fields, again, such as many-to-many relations). Other methods in the class can work with these attributes normally, as well as access the {{{Field}}} subclasses that control them via the {{{_meta}}} attribute on the instance.

----
Malcolm Tredinnick, June 2006
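
The six initialisation steps described above can be sketched in plain Python. This is a simplified illustration, not Django's {{{Model.__init__}}}: the {{{ToyField}}} and {{{ToyExample}}} classes, the ''age'' field and the ''nickname'' property are invented for the example, the {{{fields}}} list stands in for {{{_meta.fields}}}, and signals and many-to-many handling are left out.

```python
class ToyField:
    """Hypothetical minimal field: just a name and a default value."""

    def __init__(self, name, default=None):
        self.name = name
        self.default = default


class ToyExample:
    # Stand-in for _meta.fields; a real model gets this list from its metaclass.
    fields = [ToyField("name", default=""), ToyField("age", default=0)]

    def __init__(self, *args, **kwargs):
        # Step 1 (pre_init signal) is omitted in this sketch.
        # Step 2: one instance attribute per field, from kwargs or the default.
        for field in self.fields:
            setattr(self, field.name, kwargs.pop(field.name, field.default))
        # Step 3: leftover keyword arguments may target properties on the class.
        for key in list(kwargs):
            if isinstance(getattr(type(self), key, None), property):
                setattr(self, key, kwargs.pop(key))
        # Step 4: anything still unprocessed is a programmer error.
        if kwargs:
            raise AttributeError("cannot handle keyword %r" % sorted(kwargs)[0])
        # Step 5: positional arguments fill fields in declaration order.
        for field, value in zip(self.fields, args):
            setattr(self, field.name, value)
        # Step 6 (post_init signal) is omitted in this sketch.

    @property
    def nickname(self):
        return self._nickname

    @nickname.setter
    def nickname(self, value):
        self._nickname = value.strip()


by_keyword = ToyExample(name="Fred", nickname=" Freddy ")
by_position = ToyExample("Wilma", 30)
```
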