= Model Inheritance = This is a proposal for how subclassing should work in Django. There are a lot of details to get right, so this proposal should be very specific and detailed. Most of the ideas here come from the thread linked below: [http://groups.google.com/group/django-developers/browse_frm/thread/ea5e0bf903058fac/9a68ac0d99cb6d7d?q=semantics&rnum=1#9a68ac0d99cb6d7d django-developers: Semantics of subtyping] Here are some additional notes from Robert Wittams on allowing different storage models. [http://groups.google.com/group/django-developers/browse_frm/thread/9a9a2cfacce690d/dfcd5a71ba523534?q=Just+a+few+vague+notes&rnum=1#dfcd5a71ba523534 django-developers: Change subclassing syntax] == Current Status == Malcolm Tredinnick is working on an implementation of this. As of 15 July 2006, it is on hold whilst a refactoring of Django's query generation is taking place (see [http://groups.google.com/group/django-developers/browse_frm/thread/1026181083385c58/# this message] on django-devel). Once that work is completed, work will resume again. Work seems to have resumed. [http://groups.google.com/group/django-developers/browse_thread/thread/1bc01ee32cfb7374/84b64c625cd1b402?tvc=2&q=Model+Inheritance#84b64c625cd1b402 django-developers: Model inheritance API] == Subclassing == For subclassing, there are 3 main issues: 1. How do we model the relations in SQL? 2. How do joins work? 3. How does the API work? Note that I have only provided examples for single inheritance here. Is multiple inheritance worth supporting? The examples will use the following models: {{{ #!python class Place(models.Model): name = models.CharField(maxlength=50) class Restaurant(Place): description = models.TextField() class ItalianRestaurant(Restaurant): has_decent_gnocchi = models.BooleanField() }}} === 1. Modeling parent relations in SQL? === The general consesus seems to be this: {{{ #!sql CREATE TABLE "myapp_place" ( "id" integer NOT NULL PRIMARY KEY, "name" varchar(50) NOT NULL ); CREATE TABLE "myapp_restaurant" ( /* PRIMARY KEY REFERENCES "myapp_places" ("id") works for postgres, what about others? */ /* This works in MySQL too. -- Andy Dustman */ "id" integer NOT NULL PRIMARY KEY REFERENCES "myapp_places" ("id"), "description" text NOT NULL ); CREATE TABLE "myapp_italianrestaurant" ( "id" integer NOT NULL PRIMARY KEY REFERENCES "myapp_restaurant" ("id"), "has_decent_gnocchi" bool NOT NULL ); }}} '' - Wonder what rationale was for choosing this approach? Collapsing into a single table (with discriminator) would alleviate the join problems. It would make for wide tables if the hierarchy was deep and each subtype added numerous attributes, but it would be hidden underneath the API anyway. Just wonderin... - S '' ''A single table is how Ruby on Rails handles it. It does avoid the join problems, but your table could end up being pretty sparse... a lot of empty cells. (I know, but disk space is cheap... blah blah) The bigger reasons are that it makes it impossible to enforce NOT NULL constraints at the database level, and it may be easier to run into locking issues. There are generally 3 different ways to model inheritance for an ORM, this method offers the most "correctness" (if you put your dba glasses on) and the lowest performance for selects... That said, I don't think we're dead set on implementing things this way, but this *is* the way most people I've talked to are leaning.'' ''Here are a few references for those interested:'' * http://www.objectmatter.com/vbsf/docs/maptool/ormapping.html * http://www.martinfowler.com/eaaCatalog/singleTableInheritance.html * http://www.martinfowler.com/eaaCatalog/classTableInheritance.html * http://www.martinfowler.com/eaaCatalog/concreteTableInheritance.html ''-Joseph Kocherhans'' ''Thanks Joseph. Agree it's not an obvious choice. The Null thing is important, although wonder whether it's more important than performant queries... a bit philosophical that one: is it better to ensure cleaner data in the db or provide more performant code. It could be dealt with using triggers, but I know that then gets away from declarative constraints into procedural programming in the DBs. Aside from the different syntax, not all DBs even support triggers (sqlite?). Stepping back, it's possible (likely) there will be situations where one or the other is preferred. It should be possible to allow for that by abstracting the approach into a Strategy (per Strategy pattern). Even though there's only one strategy provided initially, it would allow the (an-) other to be added later. Wonder if that's something you're considering? - S '' === 2. Modeling joins in SQL === When we want a list of {{{ItalianRestaurant}}}s, we obviously need all the fields from myapp_restaurant and myapp_place as well. This could be accomplished by inner joins. It would look something like this: {{{ #!sql SELECT ... FROM myapp_italianrestaurant as ir INNER JOIN myapp_restaurant as r ON ir.restaurant_id=r.id INNER JOIN myapp_place as p ON r.place_id=p.id }}} But what if we want a list of {{{Place}}}s, what should we do? We can either ''just'' get the places: {{{ #!sql SELECT ... FROM myapp_place }}} Or we can get ''everything'' with left joins (this allows the iterator to return objects of the appropriate type, rather than just a bunch of {{{Places}}}): {{{ #!sql SELECT ... FROM myapp_place as p LEFT JOIN myapp_restaurant as r ON r.place_id=p.id LEFT JOIN myapp_italianrestaurant as ir ON ir.restaurant_id=r.id }}} Imagine we have more than one subclass of {{{Place}}} though. The join clause and the column list would get pretty hefty. This could obviously get unmanageable pretty quickly. '' I think some dbs have a maximum number of joins (something like 16), and even within the maximum, the query optimizer will either spend a while deciding which way to best join the tables or it will give up and choose the wrong way quickly. This wording is FUD-- I'll try to find specifics. --jdunck '' '' MySQL-4.1 and newer can handle up to 61 tables in a JOIN or VIEW (5.0 and newer). Unclear what the limit is for 4.0 and older. -- Andy Dustman '' '' There must be major performance problems with performing that many joins in a query. What's wrong with making the default behaviour to grab only the base fields, and documenting that? (except for the fact that subclass-specific methods might break .. hm.) --harmless '' '' Another possibility is to create VIEWs: '' {{{ #!sql CREATE TABLE "myapp_place" ( "id" integer NOT NULL PRIMARY KEY, "name" varchar(50) NOT NULL ); CREATE TABLE "myapp_restaurant_base" ( "id" integer NOT NULL PRIMARY KEY REFERENCES "myapp_places" ("id"), "description" text NOT NULL ); CREATE VIEW "myapp_restaurant" AS SELECT a."id", a."name", b."description" FROM "myapp_place" AS a LEFT JOIN "myapp_restaurant_base" AS b ON a."id"=b."id" ; CREATE TABLE "myapp_italianrestaurant_base" ( "id" integer NOT NULL PRIMARY KEY REFERENCES "myapp_restaurant" ("id"), "has_decent_gnocchi" bool NOT NULL ); CREATE VIEW "myapp_italianrestaurant" AS SELECT a."id", a."name", b."description", c."has_decent_gnocchi" FROM "myapp_place" AS a LEFT JOIN "myapp_restaurant_base" AS b ON a."id"=b."id" LEFT JOIN "myapp_italianrestaurant_base" AS c on a."id"=c."id" ; }}} In MySQL-5.0, most of the theoretically-updatable VIEWs are updatable, but currently, "You cannot use UPDATE to update more than one underlying table of a view that is defined as a join." so unfortunately this doesn't work (yet): {{{ mysql> INSERT INTO "myapp_italianrestaurant" ("name","description","has_decent_gnocchi") -> VALUES ('DePalma\'s', 'Fine Italian Dining', 1); ERROR 1288 (HY000): The target table myapp_italianrestaurant of the INSERT is not updatable }}} '' However, it is possible to do this: '' {{{ mysql> INSERT INTO "myapp_place" ("id","name") VALUES (1,'DePalma\'s'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO "myapp_restaurant_base" ("id","description") VALUES (1,'Fine Italian Dining'); Query OK, 1 row affected (0.01 sec) mysql> INSERT INTO "myapp_italianrestaurant_base" ("id","has_decent_gnocchi") VALUES (1,1); Query OK, 1 row affected (0.00 sec) mysql> UPDATE "myapp_italianrestaurant" SET "description"='Exquisite Italian Dining'; Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> SELECT * FROM myapp_italianrestaurant; +----+-----------+--------------------------+--------------------+ | id | name | description | has_decent_gnocchi | +----+-----------+--------------------------+--------------------+ | 1 | DePalma's | Exquisite Italian Dining | 1 | +----+-----------+--------------------------+--------------------+ 1 rows in set (0.03 sec) }}} '' Unfortunately if you try to update name and description at the same time, this fails. The VIEW scheme would still be useful for SELECT, but I think the base tables will have to be updated directly. At least VIEWs buy you an easier SELECT statement without JOINs. It's possible to handle some of the VIEW updating with INSTEAD OF triggers with Oracle, IBM DB2, and MS SQL (PostgreSQL uses CREATE RULE ... AS ON (INSERT|UPDATE) ... DO INSTEAD), but that kind is not in MySQL yet. I'm not sure how much effort it's worth. -- Andy Dustman '' Another option is to lazily load objects like {{{Restaurant}}} and {{{ItalianRestaurant}}} while we're iterating over {{{Place.objects.all()}}}, but that requires a ''lot'' of database queries. Either way, doing this will be expensive, and api should reflect that. You're much better off ''just'' using {{{Place}}}s fields if you are going to iterate over {{{Place.objects.all()}}} === 3. API === The following API examples assume we have created these objects: {{{ #!python p = Place(name='My House') r = Restaurant(name='Road Kill Cafe', description='Yuck!') ir = ItalianRestaurant(name="Ristorante Miron", description="Italian's best mushrooms restaurant", has_decent_gnocchi=True) }}} For the following examples, assume {{{Place.objects.get(2)}}} returns {{{r}}} and {{{Place.objects.get(3)}}} returns {{{ir}}}. || || python || result || || A. || {{{Place.objects.count()}}} || 3 || || B. || {{{Restaurant.objects.count()}}} || 2 || || C. || {{{ItalianRestaurant.objects.count()}}} || 1 || || D. || {{{Place.objects.get(2).description}}} || 'Yuck!' or {{{AttributeError}}}? || || E. || {{{Restaurant.objects.get(2).description}}} || 'Yuck!' || == Change the current usage of subclassing == {{{ #!python class MyArticle(Article): ...fields... class META: module_name = 'my_articles' remove_fields = ...some fields... }}} would change to: {{{ #!python class MyArticle(meta.Model): ...fields... class META: copy_from = Article remove_fields = ...some fields... }}} == Ramblings on Magic Removal Subclassing from the Pycon Sprint == Consider the following variation on the above restaurant example: {{{ #!python class Place(models.Model): name = models.CharField(maxlength=50) foo = 1 def bar(self): return 2 }}} If Restaurant were to inherit from this, it would not automatically have a 'name' CharField. This is because Django uses a metaclass to modify the default class creation behavior. The ModelBase metaclass creates a new class from scratch, and then selectively pulls items from the Place class as defined above and adds them to this new class, which allows it to handle Field objects specially. For each of the class's attributes, add_to_class() is called. If add_to_class finds a 'contribute_to_class' attribute, ModelBase knows it is dealing with a Field object, and calls contribute_to_class. Otherwise, it just adds it to the new class via setattr(). Thus, by the time the Restaurant class is created, the Place class which it inherits from actually looks more like this: {{{ #!python class Place(models.Model): _meta = ... foo = 1 def bar(self): return 2 }}} Thus, there is simply no 'name' field for it to inherit. So, we need to have ModelBase walk through the parent classes and call contribute_to_class on each of the fields found in _meta.fields. As we walk the inheritance tree, we look for a '_meta' attribute to determine if our current node is a Model. Otherwise, it is either a mixin class or the Model class iteself. We can keep track of the parent hierarchy by creating _meta.parents, and having each ancestor of Model add to it in a recursive fasion, by adding the following to ModelBase.__new__(): {{{ #!python # Build complete list of parents for base in bases: if '_meta' in dir(base): new_class._meta.parents.append(base) new_class._meta.parents.extend(base._meta.parents) }}} We can then add all of the parent fields to the new class like so: {{{ #!python # Add Fields inherited from parents for parent in new_class._meta.parents: for field in parent._meta.fields: field.contribute_to_class(new_class, field.name) }}} That should pretty much be it for the object side of things; what's left is the database side of things (ie, the hard part). == Mixins == here are two scenarios where mixins would be useful (to me). 1. auditing. I would like to have certain models have a created by/time and last-updated by/time on the record, and possibly a XXX_history table which shows the previous version on that entry when it gets changed. a. One way to implement this very useful and needed functionality may be to have a inbuilt 'log/history' model class containing modified_by,modified_time (should be populated *without* user input), optional reason for modification (user comments) ..etc. We can inherit this in our models if necessary and voila! our classes would have an additional myproj_myapp_log table for auditing purposes. b. This log table would contain one row per object per modification with all the attributes of the object before modification + the extra fields from the log class. This log table should additionally be non-editable. Only inserts should be possible. No deletes/modifications. Mysql has an 'archive' table type to facilitate this. Other DBs should have their corresponding types. c. A History view for the object then could be built in the admin page which lists all modifications to an object from its creation till date - we could list only the attributes that have changed and highlight the changes alone. Modification date filters could be provided. This feature will appeal to many. "Django - The framework with Inbuilt data auditing" - The IT Departments will love this. Discussion regarding this and different sample implementations can be found at http://www.tonymarston.net/php-mysql/auditlog.html 2. tagging. I would like to 'mark' a model as being taggable, and let the mixin worry about the rest. This I could do now by overriding the many2many field type, but I think a mixing would be nicer 3. row-level-security. I would like to be able to specifify a the permissions of an item in either a tagging-like fashion or as a function of the values in that item. For instance, if an item has a field "approver", and the value is set to "John", then I want John to have read/write permissions on that item. As soon as the approver field is no longer set to "John", John would lose his privilege unless it was granted by something else in the item or some default. == Other Resources == '''SQLAlchemy''' I assume you guys are aware of it and have perhaps already looked into it, but if not (isn't mentioned here, I didn't follow all previous threads on the list) you might want to look at how SQLAlchemy handles this -- fundamentally a bit different of course as SA is not ActiveRecord-ish (without using one of its "extensions") but perhaps still useful. Maybe start here: http://sqlalchemy.org/docs/adv_datamapping.myt#advdatamapping_inheritance ''-ToddG'' '''PostgreSQL''' Please, take in account that PostgreSQL support that kind of inheritance at database level (see: [http://www.postgresql.org/docs/8.1/interactive/ddl-inherit.html]) would be nice that in case one uses postgresql that inheritance to be used instead of the JOIN's approach. Dunno if other databases support inheritance. Section "5.8.1 Caveats" on the above mentioned doc site shows that PostgreSQL implementation of inheritance might not be mature enough.