Version 7 (modified by Antti Kaihola, 18 years ago) ( diff )

added link to django-developers thread

Django Schema evolution
=======================

Schema migration is one of those "hard" problems that I think will be impossible
to get right for all the cases.  What follows are my thoughts on how to get
as close as possible in Django.

(please leave comments at the bottom and not inline to the proposal!)

.. contents::     

Prior art
---------

A few bits of prior arts I examined.

    * http://api.rubyonrails.com/classes/ActiveRecord/Migration.html
    * http://sqlobject.org/sqlobject-admin.html#versioning-upgrading
    * http://www.varlena.com/varlena/GeneralBits/127.php
    
Use cases
---------

    1. Alice has a blog application written in Django.  A blog entry looks 
       like::
        
            class Entry(models.Model):
                title = models.CharField(maxlength=30)
                body = models.TextField()
                
       After using this to write some blog entries, she realizes that she
       really should have had a publish date field on her model, and she
       wants to add one.
       
       Alice doesn't particularly care about old entries; she just wants them to
       have an empty publish date. She knows just enough SQL to understand how
       relational databases work, but that's about it.
       
    2. Ben has a simple ticket tracker with a ticket class that contains::
    
            class Ticket(models.Model):
                reporter    = models.EmailField()
                owner       = models.EmailField()
                status      = models.IntegerField(choices=STATUS_CHOICES)
                description = models.TextField()
                
       After two years of use, Ben wants to product information to his ticket
       tracker and has decided he doesn't care about who reported a ticket, so
       he writes these models::
       
            class Product(models.Model):
                ...
       
            class Ticket(models.Model):
                owner       = models.EmailField()
                status      = models.IntegerField(choices=STATUS_CHOICES)
                description = models.TextField()
                product     = models.ForeignKey(Product)
                
       Ben knows quite a bit of SQL; he's also the company DBA.
    
    3. Carol supports a newsroom with a set of models that look like::
    
            class Reporter(models.Model):
                ...
                
            class Article(models.Model):
                reporter = models.ForeignKey(Reporter)
                ...
                
            class Section(models.Model):
                stories = models.ManyToManyField(Article)
                ...
                
       As it turns out, her conception of ``Section`` was naive, and on top of
       that she wants to separate certain types of articles into multiple
       classes. She rewrites her models to look like::
       
            class Reporter(models.Model):
                ...    
            
            class Category(models.Model):
                ...
            
            class Article(models.Model):
                reporter = models.ForeignKey(Reporter)
                categories = models.ManyToManyField(Category)
                ...
                
            class Obituary(models.Model):
                ...
                
            class WeddingAnnouncement(models.Model):
                ...
                
            class Section(models.Model):
                included_categories = models.ManyToManyField(Category)
                ...
        
       Carol obviously needs to keep all the data in her system. Some articles
       will need to be moved into ``Obituary`` or ``WeddingAnnouncement``,
       objects. Additionally, stories will be assigned categories from a mapping
       given to her by the web editor since the direct mapping has been removed.
       
       To add to the fun, Carol has 7 different news sites with separate
       databases using this system.  Three of those systems run MySQL, the
       other four run PostgreSQL.
       
       Carol is a very experienced developer; she's been doing this web thing
       for almost ten years now.
        
Possible solutions
------------------

Write SQL
`````````
    
This is the current situation: you write a bunch of SQL and bung it into
your interactive interpreter.

Ramifications:
    Alice is a simple enough case that she shouldn't have to know
    SQL to make this work.
    
    This situation sounds good to Ben: it's just two lines of SQL.
    
    Carol would have to apply this SQL against 7 different databases (which is 
    a waste of time), and she'd still have to write a bunch of Python code
    to deal with the data migration.
        
Automatic db introspection
``````````````````````````

In this scenario, Django inspects your models and your database, and 
outputs the SQL for adding/removing fields.  You're supposed to 
inspect the SQL and then stick it into your database using something
like::

    $ ./manage.py sqlupdate | psql mydb

Ramifications:
    This works for Alice; she does ``django-admin sqlupgrade | mysql`` and
    is happy.  However, if the SQL that Django put out happened to have a
    mistake in it, she might not have noticed, and could have lost some
    data.
    
    Ben loves this situation; it does his two lines of SQL automatically,
    and he can poke at what's put out to make sure it's right for him.
    
    This doesn't really help Carol.  Although the SQL that is put out is a
    start, it doesn't really get her very far.  She still has to write
    a bunch of data migration by hand, and she also has no way of applying
    the SQL she writes quickly.
        
Automatically applied migration code
````````````````````````````````````

In this scenario, you give your models a version. When it comes time to upgrade,
you write a "upgrade-to-version-3" script and increment your version number.
When you run a ``django-admin`` command (probably the newly-added ``syncdb``),
Django notices that you've added an upgrade script and runs it automatically.

This type of system comes in two (not necessarily exclusive) flavors:
    
    * SQL only, where your upgrade scripts are in SQL.
    
    * ActiveRecord Migration style, where you write your migrations
      in Python (which gets converted to SQL for you).
      
Each situation also could support symmetric migration (where you provide
both an upgrade and a downgrade script).

Ramifications:
    Alice doesn't like the SQL flavor at all (she wants to use Python!). The
    Pythonic one works very well for her since the syntax mirrors other
    parts of Django she's familiar with.  However, she'd prefer something
    like the automatic solution since it would save her from having to type
    anything at all.
    
    Ben really likes this one; the SQL is still easy to write, and it gets
    into SVN so he can track his upgrades.
    
    Carol loves this one.  She can write custom SQL/Python that takes care of 
    all the situation she needs.  The Python option is a better fit for her
    since she has lots of custom code she needs to run.
    
Introspection + migration
`````````````````````````

This approach is a combination of `Automatic db introspection`_ and
`Automatically applied migration code`_ -- one command produces a migration
script, you tickle your version number, and the ``syncdb`` runs the migrations.

Ramifications:
    This works very well for Alice, and she gets the added bonus of having 
    nice migrations stored for future inspection.
    
    This is Ben's favorite option.
    
    Carol's indifferent about this as opposed to the previous one: it helps 
    slightly, but not all that much, really.
    
Conclusions
-----------

    * "Just write SQL" sucks and is obviously wrong and needs to be fixed.
    
    * DB introspection only goes so far, and doesn't give any way to extend
      what comes out beyond "just write SQL".
      
    * Migration code works pretty well -- which is obvious if you look at how
      successful Rails has been with the feature.  The Python flavor is obviously
      more powerful, but for some uses SQL works well.
      
    * Introspection + migration seems to be best-of-breed; it even has the added
      bonus of getting Alice to save her migrations in source control.
      
Proposal
--------

So.

Given all the above, here's how I think schema evolution should work in Django:

    * An optional module, ``django.contrib.evolution`` provides a 
      way to store versions in the database::
      
            class ModelVersion(models.Model):
                content_type = models.ForeignKey(ContentType)
                version      = models.PositiveIntegerField()

      This makes evolution optional and thus doesn't clutter up your tablespace
      if you don't need it.
      
    * Models grow a ``version`` attribute::
    
            class MyModel(models.Model):
                ...
                
                class Meta:
                    version = 3
                    
    * ``django-admin syncdb`` applies evolutions using the following steps:
    
        #. Makes sure evolution is installed before proceeding.
        
        #. For each content type, checks that the version in the database matches
           the version in defined in the model.  For those that don't match...
           
        #. Django looks for evolutions in ``<app_package>/upgrade/``.  Evolutions
           can either be SQL scripts named ``to_version_X.sql`` or Python scripts
           named ``to_version_X.py``.
           
        #. ``django.contrib.evolution`` will contain utilities to abstract evolution
           for different db backends.  This exact API is TBD.
           
        #. For SQL evolutions, if a file named ``to_version_X.out`` exists, ``syncdb``
           will expect the output of the evolution to match the text file EXACTLY.  If
           any differences happen, the transaction will be rolled back (for that reason
           SQL evolution scripts must not contain BEGIN/COMMIT).
           
           For Python evolutions, there will be some way of doing similar tests.
           
    * ``django-admin find-migrations`` (or somesuch) will inspect the database
      and pump out SQL evolution scripts into the right place in the app
      package(s). Users will then upgrade the version number and run ``syncdb``.

Comments/questions
------------------

Asked by nwp in the chat room: "what are the rules for checking whether you are
in a position to apply to_version_X?"

        Given a model that defines ``version = 6`` and a version of 3 stored in the 
        database, ``to_version_4``, ``to_version_5``, and ``to_version_6`` will be
        applied -- but only if they exist (missing files are OK).

If versions are stored per-model, what happens if evolutions affect multiple models, or create or delete them? For example, in Carol's situation above, how would the individual migration scripts cooperate to move the data around in the required way? I think versions make more sense on a per-app basis; if that distinction exists in magic-removal.

More discussion in a `django-developers thread`_ (should main points be copied here?).

.. _django-developers thread: http://groups.google.com/group/django-developers/browse_frm/thread/ec578ea4b239e844

Note: See TracWiki for help on using the wiki.
Back to Top