Changes between Initial Version and Version 1 of SchemaEvolution


Ignore:
Timestamp:
Feb 27, 2006, 9:21:39 PM (18 years ago)
Author:
Jacob
Comment:

Added writeup/proposal for schema evolution

Legend:

Unmodified
Added
Removed
Modified
  • SchemaEvolution

    v1 v1  
     1{{{
     2#!rst
     3Schema migration is one of those "hard" problems that I think will be impossible
     4to get right for all the cases.  What follows are my thoughts on how to get
     5as close as possible in Django.
     6
     7(please leave comments at the bottom and not inline to the proposal!)
     8
     9Prior art
     10---------
     11
     12A few bits of prior arts I examined.
     13
     14    * http://api.rubyonrails.com/classes/ActiveRecord/Migration.html
     15    * http://sqlobject.org/sqlobject-admin.html#versioning-upgrading
     16    * http://www.varlena.com/varlena/GeneralBits/127.php
     17   
     18Use cases
     19---------
     20
     21    1. Alice has a blog application written in Django.  A blog entry looks
     22       like::
     23       
     24            class Entry(models.Model):
     25                title = models.CharField(maxlength=30)
     26                body = models.TextField()
     27               
     28       After using this to write some blog entries, she realizes that she
     29       really should have had a publish date field on her model, and she
     30       wants to add one.
     31       
     32       Alice doesn't particularly care about old entries; she just wants them to
     33       have an empty publish date. She knows just enough SQL to understand how
     34       relational databases work, but that's about it.
     35       
     36    2. Ben has a simple ticket tracker with a ticket class that contains::
     37   
     38            class Ticket(models.Model):
     39                reporter    = models.EmailField()
     40                owner       = models.EmailField()
     41                status      = models.IntegerField(choices=STATUS_CHOICES)
     42                description = models.TextField()
     43               
     44       After two years of use, Ben wants to product information to his ticket
     45       tracker and has decided he doesn't care about who reported a ticket, so
     46       he writes these models::
     47       
     48            class Product(models.Model):
     49                ...
     50       
     51            class Ticket(models.Model):
     52                owner       = models.EmailField()
     53                status      = models.IntegerField(choices=STATUS_CHOICES)
     54                description = models.TextField()
     55                product     = models.ForeignKey(Product)
     56               
     57       Ben knows quite a bit of SQL; he's also the company DBA.
     58   
     59    3. Carol supports a newsroom with a set of models that look like::
     60   
     61            class Reporter(models.Model):
     62                ...
     63               
     64            class Article(models.Model):
     65                reporter = models.ForeignKey(Reporter)
     66                ...
     67               
     68            class Section(models.Model):
     69                stories = models.ManyToManyField(Article)
     70                ...
     71               
     72       As it turns out, her conception of ``Section`` was naive, and on top of
     73       that she wants to separate certain types of articles into multiple
     74       classes. She rewrites her models to look like::
     75       
     76            class Reporter(models.Model):
     77                ...   
     78           
     79            class Category(models.Model):
     80                ...
     81           
     82            class Article(models.Model):
     83                reporter = models.ForeignKey(Reporter)
     84                categories = models.ManyToManyField(Category)
     85                ...
     86               
     87            class Obituary(models.Model):
     88                ...
     89               
     90            class WeddingAnnouncement(models.Model):
     91                ...
     92               
     93            class Section(models.Model):
     94                included_categories = models.ManyToManyField(Category)
     95                ...
     96       
     97       Carol obviously needs to keep all the data in her system. Some articles
     98       will need to be moved into ``Obituary`` or ``WeddingAnnouncement``,
     99       objects. Additionally, stories will be assigned categories from a mapping
     100       given to her by the web editor since the direct mapping has been removed.
     101       
     102       To add to the fun, Carol has 7 different news sites with separate
     103       databases using this system.  Three of those systems run MySQL, the
     104       other four run PostgreSQL.
     105       
     106       Carol is a very experienced developer; she's been doing this web thing
     107       for almost ten years now.
     108       
     109Possible solutions
     110------------------
     111
     112Write SQL
     113`````````
     114   
     115This is the current situation: you write a bunch of SQL and bung it into
     116your interactive interpreter.
     117
     118Ramifications:
     119    Alice is a simple enough case that she shouldn't have to know
     120    SQL to make this work.
     121   
     122    This situation sounds good to Ben: it's just two lines of SQL.
     123   
     124    Carol would have to apply this SQL against 7 different databases (which is
     125    a waste of time), and she'd still have to write a bunch of Python code
     126    to deal with the data migration.
     127       
     128Automatic db introspection
     129``````````````````````````
     130
     131In this scenario, Django inspects your models and your database, and
     132outputs the SQL for adding/removing fields.  You're supposed to
     133inspect the SQL and then stick it into your database using something
     134like::
     135
     136    $ ./manage.py sqlupdate | psql mydb
     137
     138Ramifications:
     139    This works for Alice; she does ``django-admin sqlupgrade | mysql`` and
     140    is happy.  However, if the SQL that Django put out happened to have a
     141    mistake in it, she might not have noticed, and could have lost some
     142    data.
     143   
     144    Ben loves this situation; it does his two lines of SQL automatically,
     145    and he can poke at what's put out to make sure it's right for him.
     146   
     147    This doesn't really help Carol.  Although the SQL that is put out is a
     148    start, it doesn't really get her very far.  She still has to write
     149    a bunch of data migration by hand, and she also has no way of applying
     150    the SQL she writes quickly.
     151       
     152Automatically applied migration code
     153````````````````````````````````````
     154
     155In this scenario, you give your models a version. When it comes time to upgrade,
     156you write a "upgrade-to-version-3" script and increment your version number.
     157When you run a ``django-admin`` command (probably the newly-added ``syncdb``),
     158Django notices that you've added an upgrade script and runs it automatically.
     159
     160This type of system comes in two (not necessarily exclusive) flavors:
     161   
     162    * SQL only, where your upgrade scripts are in SQL.
     163   
     164    * ActiveRecord Migration style, where you write your migrations
     165      in Python (which gets converted to SQL for you).
     166     
     167Each situation also could support symmetric migration (where you provide
     168both an upgrade and a downgrade script).
     169
     170Ramifications:
     171    Alice doesn't like the SQL flavor at all (she wants to use Python!). The
     172    Pythonic one works very well for her since the syntax mirrors other
     173    parts of Django she's familiar with.  However, she'd prefer something
     174    like the automatic solution since it would save her from having to type
     175    anything at all.
     176   
     177    Ben really likes this one; the SQL is still easy to write, and it gets
     178    into SVN so he can track his upgrades.
     179   
     180    Carol loves this one.  She can write custom SQL/Python that takes care of
     181    all the situation she needs.  The Python option is a better fit for her
     182    since she has lots of custom code she needs to run.
     183   
     184Introspection + migration
     185`````````````````````````
     186
     187This approach is a combination of `Automatic db introspection`_ and
     188`Automatically applied migration code`_ -- one command produces a migration
     189script, you tickle your version number, and the ``syncdb`` runs the migrations.
     190
     191Ramifications:
     192    This works very well for Alice, and she gets the added bonus of having
     193    nice migrations stored for future inspection.
     194   
     195    This is Ben's favorite option.
     196   
     197    Carol's indifferent about this as opposed to the previous one: it helps
     198    slightly, but not all that much, really.
     199   
     200Conclusions
     201-----------
     202
     203    * "Just write SQL" sucks and is obviously wrong and needs to be fixed.
     204   
     205    * DB introspection only goes so far, and doesn't give any way to extend
     206      what comes out beyond "just write SQL".
     207     
     208    * Migration code works pretty well -- which is obvious if you look at how
     209      successful Rails has been with the feature.  The Python flavor is obviously
     210      more powerful, but for some uses SQL works well.
     211     
     212    * Introspection + migration seems to be best-of-breed; it even has the added
     213      bonus of getting Alice to save her migrations in source control.
     214     
     215Proposal
     216--------
     217
     218So.
     219
     220Given all the above, here's how I think schema evolution should work in Django:
     221
     222    * An optional module, ``django.contrib.evolution`` provides a
     223      way to store versions in the database::
     224     
     225            class ModelVersion(models.Model):
     226                content_type = models.ForeignKey(ContentType)
     227                version      = models.PositiveIntegerField()
     228
     229      This makes evolution optional and thus doesn't clutter up your tablespace
     230      if you don't need it.
     231     
     232    * Models grow a ``version`` attribute::
     233   
     234            class MyModel(models.Model):
     235                ...
     236               
     237                class Meta:
     238                    version = 3
     239                   
     240    * ``django-admin syncdb`` applies evolutions using the following steps:
     241   
     242        #. Makes sure evolution is installed before proceeding.
     243       
     244        #. For each content type, checks that the version in the database matches
     245           the version in defined in the model.  For those that don't match...
     246           
     247        #. Django looks for evolutions in ``<app_package>/upgrade/``.  Evolutions
     248           can either be SQL scripts named ``to_version_X.sql`` or Python scripts
     249           named ``to_version_X.py``.
     250           
     251        #. ``django.contrib.evolution`` will contain utilities to abstract evolution
     252           for different db backends.  This exact API is TBD.
     253           
     254        #. For SQL evolutions, if a file named ``to_version_X.out`` exists, ``syncdb``
     255           will expect the output of the evolution to match the text file EXACTLY.  If
     256           any differences happen, the transaction will be rolled back (for that reason
     257           SQL evolution scripts must not contain BEGIN/COMMIT).
     258           
     259           For Python evolutions, there will be some way of doing similar tests.
     260           
     261    * ``django-admin find-migrations`` (or somesuch) will inspect the database
     262      and pump out SQL evolution scripts into the right place in the app
     263      package(s). Users will then upgrade the version number and run ``syncdb``.
     264}}}
     265
     266= Comments =
     267
     268''leave comments here or post 'em to django-dev''
Back to Top