Changes between Version 33 and Version 34 of SchemaEvolution


Ignore:
Timestamp:
Jul 20, 2007, 12:38:45 AM (17 years ago)
Author:
public@…
Comment:

reformat

Legend:

Unmodified
Added
Removed
Modified
  • SchemaEvolution

    v33 v34  
    1 {{{
    2 #!rst
    3 Django Schema evolution
    4 =======================
    5 
    61Schema migration is one of those "hard" problems that I think will be impossible
    72to get right for all the cases.  What follows are my thoughts on how to get
     
    105(please leave comments at the bottom and not inline to the proposal! Also, please add your name or some handle after the comment; everybody using "I" and then not identifying themselves is terribly unclear.)
    116
    12 .. contents::     
    13 
    14 Prior art
    15 ---------
     7== Prior art ==
    168
    179A few bits of prior arts I examined.
     
    2214    * http://divmod.org/trac/wiki/DivmodAxiom/Reference#Upgraders
    2315   
    24 Use cases
    25 ---------
     16== Use cases ==
    2617
    2718    1. Alice has a blog application written in Django.  A blog entry looks
    2819       like::
    2920       
     21{{{
    3022            class Entry(models.Model):
    3123                title = models.CharField(maxlength=30)
    3224                body = models.TextField()
     25}}}
    3326               
    3427       After using this to write some blog entries, she realizes that she
     
    4235    2. Ben has a simple ticket tracker with a ticket class that contains::
    4336   
     37{{{
    4438            class Ticket(models.Model):
    4539                reporter    = models.EmailField()
     
    4741                status      = models.IntegerField(choices=STATUS_CHOICES)
    4842                description = models.TextField()
    49                
     43}}}               
    5044       After two years of use, Ben wants to add product information to his ticket
    5145       tracker and has decided he doesn't care about who reported a ticket, so
    5246       he writes these models::
    5347       
     48{{{
    5449            class Product(models.Model):
    5550                ...
     
    6055                description = models.TextField()
    6156                product     = models.ForeignKey(Product)
    62                
     57}}}               
    6358       Ben knows quite a bit of SQL; he's also the company DBA.
    6459   
    6560    3. Carol supports a newsroom with a set of models that look like::
    6661   
     62{{{
    6763            class Reporter(models.Model):
    6864                ...
     
    7571                stories = models.ManyToManyField(Article)
    7672                ...
    77                
     73}}}               
    7874       As it turns out, her conception of ``Section`` was naive, and on top of
    7975       that she wants to separate certain types of articles into multiple
    8076       classes. She rewrites her models to look like::
    8177       
     78{{{
    8279            class Reporter(models.Model):
    8380                ...   
     
    9592               
    9693            class WeddingAnnouncement(models.Model):
    97                 ...
     94   
     95             ...
    9896               
    9997            class Section(models.Model):
    10098                included_categories = models.ManyToManyField(Category)
    10199                ...
    102         
     100}}}     
    103101       Carol obviously needs to keep all the data in her system. Some articles
    104102       will need to be moved into ``Obituary`` or ``WeddingAnnouncement``,
     
    113111       for almost ten years now.
    114112       
    115 Possible solutions
    116 ------------------
    117 
    118 Write SQL
    119 `````````
     113== Possible solutions ==
     114
     115=== Write SQL ===
    120116   
    121117This is the current situation: you write a bunch of SQL and bung it into
     
    132128    to deal with the data migration.
    133129       
    134 Automatic db introspection
    135 ``````````````````````````
     130=== Automatic db introspection ===
    136131
    137132In this scenario, Django inspects your models and your database, and
     
    140135like::
    141136
     137{{{
    142138    $ ./manage.py sqlupdate | psql mydb
    143 
     139}}}
    144140Ramifications:
    145141    This works for Alice; she does ``django-admin sqlupdate | mysql mydb`` and
     
    158154    the SQL she writes quickly.
    159155       
    160 Automatically applied migration code
    161 ````````````````````````````````````
     156=== Automatically applied migration code ===
    162157
    163158In this scenario, you give your models a version. When it comes time to upgrade,
     
    190185    since she has lots of custom code she needs to run.
    191186   
    192 Introspection + migration
    193 `````````````````````````
     187=== Introspection + migration ===
    194188
    195189This approach is a combination of `Automatic db introspection`_ and
     
    206200    slightly, but not all that much, really.
    207201   
    208 Conclusions
    209 -----------
     202== Conclusions ==
    210203
    211204    * "Just write SQL" sucks and is obviously wrong and needs to be fixed.
     
    220213      bonus of getting Alice to save her migrations in source control.
    221214     
    222 Proposal
    223 --------
     215== Proposal ==
    224216
    225217So.
     
    229221    * An optional module, ``django.contrib.evolution`` provides a
    230222      way to store versions in the database::
    231      
     223{{{     
    232224            class ModelVersion(models.Model):
    233225                content_type = models.ForeignKey(ContentType)
    234226                version      = models.PositiveIntegerField()
    235 
     227}}}
    236228      This makes evolution optional and thus doesn't clutter up your tablespace
    237229      if you don't need it.
    238230     
    239231    * Models grow a ``version`` attribute::
    240    
     232{{{   
    241233            class MyModel(models.Model):
    242234                ...
     
    244236                class Meta:
    245237                    version = 3
    246                    
     238}}}                   
    247239    * ``django-admin syncdb`` applies evolutions using the following steps:
    248240   
     
    270262      package(s). Users will then upgrade the version number and run ``syncdb``.
    271263
    272 Comments/questions
    273 ------------------
     264== Comments/questions ==
    274265
    275266Asked by nwp in the chat room: "what are the rules for checking whether you are
     
    291282
    292283
    293 Something Completely Different
    294 ------------------------------
     284== Something Completely Different ==
    295285
    296286I'd like to suggest a different approach altogether.
     
    312302.. _RefactoringDatabases: http://www.bookpool.com/sm/0321293533
    313303
    314 Why something different
    315 -----------------------
     304== Why something different ==
    316305
    317306I agree with "Something Completely Different" above. I worked on a Java-based product that used a similar approach to numbered schema versioning (although we worked with serialized objects in the backend instead of a relational DB). We had trouble merging changes with conflicting version numbers. This was not initially a huge problem for us, because we had very little overlap within and between our projects. However, the times that it happened it was a big deal, and as soon as we started real parallel development, it became a critical issue. I think for Django teams this is a killer, because deployments are so quite common and testing tends to be a part of the developers roles.
     
    319308An example is svnA user creates schema version 5, while concurrently svnB makes version 5. Everything is fine until merge, then either user has an unusable database. This is bad enough, but consider branches! It gets ugly, and can easily break the production database at unexpected times.
    320309
    321 Handling concurrent development -- multiple users and branches
    322 --------------------------------------------------------------
     310== Handling concurrent development -- multiple users and branches ==
    323311
    324312Concurrent development requires communication, especially with a shared data store like the database. The above example should never break the production database -- if two developers commit conflicting code, it's up to them to work out the conflict. Preventing the conflicting code from reaching production is an organizational policy issue, not a web framework issue.
     
    326314You do raise a good point regarding concurrent development with branches. It's still not possible to get rid of communication -- when it comes time to do a branch merge, discussing database schema will be mandatory. However, the tool should allow for branch merges; why not store a branch name AND a version number for each schema revision? When planning for a branch merge, one can implement a "from branch 1 version 6 to branch 2 version 3" migration. Obviously, migration points will need to be well defined, but supporting branches at least allows for the possibility.
    327315
    328 Another way to concurrent development -- migrations as distributed version control systems
    329 ------------------------------------------------------------------------------------------
     316== Another way to concurrent development -- migrations as distributed version control systems ==
    330317
    331318One of the issues with data base migrations, one raised by the previous section and one which has often bitten users of Rails migrations, is that they don't scale with the number of users: while rails-like migrations are wonderful for single users or very small teams (up to 3 or so members), the point is soon reached when different programmers will create different migrations with the same version number, and hell ensues (manual merges/renames of some of the migrations etc...).
     
    337324Since the problem has already been solved, this should be handled by learning from the various decentralized VCS (darcs, Git, Mercurial, Bazaar) and introducing one of their patch-merging and diff-tracking strategies into the migration tool. Of course this wouldn't necessarily resolve all the issues of incompatible migrations, but conflicting ones could at least be tracked.
    338325
    339 Automatic interactive db introspection that generates migration scripts
    340 -----------------------------------------------------------------------
     326== Automatic interactive db introspection that generates migration scripts ==
     327
    341328I think that the nicest way for the user that would work for 99% of the cases is an automatic and (if needed) interactive command, that will generate the migration steps it took and store them in some script. Also, it could automatically update the version number in the suggestion on migration scripts above. So for a model that changes from::
    342 
     329{{{
    343330        class Ticket(models.Model):
    344331                reporter    = models.EmailField()
    345332                owner       = models.EmailField()
    346333                stat      = models.IntegerField(choices=STATUS_CHOICES)
    347 
     334}}}
    348335to::
    349 
     336{{{
    350337        class Ticket(models.Model):
    351338                reporter    = models.EmailField()
    352339                status      = models.IntegerField(choices=STATUS_CHOICES)
    353340                description = models.TextField()
    354 
     341}}}
    355342This is one removed field (owner), one added field (description), and one renamed field (stat->status). The renamed is the trickiest one obviously since it looks like one has been deleted and another added. So in this case the user should be given the menu::
    356 
     343{{{
    357344        The following fields are new:
    358345                1 status
     
    373360        Warning: removing the field owner will permanently delete data. Write "delete"  and press enter to continue: delete
    374361        Database update complete. Modifications written to Ticket-migration-1.sql
    375 
     362}}}
    376363If there are only additions, then the entire menu can be totally skipped and just the additions added. If there are no renames the entire thing also becomes much shorter. I think this covers most cases, and any other cases can be handled manually by writing custom SQL migration scripts as above.
    377364
     
    382369Mike H (mike@mugwuffin.com)
    383370
    384 You can't handle stupidity
    385 ------------------------------
     371=== You can't handle stupidity ===
     372
    386373To be honest I would not like a system that tried to cater to Carol. She is an idiot and needs to learn more. If the answer to
    387374something is "you don't have to know what you are doing" then its not a good answer. Django.. to use Django, its not that you don't
     
    407394-- Anders Hovmöller
    408395
    409 }}}
     396
    410397
    411398== Implementation: Automatic DB Introspection ==
Back to Top