Code

Changes between Version 33 and Version 34 of SchemaEvolution


Ignore:
Timestamp:
07/19/07 22:38:45 (7 years ago)
Author:
public@…
Comment:

reformat

Legend:

Unmodified
Added
Removed
Modified
  • SchemaEvolution

    v33 v34  
    1 {{{ 
    2 #!rst 
    3 Django Schema evolution 
    4 ======================= 
    5  
    61Schema migration is one of those "hard" problems that I think will be impossible 
    72to get right for all the cases.  What follows are my thoughts on how to get 
     
    105(please leave comments at the bottom and not inline to the proposal! Also, please add your name or some handle after the comment; everybody using "I" and then not identifying themselves is terribly unclear.) 
    116 
    12 .. contents::      
    13  
    14 Prior art 
    15 --------- 
     7== Prior art == 
    168 
    179A few bits of prior arts I examined. 
     
    2214    * http://divmod.org/trac/wiki/DivmodAxiom/Reference#Upgraders 
    2315     
    24 Use cases  
    25 --------- 
     16== Use cases == 
    2617 
    2718    1. Alice has a blog application written in Django.  A blog entry looks  
    2819       like:: 
    2920         
     21{{{ 
    3022            class Entry(models.Model): 
    3123                title = models.CharField(maxlength=30) 
    3224                body = models.TextField() 
     25}}} 
    3326                 
    3427       After using this to write some blog entries, she realizes that she 
     
    4235    2. Ben has a simple ticket tracker with a ticket class that contains:: 
    4336     
     37{{{ 
    4438            class Ticket(models.Model): 
    4539                reporter    = models.EmailField() 
     
    4741                status      = models.IntegerField(choices=STATUS_CHOICES) 
    4842                description = models.TextField() 
    49                  
     43}}}                 
    5044       After two years of use, Ben wants to add product information to his ticket 
    5145       tracker and has decided he doesn't care about who reported a ticket, so 
    5246       he writes these models:: 
    5347        
     48{{{ 
    5449            class Product(models.Model): 
    5550                ... 
     
    6055                description = models.TextField() 
    6156                product     = models.ForeignKey(Product) 
    62                  
     57}}}                 
    6358       Ben knows quite a bit of SQL; he's also the company DBA. 
    6459     
    6560    3. Carol supports a newsroom with a set of models that look like:: 
    6661     
     62{{{ 
    6763            class Reporter(models.Model): 
    6864                ... 
     
    7571                stories = models.ManyToManyField(Article) 
    7672                ... 
    77                  
     73}}}                 
    7874       As it turns out, her conception of ``Section`` was naive, and on top of 
    7975       that she wants to separate certain types of articles into multiple 
    8076       classes. She rewrites her models to look like:: 
    8177        
     78{{{ 
    8279            class Reporter(models.Model): 
    8380                ...     
     
    9592                 
    9693            class WeddingAnnouncement(models.Model): 
    97                 ... 
     94    
     95             ... 
    9896                 
    9997            class Section(models.Model): 
    10098                included_categories = models.ManyToManyField(Category) 
    10199                ... 
    102          
     100}}}      
    103101       Carol obviously needs to keep all the data in her system. Some articles 
    104102       will need to be moved into ``Obituary`` or ``WeddingAnnouncement``, 
     
    113111       for almost ten years now. 
    114112         
    115 Possible solutions 
    116 ------------------ 
    117  
    118 Write SQL 
    119 ````````` 
     113== Possible solutions == 
     114 
     115=== Write SQL === 
    120116     
    121117This is the current situation: you write a bunch of SQL and bung it into 
     
    132128    to deal with the data migration. 
    133129         
    134 Automatic db introspection 
    135 `````````````````````````` 
     130=== Automatic db introspection === 
    136131 
    137132In this scenario, Django inspects your models and your database, and  
     
    140135like:: 
    141136 
     137{{{ 
    142138    $ ./manage.py sqlupdate | psql mydb 
    143  
     139}}} 
    144140Ramifications: 
    145141    This works for Alice; she does ``django-admin sqlupdate | mysql mydb`` and 
     
    158154    the SQL she writes quickly. 
    159155         
    160 Automatically applied migration code 
    161 ```````````````````````````````````` 
     156=== Automatically applied migration code === 
    162157 
    163158In this scenario, you give your models a version. When it comes time to upgrade, 
     
    190185    since she has lots of custom code she needs to run. 
    191186     
    192 Introspection + migration 
    193 ````````````````````````` 
     187=== Introspection + migration === 
    194188 
    195189This approach is a combination of `Automatic db introspection`_ and 
     
    206200    slightly, but not all that much, really. 
    207201     
    208 Conclusions 
    209 ----------- 
     202== Conclusions == 
    210203 
    211204    * "Just write SQL" sucks and is obviously wrong and needs to be fixed. 
     
    220213      bonus of getting Alice to save her migrations in source control. 
    221214       
    222 Proposal 
    223 -------- 
     215== Proposal == 
    224216 
    225217So. 
     
    229221    * An optional module, ``django.contrib.evolution`` provides a  
    230222      way to store versions in the database:: 
    231        
     223{{{       
    232224            class ModelVersion(models.Model): 
    233225                content_type = models.ForeignKey(ContentType) 
    234226                version      = models.PositiveIntegerField() 
    235  
     227}}} 
    236228      This makes evolution optional and thus doesn't clutter up your tablespace 
    237229      if you don't need it. 
    238230       
    239231    * Models grow a ``version`` attribute:: 
    240      
     232{{{     
    241233            class MyModel(models.Model): 
    242234                ... 
     
    244236                class Meta: 
    245237                    version = 3 
    246                      
     238}}}                     
    247239    * ``django-admin syncdb`` applies evolutions using the following steps: 
    248240     
     
    270262      package(s). Users will then upgrade the version number and run ``syncdb``. 
    271263 
    272 Comments/questions 
    273 ------------------ 
     264== Comments/questions == 
    274265 
    275266Asked by nwp in the chat room: "what are the rules for checking whether you are 
     
    291282 
    292283 
    293 Something Completely Different 
    294 ------------------------------ 
     284== Something Completely Different == 
    295285 
    296286I'd like to suggest a different approach altogether. 
     
    312302.. _RefactoringDatabases: http://www.bookpool.com/sm/0321293533 
    313303 
    314 Why something different 
    315 ----------------------- 
     304== Why something different == 
    316305 
    317306I agree with "Something Completely Different" above. I worked on a Java-based product that used a similar approach to numbered schema versioning (although we worked with serialized objects in the backend instead of a relational DB). We had trouble merging changes with conflicting version numbers. This was not initially a huge problem for us, because we had very little overlap within and between our projects. However, the times that it happened it was a big deal, and as soon as we started real parallel development, it became a critical issue. I think for Django teams this is a killer, because deployments are so quite common and testing tends to be a part of the developers roles. 
     
    319308An example is svnA user creates schema version 5, while concurrently svnB makes version 5. Everything is fine until merge, then either user has an unusable database. This is bad enough, but consider branches! It gets ugly, and can easily break the production database at unexpected times. 
    320309 
    321 Handling concurrent development -- multiple users and branches 
    322 -------------------------------------------------------------- 
     310== Handling concurrent development -- multiple users and branches == 
    323311 
    324312Concurrent development requires communication, especially with a shared data store like the database. The above example should never break the production database -- if two developers commit conflicting code, it's up to them to work out the conflict. Preventing the conflicting code from reaching production is an organizational policy issue, not a web framework issue. 
     
    326314You do raise a good point regarding concurrent development with branches. It's still not possible to get rid of communication -- when it comes time to do a branch merge, discussing database schema will be mandatory. However, the tool should allow for branch merges; why not store a branch name AND a version number for each schema revision? When planning for a branch merge, one can implement a "from branch 1 version 6 to branch 2 version 3" migration. Obviously, migration points will need to be well defined, but supporting branches at least allows for the possibility. 
    327315 
    328 Another way to concurrent development -- migrations as distributed version control systems 
    329 ------------------------------------------------------------------------------------------ 
     316== Another way to concurrent development -- migrations as distributed version control systems == 
    330317 
    331318One of the issues with data base migrations, one raised by the previous section and one which has often bitten users of Rails migrations, is that they don't scale with the number of users: while rails-like migrations are wonderful for single users or very small teams (up to 3 or so members), the point is soon reached when different programmers will create different migrations with the same version number, and hell ensues (manual merges/renames of some of the migrations etc...). 
     
    337324Since the problem has already been solved, this should be handled by learning from the various decentralized VCS (darcs, Git, Mercurial, Bazaar) and introducing one of their patch-merging and diff-tracking strategies into the migration tool. Of course this wouldn't necessarily resolve all the issues of incompatible migrations, but conflicting ones could at least be tracked. 
    338325 
    339 Automatic interactive db introspection that generates migration scripts 
    340 ----------------------------------------------------------------------- 
     326== Automatic interactive db introspection that generates migration scripts == 
     327 
    341328I think that the nicest way for the user that would work for 99% of the cases is an automatic and (if needed) interactive command, that will generate the migration steps it took and store them in some script. Also, it could automatically update the version number in the suggestion on migration scripts above. So for a model that changes from:: 
    342  
     329{{{ 
    343330        class Ticket(models.Model): 
    344331                reporter    = models.EmailField() 
    345332                owner       = models.EmailField() 
    346333                stat      = models.IntegerField(choices=STATUS_CHOICES) 
    347  
     334}}} 
    348335to:: 
    349  
     336{{{ 
    350337        class Ticket(models.Model): 
    351338                reporter    = models.EmailField() 
    352339                status      = models.IntegerField(choices=STATUS_CHOICES) 
    353340                description = models.TextField() 
    354  
     341}}} 
    355342This is one removed field (owner), one added field (description), and one renamed field (stat->status). The renamed is the trickiest one obviously since it looks like one has been deleted and another added. So in this case the user should be given the menu:: 
    356  
     343{{{ 
    357344        The following fields are new: 
    358345                1 status 
     
    373360        Warning: removing the field owner will permanently delete data. Write "delete"  and press enter to continue: delete 
    374361        Database update complete. Modifications written to Ticket-migration-1.sql 
    375  
     362}}} 
    376363If there are only additions, then the entire menu can be totally skipped and just the additions added. If there are no renames the entire thing also becomes much shorter. I think this covers most cases, and any other cases can be handled manually by writing custom SQL migration scripts as above. 
    377364 
     
    382369Mike H (mike@mugwuffin.com) 
    383370 
    384 You can't handle stupidity 
    385 ------------------------------ 
     371=== You can't handle stupidity === 
     372 
    386373To be honest I would not like a system that tried to cater to Carol. She is an idiot and needs to learn more. If the answer to 
    387374something is "you don't have to know what you are doing" then its not a good answer. Django.. to use Django, its not that you don't 
     
    407394-- Anders Hovmöller 
    408395 
    409 }}} 
     396 
    410397 
    411398== Implementation: Automatic DB Introspection ==