Changes between Initial Version and Version 1 of SchemaEvolutionProposal


Timestamp: Apr 24, 2006, 1:34:00 PM
Author: brantley

     1I like the general direction of introspection + migration, so I thought I'd go into some detail about the API itself and suggest a few changes.  If the feedback is positive, I would build this, provided no one else is active on it.
     2
     3=== Evolution File ===
     4The most basic change I'd make is to put, by default, all the migration code into one file right next to models.py, called something like "models.evolution.py".  We could still split the file up, but I think it'd be a lot simpler and nicer if we didn't have a bunch of files being created.  The API would consist of one function with keyword options, {{{evolve()}}}, which 'evolves' the database to match the current app models (so evolution is done at the app level, as opposed to the model level).  {{{evolve()}}} would take one required keyword argument, {{{version}}}, indicating which version of the models this evolution is for.  So if the database is already at, say, version 5 and {{{evolve(version=4)}}} is hit, that evolution would be ignored.  {{{evolve()}}} would also take the keyword arguments {{{create}}}, {{{drop}}}, and {{{rename}}}, which create, drop, and rename tables/models.  {{{create}}} and {{{drop}}} take a list of strings naming the models to change, and {{{rename}}} takes a mapping of string to string, old model to new model.  Finally, it takes {{{pre}}} and {{{post}}}, which may be assigned functions that run before and after the changes are made to the database.
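To make the shape of that API concrete, here is a minimal sketch of an {{{evolve()}}} that only records what it was asked to do.  The module-level registry {{{EVOLUTIONS}}} and the dict layout of a step are assumptions made for illustration; the proposal does not say how calls are stored or executed.

```python
# Hypothetical registry of recorded steps; not part of the proposal itself.
EVOLUTIONS = []


def evolve(*, version, create=(), drop=(), rename=None, pre=None, post=None):
    """Record one evolution step. `version` is the only required keyword."""
    step = {
        "version": version,
        "create": list(create),        # names of models to create
        "drop": list(drop),            # names of models to drop
        "rename": dict(rename or {}),  # old name -> new name
        "pre": pre,                    # callable(models, cursor) or None
        "post": post,                  # callable(models, cursor) or None
    }
    EVOLUTIONS.append(step)
    return step


# The two calls below mirror what an evolution file might contain.
evolve(version=2, rename={"Blog": "Entry"})
evolve(version=3, create=["Tag"])
```

Making every argument keyword-only keeps the evolution file readable and lets new operations be added later without breaking existing files.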
     5
     6=== Process ===
     7The user would perform the evolution in a two-step process.  First they would run {{{./manage.py evolve [app]}}}, which would inspect each app, or just the specified one, for changes, and then append those changes to "models.evolution.py".  It would then explain, in natural language (e.g. English), all the changes that are to be made.  Perhaps at this stage it should also suggest backing up the database and/or offer to do it for the user.  Finally the user would run {{{./manage.py syncdb}}} as usual.  This would now have the extra functionality of running all the "models.evolution.py" files in the apps.  If there is no change, each {{{evolve()}}} would return silently, as its {{{version}}} keyword would be no higher than the current version.  Otherwise, each {{{evolve()}}} with a higher version would run, making changes to the database and incrementing the current version.
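The version check described above could look roughly like the following sketch.  The function names {{{pending()}}} and {{{sync()}}}, and the idea of passing in an {{{apply}}} callable, are assumptions for illustration only.

```python
def pending(evolutions, current_version):
    """Return the steps newer than the database, oldest first."""
    newer = [step for step in evolutions if step["version"] > current_version]
    return sorted(newer, key=lambda step: step["version"])


def sync(evolutions, current_version, apply):
    """Apply every pending step in order and return the new version."""
    for step in pending(evolutions, current_version):
        apply(step)                        # make the schema changes
        current_version = step["version"]  # then record the new version
    return current_version


steps = [{"version": 2}, {"version": 3}, {"version": 4}]
applied = []
new_version = sync(steps, 3, applied.append)  # only version 4 is applied
```

Steps at or below the current version are skipped silently, which is what makes it safe to keep old {{{evolve()}}} calls in the file.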
     8
     9=== Model Shadow ===
     10Each model file will also have a sort of shadow that is created each time syncdb updates the database.  This will store the current models and version.  This way, changes can be detected if the user changes the models.py file.  This could be implemented as either a static file, say called ".models.shadow.py" (easier), or in the database (tricky).
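As a rough illustration of how the shadow makes change detection possible, the sketch below compares two snapshots.  Representing each snapshot as a plain dict of model name to field descriptions is an assumption; the proposal leaves the shadow format open.

```python
def diff_models(shadow, current):
    """Compare two {model: {field: description}} snapshots."""
    created = sorted(set(current) - set(shadow))
    dropped = sorted(set(shadow) - set(current))
    changed = {}
    for name in set(shadow) & set(current):
        old, new = shadow[name], current[name]
        added = sorted(set(new) - set(old))
        removed = sorted(set(old) - set(new))
        altered = sorted(f for f in set(old) & set(new) if old[f] != new[f])
        if added or removed or altered:
            changed[name] = {"added": added, "removed": removed, "altered": altered}
    return created, dropped, changed


shadow = {"Entry": {"title": "CharField(maxlength=60)",
                    "tag": "CharField(maxlength=20)"}}
current = {"Tag": {"name": "CharField(maxlength=32)"},
           "Entry": {"title": "CharField(maxlength=60)",
                     "tags": "ManyToManyField(Tag)"}}
created, dropped, changed = diff_models(shadow, current)
```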
     11
     12=== Pre and Post ===
     13Here is where custom work can be done to make any sort of change the user wishes.  The functions given to {{{evolve()}}} as keywords {{{pre}}} and {{{post}}} must take two arguments: {{{models}}} and {{{cursor}}}.  {{{cursor}}} is simply the database cursor, so that custom SQL commands can be run.  {{{models}}} is a module holding the models in the app.  But here's the rub: for {{{post}}} the models come from the current models.py file, but for {{{pre}}} they come from the model shadow, as that matches the state of the database before changes are made.
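The ordering matters here, so a small sketch may help.  {{{run_evolution()}}} and its {{{apply_schema_changes}}} argument are hypothetical names; only the rule that {{{pre}}} sees the shadow models and {{{post}}} sees the current models comes from the text above.

```python
from types import SimpleNamespace


def run_evolution(step, shadow_models, current_models, cursor, apply_schema_changes):
    """Run one step: pre hook, schema changes, then post hook."""
    pre, post = step.get("pre"), step.get("post")
    if pre is not None:
        pre(shadow_models, cursor)    # database still matches the shadow
    apply_schema_changes(step, cursor)
    if post is not None:
        post(current_models, cursor)  # database now matches models.py


# Stand-ins for the two model modules, just to show the call order.
log = []
shadow = SimpleNamespace(name="shadow")
current = SimpleNamespace(name="current")
step = {
    "version": 3,
    "pre": lambda models, cursor: log.append(("pre", models.name)),
    "post": lambda models, cursor: log.append(("post", models.name)),
}
run_evolution(step, shadow, current, None,
              lambda s, cursor: log.append(("schema", s["version"])))
```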
     14
     15=== Example ===
     16Imagine we are creating a very simple blog application.  We naively decide to make one model called 'Blog', which is a blog entry.  We want to give it title, body, and pub_date fields.  So we create a model that looks like so:
     17{{{
     18#!python
     19class Blog(models.Model):
     20    title = models.CharField(maxlength=60)
     21    body = models.TextField()
     22    pub_date = models.DateTimeField(auto_now_add=True)
     23}}}
     24
     25Later on, we realize that what we are calling a "Blog" object is really a blog "Entry" object.  We also decide that we want to give it a "tag" field, so that we can apply tags to each entry.  So we make the changes in the model, and add "tag = models.CharField(maxlength=20)".  {{{./manage.py evolve}}} sees that the fields of the newly created "Entry" model almost match the fields of the newly deleted "Blog" model, so it puts two and two together and concludes that it should rename "Blog" to "Entry" and add the tag field to it.  It creates this script:
     26
     27{{{
     28#!python
     29#### VERSION 2 #####
     30evolve(  version = 2,
     31         rename = {'Blog': 'Entry'}  )
     32}}}
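How "almost match" gets decided is left open in this proposal.  One possible heuristic, sketched below, scores each dropped/created pair by the share of identical fields and accepts the best pair above a cutoff.  The function name, the similarity measure, and the 0.6 threshold are all assumptions.

```python
def guess_model_renames(dropped, created, threshold=0.6):
    """Pair dropped models with created ones whose fields mostly match.

    Both arguments map model name -> {field name: field description}.
    """
    renames = {}
    for old_name, old_fields in dropped.items():
        old_items = set(old_fields.items())
        best_name, best_score = None, 0.0
        for new_name, new_fields in created.items():
            new_items = set(new_fields.items())
            union = old_items | new_items
            # Share of (name, description) pairs the two models have in common.
            score = len(old_items & new_items) / len(union) if union else 0.0
            if score > best_score:
                best_name, best_score = new_name, score
        if best_name is not None and best_score >= threshold:
            renames[old_name] = best_name
    return renames


blog = {"title": "CharField(maxlength=60)", "body": "TextField()",
        "pub_date": "DateTimeField(auto_now_add=True)"}
entry = dict(blog, tag="CharField(maxlength=20)")
guessed = guess_model_renames({"Blog": blog}, {"Entry": entry})
```

Any such guess can be wrong, which is why the generated script is written to a file the user can review and edit before syncing.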
     33
     34Since this is the first update of our model, it will update to version 2.  ./manage.py syncdb now creates a shadow file (or updates entries in a database shadow), and sets the version to 2.
     35
     36Much later, after we've already created many blog entries, we realize that our tag field is not sufficient.  It only allows us to add one tag!  What we really need is a Tag model, and a ManyToManyField with our Entry model.  So we update our models.py file:
     37
     38{{{
     39#!python
     40class Tag(models.Model):
     41    name = models.CharField(maxlength=32)
     42
     43class Entry(models.Model):
     44    tags = models.ManyToManyField(Tag)
     45    title = models.CharField(maxlength=60)
     46    body = models.TextField()
     47    pub_date = models.DateTimeField(auto_now_add=True)
     48}}}
     49
     50Then we run {{{./manage.py evolve}}} and it explains what it wants to do:
     51{{{
     52Applying this evolution will:
     53   Create the model Tag.
     54   Drop the Entry field "tag".
     55   Add the ManyToMany field "tags" to Entry referring to Tag.
     56}}}
     57
     58It also appends to the evolution file, which now reads:
     59{{{
     60#!python
     61#### VERSION 2 #####
     62evolve(  version = 2,
     63         rename = {'Blog': 'Entry'}  )
     64
     65#### VERSION 3 #####
     66evolve(  version = 3,
     67         create = ['Tag']  )
     68}}}
     69
     70On the next sync, it will then create a table 'Tag', and it will automatically make the changes to Entry, as it is aware of what changes need to be made.  That sounds good on the surface, but it means that we will lose all of our already applied tags.  We're going to have to move some data around to make a Tag object for each tag already on an entry.  So before we syncdb, we go in and update the {{{models.evolution.py}}} file to look like this:
     71
     72{{{
     73#!python
     74#### VERSION 2 #####
     75evolve(  version = 2,
     76         rename = {'Blog': 'Entry'}  )
     77
     78#### VERSION 3 #####
     79tag_set = []
     80entry_to_tag = {}
     81
     82def presync(models, cursor):
     83    global tag_set, entry_to_tag
     84    # Get every unique tag in our entries
     85    tag_set = set([entry.tag for entry in models.Entry.objects.iterator()])
     86    # Map each entry to a tag
     87    entry_to_tag = dict([(entry.id, entry.tag) for entry in models.Entry.objects.iterator()])
     88   
     89def postsync(models, cursor):
     90    # Create the Tag objects
     91    for tag in tag_set:
     92        models.Tag(name=tag).save()
     93    # Add a tag for each entry from the earlier mapping.
     94    for entry in models.Entry.objects.all():
     95        entry.tags.add(models.Tag.objects.get(name=entry_to_tag[entry.id]))
     96        entry.save()
     97
     98evolve(  version = 3,
     99         create = ['Tag'],
     100         pre = presync,
     101         post = postsync  )
     102}}}
     103
     104First we create some global variables to hold our data through the database update process.  Then we define a {{{presync()}}} function that will run ''before'' the database is updated and uses the models from our shadow file.  Finally we create a {{{postsync()}}} function that performs the updates after the changes have been made to the database.  Now, that's what we want to do, so we run ./manage.py syncdb, and all of our changes have been made, w00t!
     105
     106Then some guy comes in and starts messing with our app.  He REALLY wants comments added, so he goes in and makes some changes, the bastard, and doesn't consult you.  He adds a model:
     107
     108{{{
     109#!python
     110
     111class Comment(models.Model):
     112    body = models.CharField(maxlength=1028)
     113    maker = models.CharField(maxlength=50)
     114    address = models.CharField(maxlength=30)
     115}}}
     116
     117He then deletes the evolution file (wtf!?), runs {{{./manage.py evolve}}} and then applies the evolution.  The evolution file now looks like:
     118
     119{{{
     120#!python
     121#### VERSION 4 #####
     122evolve(  version = 4,
     123            create = ['Comment']   )
     124}}}
     125
     126First of all, on the DBMS we're running (MySQL), a maxlength of 1028 is well beyond the 256 at which the column gets converted to a text type anyway, so the field should be a TextField.  Second, 'maker' and 'address' are terrible names, which you find out were supposed to be 'author' and 'email'.  The email field also needs to be enlarged (maxlength > 30) and allowed to be blank, because some people won't leave their email.  Thankfully, this project isn't distributed, so deleting the evolution file turns out to have no effect.  So you make the changes:
     127
     128{{{
     129#!python
     130
     131class Comment(models.Model):
     132    body = models.TextField()
     133    author = models.CharField(maxlength=50)
     134    email = models.CharField(maxlength=128, blank=True)
     135}}}
     136
     137Running {{{./manage.py evolve}}} then reports:
     138{{{
     139Applying this evolution will:
     140   Rename the Comment field "maker" to "author".
     141   Change the Comment field "body" to a TextField().
     142   Add the Comment field "email" as a CharField(maxlength=128, blank=True).
     143   Drop the Comment field "address".
     144}}}
     145
     146Well that's almost what you want.  It was able to guess that "maker" needed to be changed to "author", because the two fields were identical CharField(maxlength=50), but "address" and "email" are not, so it tells you that it will drop "address", and add "email".  Let's change things around a bit.  You decide to delete the evolution file to erase the memory of his changes, and create a new one:
     147
     148{{{
     149#!python
     150#### VERSION 5 #####       
     151evolve(  version = 5,
     152         rename = {'Comment.maker': 'Comment.author', 'Comment.address': 'Comment.email'}   )
     153}}}
     154
     155This is all that is needed to tell syncdb to rename the fields instead of dropping them.  Their field types are automatically changed to reflect the new model.
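Since {{{rename}}} now carries both model names and dotted "Model.field" names, whatever runs the evolution has to tell the two apart.  The helper below is a sketch of one way to do that; {{{split_renames()}}} is a hypothetical name, and rejecting a rename that moves a field to a different model is an assumption, not something the proposal specifies.

```python
def split_renames(rename):
    """Separate model renames from 'Model.field' renames."""
    model_renames, field_renames = {}, {}
    for old, new in rename.items():
        if "." not in old:
            model_renames[old] = new
            continue
        if "." not in new:
            raise ValueError("field rename %r needs a 'Model.field' target" % old)
        old_model, old_field = old.split(".", 1)
        new_model, new_field = new.split(".", 1)
        if old_model != new_model:
            # A field cannot be renamed into another model in this sketch.
            raise ValueError("cannot move %r to model %r" % (old, new_model))
        field_renames.setdefault(old_model, {})[old_field] = new_field
    return model_renames, field_renames


models_part, fields_part = split_renames({
    "Blog": "Entry",
    "Comment.maker": "Comment.author",
})
```

A check like this would catch a mistyped model name on the right-hand side before any SQL is run.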
     156
     157=== Conclusion ===
     158A few things:
     159  * There is no reason that you couldn't also break migration code up into smaller files, but I think this way is cleaner, and easier to deal with as a default.
     160  * To add functionality for rolling back the changes, one would have to add "pre_rollback" and "post_rollback" keywords to the {{{evolve()}}} function.  We'd also have to keep a record of all past models, much like the shadow model.  You could then easily assume that on rollback: create -> drop, drop -> create, and rename is reversed.
     161  * If one could keep the shadow models in the database, that would be best, but I'm not sure of the best way of doing that.
     162  * Are there any operations besides "create", "drop", and "rename" that might need to be expressed and aren't obvious (such as changing a field option like "maxlength")?
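The rollback rule from the list above (create -> drop, drop -> create, rename reversed) can be sketched as a function that derives the inverse of a step.  The dict layout and the name {{{invert()}}} are assumptions; "pre_rollback" and "post_rollback" are the keywords suggested above.

```python
def invert(step):
    """Build the rollback step for one evolution step."""
    return {
        "version": step["version"],
        "create": list(step.get("drop", [])),   # dropped models come back
        "drop": list(step.get("create", [])),   # created models go away
        "rename": {new: old for old, new in step.get("rename", {}).items()},
        "pre": step.get("pre_rollback"),        # hooks supplied by the user
        "post": step.get("post_rollback"),
    }


forward = {"version": 3, "create": ["Tag"], "rename": {"Blog": "Entry"}}
backward = invert(forward)
```

Re-creating a dropped table restores its structure but not its rows, so the rollback hooks would have to put any lost data back.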
     163
     164=== Comments ===