| 1 | {{{ |
| 2 | #!rst |
| 3 | Schema migration is one of those "hard" problems that I think will be impossible |
| 4 | to get right for all the cases. What follows are my thoughts on how to get |
| 5 | as close as possible in Django. |
| 6 | |
| 7 | (please leave comments at the bottom and not inline to the proposal!) |
| 8 | |
| 9 | Prior art |
| 10 | --------- |
| 11 | |
| 12 | A few bits of prior arts I examined. |
| 13 | |
| 14 | * http://api.rubyonrails.com/classes/ActiveRecord/Migration.html |
| 15 | * http://sqlobject.org/sqlobject-admin.html#versioning-upgrading |
| 16 | * http://www.varlena.com/varlena/GeneralBits/127.php |
| 17 | |
| 18 | Use cases |
| 19 | --------- |
| 20 | |
| 21 | 1. Alice has a blog application written in Django. A blog entry looks |
| 22 | like:: |
| 23 | |
| 24 | class Entry(models.Model): |
| 25 | title = models.CharField(maxlength=30) |
| 26 | body = models.TextField() |
| 27 | |
| 28 | After using this to write some blog entries, she realizes that she |
| 29 | really should have had a publish date field on her model, and she |
| 30 | wants to add one. |
| 31 | |
| 32 | Alice doesn't particularly care about old entries; she just wants them to |
| 33 | have an empty publish date. She knows just enough SQL to understand how |
| 34 | relational databases work, but that's about it. |
| 35 | |
| 36 | 2. Ben has a simple ticket tracker with a ticket class that contains:: |
| 37 | |
| 38 | class Ticket(models.Model): |
| 39 | reporter = models.EmailField() |
| 40 | owner = models.EmailField() |
| 41 | status = models.IntegerField(choices=STATUS_CHOICES) |
| 42 | description = models.TextField() |
| 43 | |
| 44 | After two years of use, Ben wants to product information to his ticket |
| 45 | tracker and has decided he doesn't care about who reported a ticket, so |
| 46 | he writes these models:: |
| 47 | |
| 48 | class Product(models.Model): |
| 49 | ... |
| 50 | |
| 51 | class Ticket(models.Model): |
| 52 | owner = models.EmailField() |
| 53 | status = models.IntegerField(choices=STATUS_CHOICES) |
| 54 | description = models.TextField() |
| 55 | product = models.ForeignKey(Product) |
| 56 | |
| 57 | Ben knows quite a bit of SQL; he's also the company DBA. |
| 58 | |
| 59 | 3. Carol supports a newsroom with a set of models that look like:: |
| 60 | |
| 61 | class Reporter(models.Model): |
| 62 | ... |
| 63 | |
| 64 | class Article(models.Model): |
| 65 | reporter = models.ForeignKey(Reporter) |
| 66 | ... |
| 67 | |
| 68 | class Section(models.Model): |
| 69 | stories = models.ManyToManyField(Article) |
| 70 | ... |
| 71 | |
| 72 | As it turns out, her conception of ``Section`` was naive, and on top of |
| 73 | that she wants to separate certain types of articles into multiple |
| 74 | classes. She rewrites her models to look like:: |
| 75 | |
| 76 | class Reporter(models.Model): |
| 77 | ... |
| 78 | |
| 79 | class Category(models.Model): |
| 80 | ... |
| 81 | |
| 82 | class Article(models.Model): |
| 83 | reporter = models.ForeignKey(Reporter) |
| 84 | categories = models.ManyToManyField(Category) |
| 85 | ... |
| 86 | |
| 87 | class Obituary(models.Model): |
| 88 | ... |
| 89 | |
| 90 | class WeddingAnnouncement(models.Model): |
| 91 | ... |
| 92 | |
| 93 | class Section(models.Model): |
| 94 | included_categories = models.ManyToManyField(Category) |
| 95 | ... |
| 96 | |
| 97 | Carol obviously needs to keep all the data in her system. Some articles |
| 98 | will need to be moved into ``Obituary`` or ``WeddingAnnouncement``, |
| 99 | objects. Additionally, stories will be assigned categories from a mapping |
| 100 | given to her by the web editor since the direct mapping has been removed. |
| 101 | |
| 102 | To add to the fun, Carol has 7 different news sites with separate |
| 103 | databases using this system. Three of those systems run MySQL, the |
| 104 | other four run PostgreSQL. |
| 105 | |
| 106 | Carol is a very experienced developer; she's been doing this web thing |
| 107 | for almost ten years now. |
| 108 | |
| 109 | Possible solutions |
| 110 | ------------------ |
| 111 | |
| 112 | Write SQL |
| 113 | ````````` |
| 114 | |
| 115 | This is the current situation: you write a bunch of SQL and bung it into |
| 116 | your interactive interpreter. |
| 117 | |
| 118 | Ramifications: |
| 119 | Alice is a simple enough case that she shouldn't have to know |
| 120 | SQL to make this work. |
| 121 | |
| 122 | This situation sounds good to Ben: it's just two lines of SQL. |
| 123 | |
| 124 | Carol would have to apply this SQL against 7 different databases (which is |
| 125 | a waste of time), and she'd still have to write a bunch of Python code |
| 126 | to deal with the data migration. |
| 127 | |
| 128 | Automatic db introspection |
| 129 | `````````````````````````` |
| 130 | |
| 131 | In this scenario, Django inspects your models and your database, and |
| 132 | outputs the SQL for adding/removing fields. You're supposed to |
| 133 | inspect the SQL and then stick it into your database using something |
| 134 | like:: |
| 135 | |
| 136 | $ ./manage.py sqlupdate | psql mydb |
| 137 | |
| 138 | Ramifications: |
| 139 | This works for Alice; she does ``django-admin sqlupgrade | mysql`` and |
| 140 | is happy. However, if the SQL that Django put out happened to have a |
| 141 | mistake in it, she might not have noticed, and could have lost some |
| 142 | data. |
| 143 | |
| 144 | Ben loves this situation; it does his two lines of SQL automatically, |
| 145 | and he can poke at what's put out to make sure it's right for him. |
| 146 | |
| 147 | This doesn't really help Carol. Although the SQL that is put out is a |
| 148 | start, it doesn't really get her very far. She still has to write |
| 149 | a bunch of data migration by hand, and she also has no way of applying |
| 150 | the SQL she writes quickly. |
| 151 | |
| 152 | Automatically applied migration code |
| 153 | ```````````````````````````````````` |
| 154 | |
| 155 | In this scenario, you give your models a version. When it comes time to upgrade, |
| 156 | you write a "upgrade-to-version-3" script and increment your version number. |
| 157 | When you run a ``django-admin`` command (probably the newly-added ``syncdb``), |
| 158 | Django notices that you've added an upgrade script and runs it automatically. |
| 159 | |
| 160 | This type of system comes in two (not necessarily exclusive) flavors: |
| 161 | |
| 162 | * SQL only, where your upgrade scripts are in SQL. |
| 163 | |
| 164 | * ActiveRecord Migration style, where you write your migrations |
| 165 | in Python (which gets converted to SQL for you). |
| 166 | |
| 167 | Each situation also could support symmetric migration (where you provide |
| 168 | both an upgrade and a downgrade script). |
| 169 | |
| 170 | Ramifications: |
| 171 | Alice doesn't like the SQL flavor at all (she wants to use Python!). The |
| 172 | Pythonic one works very well for her since the syntax mirrors other |
| 173 | parts of Django she's familiar with. However, she'd prefer something |
| 174 | like the automatic solution since it would save her from having to type |
| 175 | anything at all. |
| 176 | |
| 177 | Ben really likes this one; the SQL is still easy to write, and it gets |
| 178 | into SVN so he can track his upgrades. |
| 179 | |
| 180 | Carol loves this one. She can write custom SQL/Python that takes care of |
| 181 | all the situation she needs. The Python option is a better fit for her |
| 182 | since she has lots of custom code she needs to run. |
| 183 | |
| 184 | Introspection + migration |
| 185 | ````````````````````````` |
| 186 | |
| 187 | This approach is a combination of `Automatic db introspection`_ and |
| 188 | `Automatically applied migration code`_ -- one command produces a migration |
| 189 | script, you tickle your version number, and the ``syncdb`` runs the migrations. |
| 190 | |
| 191 | Ramifications: |
| 192 | This works very well for Alice, and she gets the added bonus of having |
| 193 | nice migrations stored for future inspection. |
| 194 | |
| 195 | This is Ben's favorite option. |
| 196 | |
| 197 | Carol's indifferent about this as opposed to the previous one: it helps |
| 198 | slightly, but not all that much, really. |
| 199 | |
| 200 | Conclusions |
| 201 | ----------- |
| 202 | |
| 203 | * "Just write SQL" sucks and is obviously wrong and needs to be fixed. |
| 204 | |
| 205 | * DB introspection only goes so far, and doesn't give any way to extend |
| 206 | what comes out beyond "just write SQL". |
| 207 | |
| 208 | * Migration code works pretty well -- which is obvious if you look at how |
| 209 | successful Rails has been with the feature. The Python flavor is obviously |
| 210 | more powerful, but for some uses SQL works well. |
| 211 | |
| 212 | * Introspection + migration seems to be best-of-breed; it even has the added |
| 213 | bonus of getting Alice to save her migrations in source control. |
| 214 | |
| 215 | Proposal |
| 216 | -------- |
| 217 | |
| 218 | So. |
| 219 | |
| 220 | Given all the above, here's how I think schema evolution should work in Django: |
| 221 | |
| 222 | * An optional module, ``django.contrib.evolution`` provides a |
| 223 | way to store versions in the database:: |
| 224 | |
| 225 | class ModelVersion(models.Model): |
| 226 | content_type = models.ForeignKey(ContentType) |
| 227 | version = models.PositiveIntegerField() |
| 228 | |
| 229 | This makes evolution optional and thus doesn't clutter up your tablespace |
| 230 | if you don't need it. |
| 231 | |
| 232 | * Models grow a ``version`` attribute:: |
| 233 | |
| 234 | class MyModel(models.Model): |
| 235 | ... |
| 236 | |
| 237 | class Meta: |
| 238 | version = 3 |
| 239 | |
| 240 | * ``django-admin syncdb`` applies evolutions using the following steps: |
| 241 | |
| 242 | #. Makes sure evolution is installed before proceeding. |
| 243 | |
| 244 | #. For each content type, checks that the version in the database matches |
| 245 | the version in defined in the model. For those that don't match... |
| 246 | |
| 247 | #. Django looks for evolutions in ``<app_package>/upgrade/``. Evolutions |
| 248 | can either be SQL scripts named ``to_version_X.sql`` or Python scripts |
| 249 | named ``to_version_X.py``. |
| 250 | |
| 251 | #. ``django.contrib.evolution`` will contain utilities to abstract evolution |
| 252 | for different db backends. This exact API is TBD. |
| 253 | |
| 254 | #. For SQL evolutions, if a file named ``to_version_X.out`` exists, ``syncdb`` |
| 255 | will expect the output of the evolution to match the text file EXACTLY. If |
| 256 | any differences happen, the transaction will be rolled back (for that reason |
| 257 | SQL evolution scripts must not contain BEGIN/COMMIT). |
| 258 | |
| 259 | For Python evolutions, there will be some way of doing similar tests. |
| 260 | |
| 261 | * ``django-admin find-migrations`` (or somesuch) will inspect the database |
| 262 | and pump out SQL evolution scripts into the right place in the app |
| 263 | package(s). Users will then upgrade the version number and run ``syncdb``. |
| 264 | }}} |
| 265 | |
| 266 | = Comments = |
| 267 | |
| 268 | ''leave comments here or post 'em to django-dev'' |