| | 1 | {{{ |
| | 2 | #!rst |
| | 3 | Schema migration is one of those "hard" problems that I think will be impossible |
| | 4 | to get right for all the cases. What follows are my thoughts on how to get |
| | 5 | as close as possible in Django. |
| | 6 | |
| | 7 | (please leave comments at the bottom and not inline to the proposal!) |
| | 8 | |
| | 9 | Prior art |
| | 10 | --------- |
| | 11 | |
| | 12 | A few bits of prior arts I examined. |
| | 13 | |
| | 14 | * http://api.rubyonrails.com/classes/ActiveRecord/Migration.html |
| | 15 | * http://sqlobject.org/sqlobject-admin.html#versioning-upgrading |
| | 16 | * http://www.varlena.com/varlena/GeneralBits/127.php |
| | 17 | |
| | 18 | Use cases |
| | 19 | --------- |
| | 20 | |
| | 21 | 1. Alice has a blog application written in Django. A blog entry looks |
| | 22 | like:: |
| | 23 | |
| | 24 | class Entry(models.Model): |
| | 25 | title = models.CharField(maxlength=30) |
| | 26 | body = models.TextField() |
| | 27 | |
| | 28 | After using this to write some blog entries, she realizes that she |
| | 29 | really should have had a publish date field on her model, and she |
| | 30 | wants to add one. |
| | 31 | |
| | 32 | Alice doesn't particularly care about old entries; she just wants them to |
| | 33 | have an empty publish date. She knows just enough SQL to understand how |
| | 34 | relational databases work, but that's about it. |
| | 35 | |
| | 36 | 2. Ben has a simple ticket tracker with a ticket class that contains:: |
| | 37 | |
| | 38 | class Ticket(models.Model): |
| | 39 | reporter = models.EmailField() |
| | 40 | owner = models.EmailField() |
| | 41 | status = models.IntegerField(choices=STATUS_CHOICES) |
| | 42 | description = models.TextField() |
| | 43 | |
| | 44 | After two years of use, Ben wants to product information to his ticket |
| | 45 | tracker and has decided he doesn't care about who reported a ticket, so |
| | 46 | he writes these models:: |
| | 47 | |
| | 48 | class Product(models.Model): |
| | 49 | ... |
| | 50 | |
| | 51 | class Ticket(models.Model): |
| | 52 | owner = models.EmailField() |
| | 53 | status = models.IntegerField(choices=STATUS_CHOICES) |
| | 54 | description = models.TextField() |
| | 55 | product = models.ForeignKey(Product) |
| | 56 | |
| | 57 | Ben knows quite a bit of SQL; he's also the company DBA. |
| | 58 | |
| | 59 | 3. Carol supports a newsroom with a set of models that look like:: |
| | 60 | |
| | 61 | class Reporter(models.Model): |
| | 62 | ... |
| | 63 | |
| | 64 | class Article(models.Model): |
| | 65 | reporter = models.ForeignKey(Reporter) |
| | 66 | ... |
| | 67 | |
| | 68 | class Section(models.Model): |
| | 69 | stories = models.ManyToManyField(Article) |
| | 70 | ... |
| | 71 | |
| | 72 | As it turns out, her conception of ``Section`` was naive, and on top of |
| | 73 | that she wants to separate certain types of articles into multiple |
| | 74 | classes. She rewrites her models to look like:: |
| | 75 | |
| | 76 | class Reporter(models.Model): |
| | 77 | ... |
| | 78 | |
| | 79 | class Category(models.Model): |
| | 80 | ... |
| | 81 | |
| | 82 | class Article(models.Model): |
| | 83 | reporter = models.ForeignKey(Reporter) |
| | 84 | categories = models.ManyToManyField(Category) |
| | 85 | ... |
| | 86 | |
| | 87 | class Obituary(models.Model): |
| | 88 | ... |
| | 89 | |
| | 90 | class WeddingAnnouncement(models.Model): |
| | 91 | ... |
| | 92 | |
| | 93 | class Section(models.Model): |
| | 94 | included_categories = models.ManyToManyField(Category) |
| | 95 | ... |
| | 96 | |
| | 97 | Carol obviously needs to keep all the data in her system. Some articles |
| | 98 | will need to be moved into ``Obituary`` or ``WeddingAnnouncement``, |
| | 99 | objects. Additionally, stories will be assigned categories from a mapping |
| | 100 | given to her by the web editor since the direct mapping has been removed. |
| | 101 | |
| | 102 | To add to the fun, Carol has 7 different news sites with separate |
| | 103 | databases using this system. Three of those systems run MySQL, the |
| | 104 | other four run PostgreSQL. |
| | 105 | |
| | 106 | Carol is a very experienced developer; she's been doing this web thing |
| | 107 | for almost ten years now. |
| | 108 | |
| | 109 | Possible solutions |
| | 110 | ------------------ |
| | 111 | |
| | 112 | Write SQL |
| | 113 | ````````` |
| | 114 | |
| | 115 | This is the current situation: you write a bunch of SQL and bung it into |
| | 116 | your interactive interpreter. |
| | 117 | |
| | 118 | Ramifications: |
| | 119 | Alice is a simple enough case that she shouldn't have to know |
| | 120 | SQL to make this work. |
| | 121 | |
| | 122 | This situation sounds good to Ben: it's just two lines of SQL. |
| | 123 | |
| | 124 | Carol would have to apply this SQL against 7 different databases (which is |
| | 125 | a waste of time), and she'd still have to write a bunch of Python code |
| | 126 | to deal with the data migration. |
| | 127 | |
| | 128 | Automatic db introspection |
| | 129 | `````````````````````````` |
| | 130 | |
| | 131 | In this scenario, Django inspects your models and your database, and |
| | 132 | outputs the SQL for adding/removing fields. You're supposed to |
| | 133 | inspect the SQL and then stick it into your database using something |
| | 134 | like:: |
| | 135 | |
| | 136 | $ ./manage.py sqlupdate | psql mydb |
| | 137 | |
| | 138 | Ramifications: |
| | 139 | This works for Alice; she does ``django-admin sqlupgrade | mysql`` and |
| | 140 | is happy. However, if the SQL that Django put out happened to have a |
| | 141 | mistake in it, she might not have noticed, and could have lost some |
| | 142 | data. |
| | 143 | |
| | 144 | Ben loves this situation; it does his two lines of SQL automatically, |
| | 145 | and he can poke at what's put out to make sure it's right for him. |
| | 146 | |
| | 147 | This doesn't really help Carol. Although the SQL that is put out is a |
| | 148 | start, it doesn't really get her very far. She still has to write |
| | 149 | a bunch of data migration by hand, and she also has no way of applying |
| | 150 | the SQL she writes quickly. |
| | 151 | |
| | 152 | Automatically applied migration code |
| | 153 | ```````````````````````````````````` |
| | 154 | |
| | 155 | In this scenario, you give your models a version. When it comes time to upgrade, |
| | 156 | you write a "upgrade-to-version-3" script and increment your version number. |
| | 157 | When you run a ``django-admin`` command (probably the newly-added ``syncdb``), |
| | 158 | Django notices that you've added an upgrade script and runs it automatically. |
| | 159 | |
| | 160 | This type of system comes in two (not necessarily exclusive) flavors: |
| | 161 | |
| | 162 | * SQL only, where your upgrade scripts are in SQL. |
| | 163 | |
| | 164 | * ActiveRecord Migration style, where you write your migrations |
| | 165 | in Python (which gets converted to SQL for you). |
| | 166 | |
| | 167 | Each situation also could support symmetric migration (where you provide |
| | 168 | both an upgrade and a downgrade script). |
| | 169 | |
| | 170 | Ramifications: |
| | 171 | Alice doesn't like the SQL flavor at all (she wants to use Python!). The |
| | 172 | Pythonic one works very well for her since the syntax mirrors other |
| | 173 | parts of Django she's familiar with. However, she'd prefer something |
| | 174 | like the automatic solution since it would save her from having to type |
| | 175 | anything at all. |
| | 176 | |
| | 177 | Ben really likes this one; the SQL is still easy to write, and it gets |
| | 178 | into SVN so he can track his upgrades. |
| | 179 | |
| | 180 | Carol loves this one. She can write custom SQL/Python that takes care of |
| | 181 | all the situation she needs. The Python option is a better fit for her |
| | 182 | since she has lots of custom code she needs to run. |
| | 183 | |
| | 184 | Introspection + migration |
| | 185 | ````````````````````````` |
| | 186 | |
| | 187 | This approach is a combination of `Automatic db introspection`_ and |
| | 188 | `Automatically applied migration code`_ -- one command produces a migration |
| | 189 | script, you tickle your version number, and the ``syncdb`` runs the migrations. |
| | 190 | |
| | 191 | Ramifications: |
| | 192 | This works very well for Alice, and she gets the added bonus of having |
| | 193 | nice migrations stored for future inspection. |
| | 194 | |
| | 195 | This is Ben's favorite option. |
| | 196 | |
| | 197 | Carol's indifferent about this as opposed to the previous one: it helps |
| | 198 | slightly, but not all that much, really. |
| | 199 | |
| | 200 | Conclusions |
| | 201 | ----------- |
| | 202 | |
| | 203 | * "Just write SQL" sucks and is obviously wrong and needs to be fixed. |
| | 204 | |
| | 205 | * DB introspection only goes so far, and doesn't give any way to extend |
| | 206 | what comes out beyond "just write SQL". |
| | 207 | |
| | 208 | * Migration code works pretty well -- which is obvious if you look at how |
| | 209 | successful Rails has been with the feature. The Python flavor is obviously |
| | 210 | more powerful, but for some uses SQL works well. |
| | 211 | |
| | 212 | * Introspection + migration seems to be best-of-breed; it even has the added |
| | 213 | bonus of getting Alice to save her migrations in source control. |
| | 214 | |
| | 215 | Proposal |
| | 216 | -------- |
| | 217 | |
| | 218 | So. |
| | 219 | |
| | 220 | Given all the above, here's how I think schema evolution should work in Django: |
| | 221 | |
| | 222 | * An optional module, ``django.contrib.evolution`` provides a |
| | 223 | way to store versions in the database:: |
| | 224 | |
| | 225 | class ModelVersion(models.Model): |
| | 226 | content_type = models.ForeignKey(ContentType) |
| | 227 | version = models.PositiveIntegerField() |
| | 228 | |
| | 229 | This makes evolution optional and thus doesn't clutter up your tablespace |
| | 230 | if you don't need it. |
| | 231 | |
| | 232 | * Models grow a ``version`` attribute:: |
| | 233 | |
| | 234 | class MyModel(models.Model): |
| | 235 | ... |
| | 236 | |
| | 237 | class Meta: |
| | 238 | version = 3 |
| | 239 | |
| | 240 | * ``django-admin syncdb`` applies evolutions using the following steps: |
| | 241 | |
| | 242 | #. Makes sure evolution is installed before proceeding. |
| | 243 | |
| | 244 | #. For each content type, checks that the version in the database matches |
| | 245 | the version in defined in the model. For those that don't match... |
| | 246 | |
| | 247 | #. Django looks for evolutions in ``<app_package>/upgrade/``. Evolutions |
| | 248 | can either be SQL scripts named ``to_version_X.sql`` or Python scripts |
| | 249 | named ``to_version_X.py``. |
| | 250 | |
| | 251 | #. ``django.contrib.evolution`` will contain utilities to abstract evolution |
| | 252 | for different db backends. This exact API is TBD. |
| | 253 | |
| | 254 | #. For SQL evolutions, if a file named ``to_version_X.out`` exists, ``syncdb`` |
| | 255 | will expect the output of the evolution to match the text file EXACTLY. If |
| | 256 | any differences happen, the transaction will be rolled back (for that reason |
| | 257 | SQL evolution scripts must not contain BEGIN/COMMIT). |
| | 258 | |
| | 259 | For Python evolutions, there will be some way of doing similar tests. |
| | 260 | |
| | 261 | * ``django-admin find-migrations`` (or somesuch) will inspect the database |
| | 262 | and pump out SQL evolution scripts into the right place in the app |
| | 263 | package(s). Users will then upgrade the version number and run ``syncdb``. |
| | 264 | }}} |
| | 265 | |
| | 266 | = Comments = |
| | 267 | |
| | 268 | ''leave comments here or post 'em to django-dev'' |