id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
31869	Improving data migration using `dumpdata` and `loaddata`	Matthijs Kooijman	nobody	"At first glance, using `manage.py dumpdata` and `loaddata` together seems a great way to make a full copy of an existing django installation (e.g. for migrating to a different server, or getting a local copy of your production data, etc.).


Documentation suggests this should be possible. An obvious way would be to do `dumpdata` on one system, followed by `flush` and `loaddata` on the other system.

However, when you try it, you get duplicate-key errors in the contenttypes and similar tables, such as:

{{{
MySQLdb._exceptions.IntegrityError: (1062, ""Duplicate entry 'someapp-somemodel' for key 'django_content_type_app_label_model_76bd3d3b_uniq'"")
}}}

What seems to happen is that `flush` ([https://docs.djangoproject.com/en/dev/ref/django-admin/#flush as documented]) flushes all tables and then reruns ""post-synchronization handlers"", which create content-types and I think permissions and maybe other things as well. Since `dumpdata` does dump these tables, this creates a conflict.

Currently, I think you can avoid this by:
 - Making and importing a full database dump outside of Django (e.g. using mysqldump). This is a good way to guarantee a truly identical copy (though there might be timezone issues with e.g. MySQL), but it is often less convenient and does not work across database types (e.g. dumping a remote MySQL database to a local SQLite database).
 - Using natural keys when dumping. The [https://docs.djangoproject.com/en/dev/ref/django-admin/#dumpdata documentation for `dumpdata --natural-foreign`] suggests using natural keys when contenttypes and permissions are involved. I believe this works because the natural foreign keys allow associating any references to these tables to the autocreated versions in the original database. In addition, and I think the documentation does not make this explicit, you would also need to either exclude the contenttypes, permissions and any other auto-created models from the dumpdata, or add `--natural-primary`, which I believe makes loaddata overwrite existing data based on the natural primary key rather than adding new data. [[BR]]
   Having to manually exclude models is quite cumbersome for a quick dump-and-load cycle. Also, if the dumped database somehow contained *fewer* contenttypes, permissions, etc. than the autocreated ones, the newly loaded database would still contain the extra ones. More generally, the loaded database is not an identical copy of the original one.[[BR]]
   I also had some issues with this approach, due to circular references in my natural keys, but I think this has since been fixed in git.
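
The natural-key option above can be scripted. What follows is a minimal sketch, not a definitive recipe: it assumes the auto-created models are `contenttypes` and `auth.permission` (the real list depends on the installed apps), and the helper name `build_dumpdata_args` is hypothetical. The flags themselves are existing `dumpdata` options.

```python
import subprocess
import sys

# Assumption: these are the auto-created models; adjust for the installed apps.
AUTO_CREATED = ('contenttypes', 'auth.permission')


def build_dumpdata_args(output, exclude=AUTO_CREATED):
    # Hypothetical helper: builds the argument list for `manage.py dumpdata`.
    args = ['dumpdata', '--natural-foreign', '--natural-primary', '--output', output]
    for label in exclude:
        # Skip tables that the post-synchronization handlers recreate on the target.
        args += ['--exclude', label]
    return args


def run_dump(output, manage_py='manage.py'):
    # Runs dumpdata with the current interpreter; raises if the command fails.
    subprocess.run([sys.executable, manage_py, *build_dumpdata_args(output)], check=True)
```

Calling `run_dump('dump.json')` from the project directory on the source system, then `flush` and `loaddata dump.json` on the target, should avoid the duplicate-key errors, but as noted above it does not yield an identical copy.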


I wonder if we can make this process easier somehow?

One solution that springs to mind is to add a `flush --no-handlers` option (or something like that), to prevent running the ""post synchronization handlers"". This would (should) result in empty tables for all tables that are dumped by `dumpdata` (I think this means all tables empty, except for the migration table). Then doing a `dumpdata`, `flush --no-handlers` and `loaddata` could, I think, produce an exact copy of the database, including matching primary keys.

Or are there any other existing ways to make this easier that I missed and/or could be (better) documented?"	New feature	closed	Core (Management commands)	3.1	Normal	wontfix		matthijs@…	Unreviewed	0	0	0	0	0	0