Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#23982 closed Cleanup/optimization (fixed)

Clarify how to properly support both Python 2 and 3 for third-party apps with migrations

Reported by: Luke Plant Owned by: Carl Meyer
Component: Documentation Version: 1.7
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


The basic problem is that Python 2 produces migrations like this:

                ('name', models.CharField(max_length=255, verbose_name=b'First Name', blank=True)),

while Python 3 like this:

                ('name', models.CharField(max_length=255, verbose_name='First Name', blank=True)),

(i.e. notice the lack of b prefix)

If you run 'makemigrations' under Python 3, with migrations made in the first way, it creates a new migration to fix the field.

I've attached a script that shows the problem. I've tested it against master as well as Django 1.7.1 and the bug is still present. A nice unit test for this might not be possible.

There are two things to consider:

  1. Fixing the generator so that it doesn't produce this problem
  2. Fixing the migration code so that it doesn't consider this a problem. Otherwise, going forward, all migrations created with 1.7 and 1.7.1 will produce confusion when people use Python 3, even if we fix 1.

This bug manifests itself most obviously when running makemigrations with 3rd party apps. It looks like we were lucky when it came to the migrations bundled with Django - perhaps they were created by Python 3?

Attachments (1)

migrations_bug_with_python2_python3.txt (1.9 KB) - added by Luke Plant 6 years ago.

Download all attachments as: .zip

Change History (14)

Changed 6 years ago by Luke Plant

comment:1 Changed 6 years ago by Baptiste Mispelon


Isn't the problem the fact that you're using "native strings" (ie byte strings on Python 2 but unicode strings on Python 3) for your verbose_name in your models?

If you're using verbose_name='asdf' on Python2 and you don't have from __future__ import unicode_literals at the top of your file then you are actually using verbose_name=b'asdf' so the migration is correct, no?

comment:2 Changed 6 years ago by Simon Charette

Third party applications supporting both Python 2 and 3 should consistently import unicode_literals to avoid such behavior.

Previous changes made sure migrations generated on Python 2 can be run on Python 3 (and vice versa) but it assumes unicode_literals were used.

The underlying issue here is that b'foo' == u'foo' on Python 2 but not on 3.

Last edited 6 years ago by Simon Charette (previous) (diff)

comment:3 Changed 6 years ago by Markus Holtermann

I ran into a similar issue the other day with one of my 3rd party apps: regarding unicode strings. And I think there is another issue somewhere that covers the byte string serialization in the migration write. I just cannot find it.

comment:4 Changed 6 years ago by Carl Meyer

Component: MigrationsDocumentation
Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization

Yes, charettes and bmispelon are right, this is a bug in the app, not a bug in migrations.

Migrations are correctly and consistently recording the verbose_name as a bytestring when it is a bytestring, and as a text (unicode) string when it is a text string, and noticing the difference when it changes. (Note that migration files always have from __future__ import unicode_literals, so an unmarked string is always a unicode string, on Python 2 or 3). That's the only sane behavior for migrations (and it was arrived at with a great deal of trial and error :-) ).

The app, per our porting to Python 3 documentation, should also be using from __future__ import unicode_literals so that the actual nature of the string does not change when moving between Python 2 and 3. (Or if they don't care to support 3.2, they could explicitly mark all strings with b or u if they prefer that for some reason.)

It would be great if we could provide some kind of hint or warning when creating the migration in this situation, but I don't see how we possibly can. I don't think we can reliably tell, when a migration is generated, whether the string we are seeing was explicitly marked or not, or whether the module containing it had from __future__ import unicode_literals. And even if we could tell, we can't warn if not, because there's nothing wrong with unmarked strings and no __future__ import if the migration is being generated in a project that only supports Python 2 (or only 3).

I'm converting this to a Documentation issue, because the only thing I can see that we could do is add a note about the issue to this section of the docs:

Last edited 6 years ago by Carl Meyer (previous) (diff)

comment:5 Changed 6 years ago by Carl Meyer

Summary: Migrations produced under Python 2 require fixing when run under Python 3Clarify how to properly support both Python 2 and 3 for third-party apps with migrations

comment:6 Changed 6 years ago by Tim Graham

This came up before in #23455.

comment:7 Changed 6 years ago by Carl Meyer

I'm not convinced that playing whack-a-mole with conversions of specific model/field properties makes sense; I think you were right Tim to resist that approach in #23455. It's not clear that auto-converting _any_ such properties in migrations is safe: it introduces an unnecessary difference in behavior between the migration model state and the real models, and it's entirely possible that users will have code, particularly in Python 3, which is called from migrations in one way or another and which breaks if it gets a text string instead of a byte string.

And even if we don't care about that problem, it's still a game of whack-a-mole that doesn't end and only serves to longer conceal the real underlying problem, which will only be fixed by unicode_literals.

comment:8 Changed 6 years ago by Carl Meyer

I think when a doc fix is committed here, it should also revert the change from #23455 (before it gets into a release and causes spurious migrations to be generated). It's much simpler if we maintain "what you put in is what you get out" when it comes to text vs bytes in migration serialization.

comment:9 Changed 6 years ago by Carl Meyer

Oh, looks like #23455 is already in 1.7.1 (odd that doesn't list it as part of the 1.7.1 tag). And I don't think it resulted in spurious migrations generally, because on Python 3 it will usually have been a text string already, and on Python 2 unicode and bytes compare equal so the change doesn't cause a migration. But that also means we can revert it without causing spurious migrations.

comment:10 Changed 6 years ago by Carl Meyer

Has patch: set
Owner: changed from nobody to Carl Meyer
Status: newassigned

Here's a PR for how I would address this:

comment:11 Changed 6 years ago by Tim Graham

Triage Stage: AcceptedReady for checkin

comment:12 Changed 6 years ago by Carl Meyer <carl@…>

Resolution: fixed
Status: assignedclosed

In d4bdddeefe35214f194cb85e40dbb761c206e7fa:

Fixed #23982 -- Added doc note on generating Python 2/3 cross-compatible migrations.

Thanks Luke Plant for the report, and Tim Graham, Simon Charette, and Markus
Holtermann for review and discussion.

comment:13 Changed 6 years ago by Carl Meyer <carl@…>

In b37607193639e25a6c630e8d1b3d3bca0833a652:

[1.7.x] Fixed #23982 -- Added doc note on generating Python 2/3 cross-compatible migrations.

Thanks Luke Plant for the report, and Tim Graham, Simon Charette, and Markus
Holtermann for review and discussion.

This is a backport of d4bdddeefe35214f194cb85e40dbb761c206e7fa from master.

Note: See TracTickets for help on using tickets.
Back to Top