Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#23982 closed Cleanup/optimization (fixed)

Clarify how to properly support both Python 2 and 3 for third-party apps with migrations

Reported by: Luke Plant Owned by: Carl Meyer
Component: Documentation Version: 1.7
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The basic problem is that Python 2 produces migrations like this:

            fields=[
                ('name', models.CharField(max_length=255, verbose_name=b'First Name', blank=True)),
            ],

while Python 3 like this:

            fields=[
                ('name', models.CharField(max_length=255, verbose_name='First Name', blank=True)),
            ],

(i.e. notice the lack of b prefix)

If you run 'makemigrations' under Python 3, with migrations made in the first way, it creates a new migration to fix the field.

I've attached a script that shows the problem. I've tested it against master as well as Django 1.7.1 and the bug is still present. A nice unit test for this might not be possible.

There are two things to consider:

  1. Fixing the generator so that it doesn't produce this problem
  2. Fixing the migration code so that it doesn't consider this a problem. Otherwise, going forward, all migrations created with 1.7 and 1.7.1 will produce confusion when people use Python 3, even if we fix 1.

This bug manifests itself most obviously when running makemigrations with 3rd party apps. It looks like we were lucky when it came to the migrations bundled with Django - perhaps they were created by Python 3?

Attachments (1)

migrations_bug_with_python2_python3.txt (1.9 KB ) - added by Luke Plant 9 years ago.

Download all attachments as: .zip

Change History (14)

by Luke Plant, 9 years ago

comment:1 by Baptiste Mispelon, 9 years ago

Hi,

Isn't the problem the fact that you're using "native strings" (ie byte strings on Python 2 but unicode strings on Python 3) for your verbose_name in your models?

If you're using verbose_name='asdf' on Python2 and you don't have from __future__ import unicode_literals at the top of your file then you are actually using verbose_name=b'asdf' so the migration is correct, no?

comment:2 by Simon Charette, 9 years ago

Third party applications supporting both Python 2 and 3 should consistently import unicode_literals to avoid such behavior.

Previous changes made sure migrations generated on Python 2 can be run on Python 3 (and vice versa) but it assumes unicode_literals were used.

The underlying issue here is that b'foo' == u'foo' on Python 2 but not on 3.

Last edited 9 years ago by Simon Charette (previous) (diff)

comment:3 by Markus Holtermann, 9 years ago

I ran into a similar issue the other day with one of my 3rd party apps: https://github.com/MarkusH/django-dynamic-forms/issues/11 regarding unicode strings. And I think there is another issue somewhere that covers the byte string serialization in the migration write. I just cannot find it.

comment:4 by Carl Meyer, 9 years ago

Component: MigrationsDocumentation
Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization

Yes, charettes and bmispelon are right, this is a bug in the app, not a bug in migrations.

Migrations are correctly and consistently recording the verbose_name as a bytestring when it is a bytestring, and as a text (unicode) string when it is a text string, and noticing the difference when it changes. (Note that migration files always have from __future__ import unicode_literals, so an unmarked string is always a unicode string, on Python 2 or 3). That's the only sane behavior for migrations (and it was arrived at with a great deal of trial and error :-) ).

The app, per our https://docs.djangoproject.com/en/dev/topics/python3/#unicode-literals porting to Python 3 documentation, should also be using from __future__ import unicode_literals so that the actual nature of the string does not change when moving between Python 2 and 3. (Or if they don't care to support 3.2, they could explicitly mark all strings with b or u if they prefer that for some reason.)

It would be great if we could provide some kind of hint or warning when creating the migration in this situation, but I don't see how we possibly can. I don't think we can reliably tell, when a migration is generated, whether the string we are seeing was explicitly marked or not, or whether the module containing it had from __future__ import unicode_literals. And even if we could tell, we can't warn if not, because there's nothing wrong with unmarked strings and no __future__ import if the migration is being generated in a project that only supports Python 2 (or only 3).

I'm converting this to a Documentation issue, because the only thing I can see that we could do is add a note about the issue to this section of the docs: https://docs.djangoproject.com/en/dev/topics/migrations/#libraries-third-party-apps

Version 1, edited 9 years ago by Carl Meyer (previous) (next) (diff)

comment:5 by Carl Meyer, 9 years ago

Summary: Migrations produced under Python 2 require fixing when run under Python 3Clarify how to properly support both Python 2 and 3 for third-party apps with migrations

comment:6 by Tim Graham, 9 years ago

This came up before in #23455.

comment:7 by Carl Meyer, 9 years ago

I'm not convinced that playing whack-a-mole with conversions of specific model/field properties makes sense; I think you were right Tim to resist that approach in #23455. It's not clear that auto-converting _any_ such properties in migrations is safe: it introduces an unnecessary difference in behavior between the migration model state and the real models, and it's entirely possible that users will have code, particularly in Python 3, which is called from migrations in one way or another and which breaks if it gets a text string instead of a byte string.

And even if we don't care about that problem, it's still a game of whack-a-mole that doesn't end and only serves to longer conceal the real underlying problem, which will only be fixed by unicode_literals.

comment:8 by Carl Meyer, 9 years ago

I think when a doc fix is committed here, it should also revert the change from #23455 (before it gets into a release and causes spurious migrations to be generated). It's much simpler if we maintain "what you put in is what you get out" when it comes to text vs bytes in migration serialization.

comment:9 by Carl Meyer, 9 years ago

Oh, looks like #23455 is already in 1.7.1 (odd that https://github.com/django/django/commit/e8a08514de6b97de1fce13b6c6b24cf72d1d60d9 doesn't list it as part of the 1.7.1 tag). And I don't think it resulted in spurious migrations generally, because on Python 3 it will usually have been a text string already, and on Python 2 unicode and bytes compare equal so the change doesn't cause a migration. But that also means we can revert it without causing spurious migrations.

comment:10 by Carl Meyer, 9 years ago

Has patch: set
Owner: changed from nobody to Carl Meyer
Status: newassigned

Here's a PR for how I would address this: https://github.com/django/django/pull/3718

comment:11 by Tim Graham, 9 years ago

Triage Stage: AcceptedReady for checkin

comment:12 by Carl Meyer <carl@…>, 9 years ago

Resolution: fixed
Status: assignedclosed

In d4bdddeefe35214f194cb85e40dbb761c206e7fa:

Fixed #23982 -- Added doc note on generating Python 2/3 cross-compatible migrations.

Thanks Luke Plant for the report, and Tim Graham, Simon Charette, and Markus
Holtermann for review and discussion.

comment:13 by Carl Meyer <carl@…>, 9 years ago

In b37607193639e25a6c630e8d1b3d3bca0833a652:

[1.7.x] Fixed #23982 -- Added doc note on generating Python 2/3 cross-compatible migrations.

Thanks Luke Plant for the report, and Tim Graham, Simon Charette, and Markus
Holtermann for review and discussion.

This is a backport of d4bdddeefe35214f194cb85e40dbb761c206e7fa from master.

Note: See TracTickets for help on using tickets.
Back to Top