Opened 4 years ago

Closed 4 months ago

Last modified 4 months ago

#23406 closed New feature (fixed)

Allow migrations to be loaded from .pyc files (e.g. in a frozen environment)

Reported by: Daniel Menzel Owned by: Dan Watson
Component: Migrations Version: 1.7
Severity: Normal Keywords: migrations, .pyc, frozen, cx_Freeze
Cc: Andrew Godwin, Dan Watson, Carlton Gibson Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Description

When only .pyc files are available, for example in a frozen environment after running cx_Freeze or similar tools, migrations no longer work: the migrate command does not find any migrations to apply, and produces the following output:

Operations to perform:
  Synchronize unmigrated apps: <some apps that really have no migrations>
  Apply all migrations: (none)
Synchronizing apps without migrations:
  Creating tables...
    Creating table <some tables>
  Installing custom SQL...
  Installing indexes...
Running migrations:
  No migrations to apply.
  Your models have changes that are not yet reflected in a migration, and so won't be applied.
  Run 'manage.py makemigrations' to make new migrations, and then re-run 'manage.py migrate' to apply them.

Subsequent operations then fail because the database is unmigrated.

The typical content of a frozen migrations folder looks like this:

django
  contrib
    contenttypes
      migrations
        __init__.pyc
        0001_initial.pyc

Analysis

Due to issue https://code.djangoproject.com/ticket/23237, there was a change in django.db.migrations.loader (https://code.djangoproject.com/changeset/267630ad50c69ebfe594de37a0636264aa5be7d6): Previously, migrations were found if they had a .py, .pyc, or .pyo extension. Now only migrations with .py extensions are found.

Perhaps it would be possible to add a check if Django is running in a frozen environment:

if hasattr(sys, 'frozen'):
    extensions = ('.py', '.pyc')
else:
    extensions = ('.py', )
...
if name.endswith(extensions):
    ...

or make the list of allowed extensions configurable by the user.

Attachments (1)

migrations_loader.patch (1.3 KB) - added by jdetaeye 3 years ago.
Patch to support loading from Python packages in any format

Download all attachments as: .zip

Change History (42)

comment:1 Changed 4 years ago by Tim Graham

Cc: Andrew Godwin added
Triage Stage: UnreviewedAccepted

I forget the reasoning, but I believe the decision was not to support frozen environments. We should at least document this.

comment:2 Changed 4 years ago by Carl Meyer

The reason for the change was because treating an orphan .pyc as a real migration causes too many problems with things like switching branches in git.

I think not looking for .pyc files is the right default behavior, but I wouldn't have any problem with some form of configurability to support things like cx_Freeze.

comment:3 Changed 4 years ago by Andrew Godwin

Well, we'd have to introduce a setting if we wanted to make this work (a la SOUTH_INCLUDE_PYC), and I'm somewhat averse to adding more settings than necessary. We definitely can't change the default behaviour back; people were seeing all sorts of bugs when switching branches due to .pyc files.

comment:4 Changed 4 years ago by Baptiste Mispelon

Can't we get away with an option passed to the command (something like --pyc)?

comment:5 Changed 4 years ago by Daniel Menzel

Or perhaps add the list of searched extensions as a class attribute, so that monkey-patching is made easier?

comment:6 Changed 4 years ago by Andrew Godwin

If it was to work correctly, it would need to be passed to makemigrations, migrate, sqlmigrate and squashmigrations, but I suspect we could get away with just adding it to migrate and sqlmigrate? I'm still not sure about that, but seems better than a setting.

comment:7 Changed 4 years ago by Shai Berger

I think migrate and sqlmigrate are the only required ones for the use-case (you don't generate or manipulate migrations in a frozen environment).

But I don't like the flag approach -- it essentially passes the problem from the developer to their user.

I think the correct approach is an app that overrides ("enhances", if you like) the built-in migration commands, in much the same way that South enhanced syncdb. If it is hard to write as a 3rd-party app, we should make the changes required to make it easy.

comment:8 Changed 4 years ago by Andrew Godwin

Well, there's definitely a call here to have the underlying implementation (i.e. MigrationLoader) take a list of extensions to allow, and then I think we may as well go the extra leg and add a --pyc flag; I'd hate to have people have to override the entire command just to get this behaviour.

comment:9 Changed 4 years ago by Omer Katz

+1 for --pyc. I think it's a good idea.
Having to extend anything in the code in order to migrate on frozen environments seems like a bit too much work.

comment:10 Changed 4 years ago by Shai Berger

Yeah, the idea to add an app was mostly a way to do this in settings.py without introducing a new setting.

So I take that back. But the main point remains: The determination of how migrations are found is a property of the deployment, not a property of a single invocation of migration commands. A setting is much better than a flag.

On a related note, the setting should probably be more general than just "use .pyc's". What about finding migrations in zip files? I can come up with a couple of less likely candidates as well.

So yes, scratch the app overriding the commands. We need a setting to specify the path to a migration loader, and we probably need to ship, along with the standard loader, one which picks up .pyc files (and maybe one which peeks in archives).

comment:11 in reply to:  10 Changed 4 years ago by andreydani

I agree a setting for the file extension would be better.

I would like to use *.pyc on production (frozen) environments, but not when developing or testing.

I have also made a small os.listdir replacement that searches for modules inside zip files.

def listdir(path):
    if not os.path.isdir(path):
        pos = path.lower().find('.zip')
        if pos >= 0:
            pos += 4
            zip = path[:pos]
            path_within = path[pos + 1:].replace('\\', '/')
            from zipfile import ZipFile
            zip = ZipFile(zip)
            return [a[len(path_within)+1:] for a in zip.namelist() if a.startswith(path_within)]
    return os.listdir(path)

comment:12 Changed 3 years ago by jdetaeye

The Django commands had the same issue: The code scanned for files in the folder with a certain extension, rather than using the proper Python API to peek inside the contents of the package.
Please have a look at ticket #8280 and its resolution.

6(!) years after reporting, this got fixed. And history repeats itself now with the migrations...

comment:13 Changed 3 years ago by Carl Meyer

I'm not sure the fix for #8280 is actually appropriate here - I don't believe that by default orphan .pyc files should be considered as present migrations. I think the practical hassles that causes to daily development with e.g. git branches outweigh the purity of using the proper Python API for module discovery.

The distinction here is that usually the presence or absence of a module that hasn't been explicitly requested doesn't fundamentally alter the behavior of the system. With management commands, it does a bit (they'll show up in help) but not in a way that's likely to break anything. Here, having extra migrations around that shouldn't be around in this branch seriously breaks things.

If we made migrations discover .pyc by default, we would basically have to document that PYTHONDONTWRITEBYTECODE=1 is required for effective local development in Django.

I'm not opposed to an option to enable .pyc discovery, but I am opposed to making it the default.

Changed 3 years ago by jdetaeye

Attachment: migrations_loader.patch added

Patch to support loading from Python packages in any format

comment:14 Changed 3 years ago by jdetaeye

For what it's worth, a patch is attached for this issue. It allows migrations in any package format to be picked up, be it zip, egg, file, ...

I think the practical hassles that causes to daily development with e.g. git branches outweigh the purity of using the proper Python API for module discovery."

I don't agree with this comment.
IMHO, a migration is a python package just like any other, and should be treated as such.If some development practices have an issue with that it's their problem - not django's.

comment:15 in reply to:  14 Changed 3 years ago by Carl Meyer

Replying to jdetaeye:

IMHO, a migration is a python package just like any other, and should be treated as such.

It's not a package like any other, because normal Python packages don't have an effect on the system unless explicitly imported.

If some development practices have an issue with that it's their problem - not django's.

Your patch will cause serious, difficult to debug problems for any development practice which involves adding and removing Python modules (including migrations) from source code under local development, while ignoring pyc files. (I know this is the case, because for a period of time in the 1.7 development process, orphan pyc files were loaded as migrations, and it did cause these problems.) I think you would be hard pressed to find any current common Django development process that doesn't fit that description.

If you have a practical proposal for how to address that problem (whether with documentation changes or otherwise) in your patch, I'm interested in discussing it. If you think it's "not Django's problem" and can be ignored, I'm afraid we just disagree, and I have to remain -1 on merging your patch.

comment:16 Changed 3 years ago by Aymeric Augustin

I agree with Carl.

Arguments along the lines of "it's the user's problem" are a red flag when the problem is likely to affect the vast majority of users — in this case, everyone who uses a VCS and doesn't have PYTHONDONTWRITEBYTECODE set.

comment:17 Changed 3 years ago by jdetaeye

Fair enough, I understand the arguments.

My use case is somewhat different. I'm packaging my product with py2exe as an easy-to-install application (especially for Windows users). Out of the 7 python packages it needs, only Django needs patching to behave nicely in a frozen zip-file...

comment:18 Changed 3 years ago by Carl Meyer

I understand the use case. (I would guess that none of your other six dependencies have a situation with similar constraints to Django migration files). As I said before, I'm in favor of a patch that would change your case from "Django needs patching" to "Django needs a non-default flag/setting." That's why this ticket is in Accepted state. It's just waiting for a patch that implements it as a non-default option.

I also agree with Shai that a setting is more appropriate than a flag, because this is a property of a deployment, not a particular invocation.

(There remains an open question about zipfiles, I guess. I have no problem with supporting zip-import by default, but I'm not sure if it's possible to do that without also supporting .pyc by default, which is problematic.)

comment:19 Changed 17 months ago by Dylan Young

While I don't agree with the conclusion here (migration files *are* explicitly imported when you run migrate; leftover .pyc files are a tooling problem... think post checkout hooks), if this is the route to go (and it's probably wise to do this anyways for other use-cases), I would suggest that the setting to control this behaviour simply be MIGRATION_LOADERS (as opposed to a tuple of extensions to support, for example) and supply default loaders (.py, .pyc, archive) with hooks for subclassing/creating new migration loaders. The setting would be a tuple/list and the loaders would be consulted in list order (avoids the need to mixin for every extension you want to support). This would also allow things like a network loader for example.

comment:20 Changed 5 months ago by Dan Watson

Has patch: set
Owner: changed from nobody to Dan Watson
Status: newassigned

Reviving this ticket a little bit. I think something like this can be useful in container environments, where you can guarantee pyc files are generated with the same python binary being used to run them, mitigating most of the drawbacks listed in https://www.curiousefficiency.org/posts/2011/04/benefits-and-limitations-of-pyc-only.html (an oldie but a goodie).

Created a pull request for a new MIGRATIONS_INCLUDE_PYC setting, complete with docs and tests:

https://github.com/django/django/pull/9718

comment:21 Changed 5 months ago by Dan Watson

Cc: Dan Watson added

comment:22 Changed 5 months ago by Andrew Godwin

Well, the problem we have these days is that there are no longer any .pyc files in the migrations directory (they go into pycache) in Python 3 - your patch does not seem to address this, so it'll never work unless people manually move them into place.

comment:23 Changed 5 months ago by Dan Watson

It's true that by default, pyc files are compiled into __pycache__ directories, but there are explicit affordances made to import pyc files that exist in place of the original py file. See https://www.python.org/dev/peps/pep-3147/#case-4-legacy-pyc-files-and-source-less-imports

This is the case for frozen or sourceless distributions. There are switches in compileall to output legacy pyc files for this purpose (and in fact this is what the unit test does). Whether frozen or sourceless distributions are a good idea is a much more subjective and (seemingly) controversial issue, but they are still possible, and this is how.

This setting is pretty narrow in scope, just to allow frozen distributions (and is off by default with a warning in the docs about the problems it may cause).

comment:24 in reply to:  22 Changed 5 months ago by Carlton Gibson

Cc: Carlton Gibson added

The patch as presented, and given the history of the ticket, looks great.

Ref comment from Andrew Godwin:

... there are no longer any .pyc files in the migrations directory (they go into pycache) in Python 3 ...

So, maybe the original problem (of stale .pyc files when switching branches) has gone away? They'll only be there if you've taken deliberate steps to make sure they are.

So can we just load them? Do we still need the setting? Do we need the warning around the setting? Could we not now revert 51673c14, which was Andrew's commit restricting the search to just .py files?

Assuming we don't want to revert, I'm inclined to call this Ready for checkin.

comment:25 in reply to:  22 Changed 5 months ago by Dan Watson

I'm actually in favor of reverting. Hadn't considered that now that Django is Python 3 only. If you have legacy stale pyc files, it's because you compiled them yourself now, which means you'd probably be using this setting. So why have the setting at all :)

I wonder what Python 3 does when it sees a pyc file compiled with Python 2. That would be the only place I could see breakage for existing projects with old Python 2 pyc files hanging around, but my guess is that they'd be ignored (or fail loudly and hopefully obviously).

comment:26 Changed 4 months ago by Carlton Gibson

Dan can you close your existing PR and open a new one PR reverting the original change (explaining why that's OK now)?

From there we'll mark that RFC and get Tim to make a final review.

(We'll either revert, or say "no, lets take the patch", or failing that the only other option is wontfix — either way we'll get this one resolved.)

Thanks!

Last edited 4 months ago by Carlton Gibson (previous) (diff)

comment:27 Changed 4 months ago by Carlton Gibson

Has patch: unset

comment:28 Changed 4 months ago by Tim Graham

Is there a disadvantage to waiting to revert 51673c146e48ff6230eace39bac18fd555607bee until after Python 2 is end-of-life in 2020? I'm not really sure, but ignoring of pyc files may still have some value as long as third-party apps support Python 2 and 3.

comment:29 in reply to:  28 Changed 4 months ago by Carlton Gibson

Replying to Tim Graham:

Is there a disadvantage to waiting to revert 51673c146e48ff6230eace39bac18fd555607bee until after Python 2 is end-of-life in 2020? I'm not really sure, but ignoring of pyc files may still have some value as long as third-party apps support Python 2 and 3.

I can't quite see how an issue would come up (given that this would go into Django 2.1, which is Python 3 only) but I can't rule it out either. (There's nothing so weird and wonderful as programming. :)

There's no disadvantage really. Dan's PR is clean enough. If we're not going to revert 51673c14 (and not go for wontfix) I think we should re-open the PR and treat it as Ready for checkin, for a final review from you. (I was at the point of marking it so, bar the revert option coming up.)

comment:30 in reply to:  22 Changed 4 months ago by Carlton Gibson

Triage Stage: AcceptedReady for checkin

I don't want this one to slip through the cracks because of the revert/not-revert question.

I've re-opened Dan's PR and marked it RFC. We should either merge it, or decide that reverting is OK (or close as wontfix).

comment:31 Changed 4 months ago by Claude Paroz

Adding a new setting around .pyc files in Django 2.1 looks very outdated. I'm in favour of won't fix.

comment:32 Changed 4 months ago by Tim Graham

I also don't see the purpose of adding the setting at this point. If there's some advantage to reverting the ignoring of pyc files, then I'd rather do that now instead of adding a setting for it.

comment:33 Changed 4 months ago by Carlton Gibson

The "some advantage" is in being able to ship a .pyc only version of you code. This isn't something I've done but, reviewing the issue it seemed a reasonable use-case. (The discussion ≈3 years ago didn't close...)

As far as I can see, in this Python 3 world you're only going to run into this if you're deliberately creating the legacy .pyc files. In that case you should already know what you're doing.
You're not going to hit anything that a find . -name '*.pyc' -delete won't fix.

comment:34 Changed 4 months ago by Tim Graham

If third-party apps continue to support Python 2 until it's EOL in 2020, then there's a possibility pyc files will be generated. If we revert 51673c146e48ff6230eace39bac18fd555607bee, then using a version of Django that doesn't ignore pyc files will given an exception like ImportError: bad magic number in 'polls.migrations.0002_auto_20180312_1426': b'\x03\xf3\r\n' if a Python 2 pyc file is present. Manually removing pyc files fixes that. If we decided to revert 51673c146e48ff6230eace39bac18fd555607bee for Django 2.1, I think it would be useful to add a more helpful message about the magic number error instructing the user to manually remove pyc files. I'm not sure if the extra hassle in that use case is worth the trade off of supporting frozen environments (assuming that Python 3 frozen environments generate pyc files -- I'm not sure about this as I haven't used frozen environments before).

By Django 3.1 (released August 2020), I think we can safely assume most apps won't support Python 2, so we could revert the pyc ignoring in Django without much concern. The MIGRATIONS_INCLUDE_PYC setting would allow more easily using frozen environments in Django 2.1 (to be released August 2018), two years earlier than if we take the alternate route. Given the problem is there since Django 1.7 (September 2014), I don't know if two more years makes much difference.

comment:35 Changed 4 months ago by Dan Watson

The only time I could see .pyc files hanging around is if you're sharing a local copy of a third-party application between two different versions of Python (i.e. pip install -e in both environments). And presumably the main motivation behind the original decision to ignore .pyc files (switching branches leaving stale files) does not apply nearly as much to third-party apps you are using as code you are developing. But like Carlton said, programming is weird and wonderful, so nothing would surprise me.

I think the biggest "risk" in reverting 51673c146e48ff6230eace39bac18fd555607bee is for a project initially switching to Python 3 (and Django 2.x), where the migrations directories may already contain stale .pyc files. Seems like a documentation note about clearing out .pyc files might be in order. If I'm not mistaken, we have similar issues with templatetags - the templatetags loader uses pkgutil:

https://github.com/django/django/blob/master/django/template/backends/django.py#L114

So I think a note about clearing out .pyc files is a good idea regardless. Obviously I'm biased, but I think the fact that this has been an issue for 3-4 years is not a good reason to wait 2 more years to rip off the bandaid. Unless Django as a project decides it doesn't want to officially "support" frozen/sourceless environments, which I think would also be a shame, given how close it already is.

comment:36 Changed 4 months ago by Tim Graham

Would you like to offer a patch that reverts 51673c146e48ff6230eace39bac18fd555607bee and makes the migration loader catch a ImportError: bad magic number exception and reraise it with a message that informs the user to delete pyc files?

comment:37 Changed 4 months ago by Tim Graham

Summary: Migrations not found when only .pyc files are available (e.g. in a frozen environment)Allow migrations to be loaded from .pyc files (e.g. in a frozen environment)
Triage Stage: Ready for checkinAccepted
Type: BugNew feature

comment:38 Changed 4 months ago by Dan Watson

Has patch: set

Created a new PR based on Tim's suggestion here:

https://github.com/django/django/pull/9789

comment:39 Changed 4 months ago by Carlton Gibson

Triage Stage: AcceptedReady for checkin

comment:40 Changed 4 months ago by Tim Graham <timograham@…>

Resolution: fixed
Status: assignedclosed

In 29150d5d:

Fixed #23406 -- Allowed migrations to be loaded from .pyc files.

comment:41 Changed 4 months ago by Tim Graham <timograham@…>

In f3b1c3b:

Refs #23406 -- Fixed "invalid escape sequence" warning in migrations test.

Note: See TracTickets for help on using tickets.
Back to Top