Opened 22 months ago

Last modified 4 months ago

#29899 assigned Cleanup/optimization

Adapt the auto-detector to detect changes from model states instead of model classes

Reported by: Simon Charette Owned by: David Wobrock
Component: Migrations Version: master
Severity: Normal Keywords:
Cc: Simon Charette, David Wobrock Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Rendering models for the sole purpose of detecting changes between them takes a significant amount of time on large project.

Instead of comparing the fake models rendered from the state generated by applying state_forwards of all existing migrations against the current model state generating model states from the current model classes and comparing both would shave off a model rendering phase.

This was extracted from the meta ticket #22608 which tracks slow parts of the migration framework.

An initial and dusty stab at this can be found here https://github.com/django/django/compare/master...charettes:autodetector-model-states

Note that this has a bit of overlap with #29898 because both optimizations require a way to resolve relations efficiently from model states.

Change History (4)

comment:1 Changed 4 months ago by David Wobrock

Cc: David Wobrock added
Owner: changed from nobody to David Wobrock
Status: newassigned

Hi there,

I got more or less my head around what is asked in the ticket and how it would improve the autodetector. However, I have some questions and I'd like to ask for some help and expertise on this issue.

I'll just shoot my questions:

  1. Am I understanding correctly that, by using the ModelState objects, instead of the rendered model objects/classes, we avoid calling the ProjectState.apps and ProjectState.concrete_apps (in the Autodetector.init) which are rendering the models? Therefore, we avoid work and speed up the makemigrations and migrate commands?

But since the "from_state" is still generated from the migration graph, we will still apply all existing migrations iteratively to build it, right? So we cannot totally avoid exploring the migration graph?

And let's get down to some more technical business :) I started re-working the patch, it's still a work in progress, but I think most of the logic of the autodetector is migrated to using model states. You should be able to look at the patch commit-by-commit for convenience. I'll still need to work on tests, fix the existing ones and adding some new ones to ensure I introduced no regression. Please find my current progress here https://github.com/django/django/compare/master...David-Wobrock:ticket-29899?expand=1

  1. The one thing about the logic which I'm unsure of, is the "relations" property which is introduced in the ProjectState. The goal is, I guess, to substitute the "related_objects" that a model class' option contains. However, I don't fully understand how the "proxies", "concrete" models and "swappable" ones work together.

I see that we are trying to fill the relations, for each model (identified by app_label+model_name), have the related models (identified by the same key) and the associated fields that have a relation. It's not easy to get your head around the 4 consecutive loops.

  • What are those relations exactly? All possible defined FK (also m2m, ...) on the model itself? Or also fields FK that point to this model?
  • And would it possible, even quickly, to explain how "swappable", "proxy" and "concrete" work together, I'd be grateful. I mean, I guess that we need to resolve the real/concrete model from a proxy model, but "swappable" is my main source of confusion here and I don't know how it should be handled :/ Like a proxy model because we resolve the concrete model when the User model is swappable?

Thanks a lot in advance! I'm assigning myself the issue.

comment:2 Changed 4 months ago by Simon Charette

Hello David,

Am I understanding correctly that, by using the ModelState objects, instead of the rendered model objects/classes, we avoid calling the ProjectState.apps and ProjectState.concrete_apps (in the Autodetector.init) which are rendering the models? Therefore, we avoid work and speed up the makemigrations and migrate commands?

Exactly, accessing .apps and concrete_apps involves model rendering which is really slow.

But since the "from_state" is still generated from the migration graph, we will still apply all existing migrations iteratively to build it, right? So we cannot totally avoid exploring the migration graph?

Right but from_state is generated by calling states_forwards on all operations of on-disk migrations and none of these operations should access .apps during states_forwards which should make them relatively fast.

The one thing about the logic which I'm unsure of, is the "relations" property which is introduced in the ProjectState. The goal is, I guess, to substitute the "related_objects" that a model class' option contains.

That's it

However, I don't fully understand how the "proxies", "concrete" models and "swappable" ones work together.

You'll have to do a bit more of investigation on your side to figure this out I'm afraid but the gist is that proxies models are Meta.proxy = True models which are always backed by a concrete table backed one. swappable refers to models that can be swapped by other ones. Currently only auth.User uses this feature internally.

I see that we are trying to fill the relations, for each model (identified by app_label+model_name), have the related models (identified by the same key) and the associated fields that have a relation. It's not easy to get your head around the 4 consecutive loops.

Yeah this code is tricky for sure.

What are those relations exactly? All possible defined FK (also m2m, ...) on the model itself? Or also fields FK that point to this model?

IIRC they are all forwards relationships which means related fields from the model.

And would it possible, even quickly, to explain how "swappable", "proxy" and "concrete" work together, I'd be grateful. I mean, I guess that we need to resolve the real/concrete model from a proxy model, but "swappable" is my main source of confusion here and I don't know how it should be handled :/ Like a proxy model because we resolve the concrete model when the User model is swappable?

I tried to explain it above but I suggest you start by reading the (limited) swappable model documentation and then dive into how they are currently handled. A lot of this logic is tribal knowledge share by the limited amount of people who worked on the migration framework over the years and its hard to define beyond what's coded and currently passes the suite. The basic idea is that swappable models could have been swapped by another one (e.g. a custom user model via settings.AUTH_USER_MODEL) so it needs to be treated differently.

As mentioned in the description there's a lot of overlap with this ticket and #29898 which I tried to describe on the mailing list to a developer interested in working on the latter for GSoC. Both tickets require access to a ProjectState forwards relationship cache (and maybe a reverse one as well in this case) and both would benefit from having ProjectState define methods that perform state alterations currently baked into Operation.state_forwards overrides. That would allow the auto-detector to simply call these operations if necessary when performing renames instead of maintaining tons of different maps of changes. It would also pave the way for ProjectState/ModelState to have their own .difference(other, questionner=None) methods to break the monolithic autodetector file into testable and extendable components which users have been asking for a while.

comment:3 Changed 4 months ago by David Wobrock

Patch needs improvement: unset

comment:4 Changed 4 months ago by felixxm

Patch needs improvement: set
Note: See TracTickets for help on using tickets.
Back to Top