Opened 5 years ago

Closed 3 years ago

Last modified 3 years ago

#29899 closed Cleanup/optimization (fixed)

Adapt the auto-detector to detect changes from model states instead of model classes

Reported by: Simon Charette Owned by: David Wobrock
Component: Migrations Version: dev
Severity: Normal Keywords:
Cc: Simon Charette, David Wobrock Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Rendering models for the sole purpose of detecting changes between them takes a significant amount of time on large project.

Instead of comparing the fake models rendered from the state generated by applying state_forwards of all existing migrations against the current model state generating model states from the current model classes and comparing both would shave off a model rendering phase.

This was extracted from the meta ticket #22608 which tracks slow parts of the migration framework.

An initial and dusty stab at this can be found here https://github.com/django/django/compare/master...charettes:autodetector-model-states

Note that this has a bit of overlap with #29898 because both optimizations require a way to resolve relations efficiently from model states.

Change History (16)

comment:1 by David Wobrock, 4 years ago

Cc: David Wobrock added
Owner: changed from nobody to David Wobrock
Status: newassigned

Hi there,

I got more or less my head around what is asked in the ticket and how it would improve the autodetector. However, I have some questions and I'd like to ask for some help and expertise on this issue.

I'll just shoot my questions:

  1. Am I understanding correctly that, by using the ModelState objects, instead of the rendered model objects/classes, we avoid calling the ProjectState.apps and ProjectState.concrete_apps (in the Autodetector.init) which are rendering the models? Therefore, we avoid work and speed up the makemigrations and migrate commands?

But since the "from_state" is still generated from the migration graph, we will still apply all existing migrations iteratively to build it, right? So we cannot totally avoid exploring the migration graph?

And let's get down to some more technical business :) I started re-working the patch, it's still a work in progress, but I think most of the logic of the autodetector is migrated to using model states. You should be able to look at the patch commit-by-commit for convenience. I'll still need to work on tests, fix the existing ones and adding some new ones to ensure I introduced no regression. Please find my current progress here https://github.com/django/django/compare/master...David-Wobrock:ticket-29899?expand=1

  1. The one thing about the logic which I'm unsure of, is the "relations" property which is introduced in the ProjectState. The goal is, I guess, to substitute the "related_objects" that a model class' option contains. However, I don't fully understand how the "proxies", "concrete" models and "swappable" ones work together.

I see that we are trying to fill the relations, for each model (identified by app_label+model_name), have the related models (identified by the same key) and the associated fields that have a relation. It's not easy to get your head around the 4 consecutive loops.

  • What are those relations exactly? All possible defined FK (also m2m, ...) on the model itself? Or also fields FK that point to this model?
  • And would it possible, even quickly, to explain how "swappable", "proxy" and "concrete" work together, I'd be grateful. I mean, I guess that we need to resolve the real/concrete model from a proxy model, but "swappable" is my main source of confusion here and I don't know how it should be handled :/ Like a proxy model because we resolve the concrete model when the User model is swappable?

Thanks a lot in advance! I'm assigning myself the issue.

comment:2 by Simon Charette, 4 years ago

Hello David,

Am I understanding correctly that, by using the ModelState objects, instead of the rendered model objects/classes, we avoid calling the ProjectState.apps and ProjectState.concrete_apps (in the Autodetector.init) which are rendering the models? Therefore, we avoid work and speed up the makemigrations and migrate commands?

Exactly, accessing .apps and concrete_apps involves model rendering which is really slow.

But since the "from_state" is still generated from the migration graph, we will still apply all existing migrations iteratively to build it, right? So we cannot totally avoid exploring the migration graph?

Right but from_state is generated by calling states_forwards on all operations of on-disk migrations and none of these operations should access .apps during states_forwards which should make them relatively fast.

The one thing about the logic which I'm unsure of, is the "relations" property which is introduced in the ProjectState. The goal is, I guess, to substitute the "related_objects" that a model class' option contains.

That's it

However, I don't fully understand how the "proxies", "concrete" models and "swappable" ones work together.

You'll have to do a bit more of investigation on your side to figure this out I'm afraid but the gist is that proxies models are Meta.proxy = True models which are always backed by a concrete table backed one. swappable refers to models that can be swapped by other ones. Currently only auth.User uses this feature internally.

I see that we are trying to fill the relations, for each model (identified by app_label+model_name), have the related models (identified by the same key) and the associated fields that have a relation. It's not easy to get your head around the 4 consecutive loops.

Yeah this code is tricky for sure.

What are those relations exactly? All possible defined FK (also m2m, ...) on the model itself? Or also fields FK that point to this model?

IIRC they are all forwards relationships which means related fields from the model.

And would it possible, even quickly, to explain how "swappable", "proxy" and "concrete" work together, I'd be grateful. I mean, I guess that we need to resolve the real/concrete model from a proxy model, but "swappable" is my main source of confusion here and I don't know how it should be handled :/ Like a proxy model because we resolve the concrete model when the User model is swappable?

I tried to explain it above but I suggest you start by reading the (limited) swappable model documentation and then dive into how they are currently handled. A lot of this logic is tribal knowledge share by the limited amount of people who worked on the migration framework over the years and its hard to define beyond what's coded and currently passes the suite. The basic idea is that swappable models could have been swapped by another one (e.g. a custom user model via settings.AUTH_USER_MODEL) so it needs to be treated differently.

As mentioned in the description there's a lot of overlap with this ticket and #29898 which I tried to describe on the mailing list to a developer interested in working on the latter for GSoC. Both tickets require access to a ProjectState forwards relationship cache (and maybe a reverse one as well in this case) and both would benefit from having ProjectState define methods that perform state alterations currently baked into Operation.state_forwards overrides. That would allow the auto-detector to simply call these operations if necessary when performing renames instead of maintaining tons of different maps of changes. It would also pave the way for ProjectState/ModelState to have their own .difference(other, questionner=None) methods to break the monolithic autodetector file into testable and extendable components which users have been asking for a while.

comment:3 by David Wobrock, 4 years ago

Patch needs improvement: unset

comment:4 by Mariusz Felisiak, 4 years ago

Patch needs improvement: set

comment:5 by David Wobrock, 3 years ago

Patch needs improvement: unset

comment:6 by Mariusz Felisiak, 3 years ago

Patch needs improvement: set

comment:7 by David Wobrock, 3 years ago

Patch needs improvement: unset

comment:8 by Mariusz Felisiak, 3 years ago

Patch needs improvement: set

comment:9 by David Wobrock, 3 years ago

Patch needs improvement: unset

comment:10 by Mariusz Felisiak, 3 years ago

Triage Stage: AcceptedReady for checkin

comment:11 by Mariusz Felisiak <felisiak.mariusz@…>, 3 years ago

In fd325b9:

Refs #29899 -- Moved resolve_relation() to django.db.migrations.utils.

comment:12 by Mariusz Felisiak <felisiak.mariusz@…>, 3 years ago

In a6784949:

Refs #29899 -- Improved variable names in MigrationAutodetector._detect_changes().

comment:13 by Mariusz Felisiak <felisiak.mariusz@…>, 3 years ago

Resolution: fixed
Status: assignedclosed

In aa4acc1:

Fixed #29899 -- Made autodetector use model states instead of model classes.

Thanks Simon Charette and Markus Holtermann for reviews.

comment:14 by Chris Jerdonek, 3 years ago

On this issue, does anyone remember if there was a reason that self.relations was defined to have this form--

  • {remote_model_key: {model_key: [(field_name, field)]}}

as opposed to this form?

  • {remote_model_key: {model_key: {field_name: field}}}

While reviewing PR #14587 for #29898, I noticed the code would be a bit simpler in parts if it were the latter as opposed to the former, but I wasn't sure if there was a reason for a using a list of pairs instead of a dict.

in reply to:  14 comment:15 by Mariusz Felisiak, 3 years ago

Replying to Chris Jerdonek:

On this issue, does anyone remember if there was a reason that self.relations was defined to have this form--

  • {remote_model_key: {model_key: [(field_name, field)]}}

as opposed to this form?

  • {remote_model_key: {model_key: {field_name: field}}}

While reviewing PR #14587 for #29898, I noticed the code would be a bit simpler in parts if it were the latter as opposed to the former, but I wasn't sure if there was a reason for a using a list of pairs instead of a dict.

Probably because there was no need to update the list of fields. I was also thinking about changing this structure to a dict.

comment:16 by Chris Jerdonek, 3 years ago

I was also thinking about changing this structure to a dict.

👍

Note: See TracTickets for help on using tickets.
Back to Top