Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#27200 closed Cleanup/optimization (fixed)

Provide makemigrations router.allow_migrate() with a model_name

Reported by: Joseph Kahn Owned by: nobody
Component: Migrations Version: 1.10
Severity: Normal Keywords: makemigrations, router, allow_migrate
Cc: Shai Berger Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I maintain a package called https://github.com/JBKahn/django-sharding and I don't really understand what the expected behaviour is when an app calls allow_migrate(db, app_label). What is it asking?

If I've got data on more than one database and Django cannot tell me the name of the model or the instance, then what am I expected to respond with? I made model_name or a hint about the model a hard requirement in the router since you cannot reason about the result without the context of a model. Two models in the same app could be on a different database. Recently, in Django 1.10, this call was added to makemigrations. I'm not sure what to return or how to handle this case/force a hint in the event of a runython call but not a makemigrations call.

More so, I'm not sure how one is expected to be able to write a router to handle this?

Change History (15)

comment:1 Changed 3 years ago by Joseph Kahn

Summary: rouer.allow_migrate without a model/hint, what is it supposed to do? Does it make sense to make that call?router.allow_migrate without a model/hint, what is it supposed to do? Does it make sense to make that call?

comment:2 Changed 3 years ago by Tim Graham

Have you consulted the documentation? I guess your router could return None if it requires a model_name but doesn't get one.

I'm not sure this is the right place to ask what seems to be a usage question (see TicketClosingReasons/UseSupportChannels) but I'll leave it open and maybe you can propose some documentation clarification or explain your problem in a bit more detail and offer a possible solution.

comment:3 in reply to:  2 Changed 3 years ago by Joseph Kahn

I guess my issue is that I'm not sure what the intent of allow_migrate(db, app_label) is supposed to mean.

I can't answer that question in a 1+ database world unless I shard my data by application. If that's the case, the router documentation should be updated to say that sharding within an application is not supported via the django codebase.

And so what I'm opening this ticket to address is, why is the interface allow_migrate(self, db, app_label, model_name=None, **hints) and not allow_migrate(self, db, app_label, model_name, **hints). Given django is (as far as I can tell from the docs) intending to support multi-db cases.

Replying to timgraham:

Have you consulted the documentation? I guess your router could return None if it requires a model_name but doesn't get one.

I'm not sure this is the right place to ask what seems to be a usage question (see TicketClosingReasons/UseSupportChannels) but I'll leave it open and maybe you can propose some documentation clarification or explain your problem in a bit more detail and offer a possible solution.

comment:4 Changed 3 years ago by Tim Graham

comment:5 in reply to:  4 Changed 3 years ago by Joseph Kahn

Replying to timgraham:

Here are the relevant commits: 098c07a03286b5ed133102733e67a83061647ea0 and bed504d70bede3431a213203c13a33905d6dbf77.

Exactly, supporting that makes sense from a usability perspective and it allows me to write data migrations and a router to work with whatever I want to use. What I don't understand is that when you choose not to pass a model, what is the point of the question allow-migrate(db, app_label), given that app_label is only relevant for the simple multi_db case of sharding applications?

If the intent is to only support sharding via app then a simple update to the docs will do. I just want to make sure that's the intention. If it's not, then I'd want to figure out a new constraint (open uop a dialogue about it) if providing a required model name is not an option.

comment:6 Changed 3 years ago by Tim Graham

Cc: Shai Berger added

In the case of 098c07a03286b5ed133102733e67a83061647ea0, I'd expect allow_migrate(db, app_label) to return True if any models in app_label can be migrated on the database.

comment:7 Changed 3 years ago by Shai Berger

c93ac9cf42bff259ab71b70a89b693b9c38e4666 was an attempt to clarify a little the relationship between makemigrations, migrate and allow_migrate(). If you can suggest how and where to add better explanations, that could help. If you think there's a real problem here, requiring more than a documentation fix, please explain that.

comment:8 in reply to:  6 ; Changed 3 years ago by Joseph Kahn

Replying to timgraham:

In the case of 098c07a03286b5ed133102733e67a83061647ea0, I'd expect allow_migrate(db, app_label) to return True if any models in app_label can be migrated on the database.

My understanding is that allow_migrate(db, app_label) is equiavlent to any(allow_migrate(db, app_label, model_name) for model_name in get_model_names(app_label)), where get_model_names is a generator for retrieving all the names of all the models inside an app.

What I'm still a little confused about is:
1) Why not keep the strict requirement and write the code like the above loop? Requiring each statement to be for one model?
note: I can easily make that my implementation inside allow_migrate so it's not that this isn't doable.

2) Given we are not constrained to a single model, aside from not having the for loop above, the other use case I cna think of is for those RunPython statements which when generated are always on a model level. Which leads me to my next question of how are Data Migrations meant to be written? Is a single RunPython statement supposed to address only changes in 1 model, only changes in models that are defined in that app? The docs suggest otherwise, that dependencies are used to allow me access to other models.

The way I reasoned about each data migration was something like:

current_db = self....
this_model_name.objects.using(current_db)...

If it runs on every database that the app has a model in, then I need to change tha tot something like:

current_db = self....
if this_model_name on current_db:
    this_model_name.objects.using(current_db)...

So far that seems reasonable. But if we don't have a concept of the model name we're likely to have:

current_db = self....
for model in ["some list of models I want to update"]:
    if this_model on current_db: 
        this_model.objects.using(current_db)...

Which still seems reasonable.

However, there is nothing in the context of a migration to prevent or guide me away from saying:
Model A1 is in app A and model B is in app B. When I update A1, I need to update the corresponding things in B1. When everything is on one database this makes sense to do as a data migration, is the idea that if I need to update both together they cannot be on one data migration IF they are on different shards? If they should be done on a data migration than in the context of looping over one database in the migration, I'm accessing some n other databases for all the values of B that need to update. I would assume that you are not supposed to write code like that, where a migration on one database has some effect on another one?

I'm just struggling a little to understand the intention here in multi-db scenarios where an update on one db requires another update on another database. If the Django team has a clear intent then it just needs a more definite explanation in the docs (in my opinion).

edit: I'm in the middle of some stuff right now so sorry if this feels a bit scattered. Since I'm allowed to access models in other apps inside a data migration, what if those models don't have complete database overlap. Or is the intention that if that's the case I'm supposed to only read the other model but not write to it, since allow_migrate is this context is in an app_label context and should only loop over some of those databases? I'm not sure if this was super clear so let me know if I need to rephrase that.

Last edited 3 years ago by Joseph Kahn (previous) (diff)

comment:9 Changed 3 years ago by Tim Graham

I didn't think through multi-database scenarios when I added the allow_migrate() call without model_name. I looked at the allow_migrate() signature and saw that model_name wasn't required, so I assumed routers should be able to handle the case where it's not passed. I guess you can add the any(...) condition that you mentioned, and I'll keep this ticket in mind if a similar concern happens again.

About your second question, I'm not sure if this is related to the first question but it's too involved for me to answer off the top of my head. You might have more luck on our support channels. I'm not sure what actionable items could result from accepting this ticket, so unless anyone sees a way forward, perhaps it can be closed.

comment:10 in reply to:  8 Changed 3 years ago by Shai Berger

Hi Joseph, sorry for taking so long to respond. I'll answer your questions in reverse order, starting with the second:

Replying to Joseph Kahn:

However, there is nothing in the context of a migration to prevent or guide me away from saying:
Model A1 is in app A and model B is in app B. When I update A1, I need to update the corresponding things in B1. When everything is on one database this makes sense to do as a data migration, is the idea that if I need to update both together they cannot be on one data migration IF they are on different shards?

This has always been the case, and it's not because of a Django design decision, but because if some changes need to be done together, then they need to be done in a (database) transaction, and that is a little hard to do when they are on separate databases.

If they should be done on a data migration than in the context of looping over one database in the migration, I'm accessing some n other databases for all the values of B that need to update. I would assume that you are not supposed to write code like that, where a migration on one database has some effect on another one?

I'm just struggling a little to understand the intention here in multi-db scenarios where an update on one db requires another update on another database. If the Django team has a clear intent then it just needs a more definite explanation in the docs (in my opinion).

Again -- you describe a case where database integrity is broken by design. I believe the developers who made the design decisions here did not give such cases a high priority and probably didn't completely think them through, and I think Django should not encourage such usage, so I would be against making recommendations about it in the docs.

edit: I'm in the middle of some stuff right now so sorry if this feels a bit scattered. Since I'm allowed to access models in other apps inside a data migration, what if those models don't have complete database overlap. Or is the intention that if that's the case I'm supposed to only read the other model but not write to it, since allow_migrate is this context is in an app_label context and should only loop over some of those databases? I'm not sure if this was super clear so let me know if I need to rephrase that.

I'm not completely sure I got your meaning, but if I did: When migrations are run against a given database, we don't expect changes to be made to other databases. These other databases can still be read, as the migrations can always use data sources other than the migrated database (trivially project source files, but also data files, Web API calls, whatever).

Now, back to the first question:

My understanding is that allow_migrate(db, app_label) is equiavlent to any(allow_migrate(db, app_label, model_name) for model_name in get_model_names(app_label)), where get_model_names is a generator for retrieving all the names of all the models inside an app.

What I'm still a little confused about is:
1) Why not keep the strict requirement and write the code like the above loop? Requiring each statement to be for one model?
note: I can easily make that my implementation inside allow_migrate so it's not that this isn't doable.

I think the current API is a better reflection of the decisions needed to be made, and so gives better hints to the authors of routers.

As an example: Suppose I have an app which has several models, but only one of them, say ModelA, is stored on database dbA. If the API requires a model name as you suggest, the natural thing for me to do is something like

    def allow_migrate(self, db, app_label, model_name):
        if app_label==my_app_label:
            return ((db=='dbA') == (model_name=='modela'))
        else:
            return None

I might be surprised to find out that this allows all RunPython and RunSQL operations in migrations in the app on dbA, which is a consequence of your suggestion. On the other hand, the current API tells me explicitly: There are cases when you need to decide without reference to a specific model, where the closest general hint we can give the router is the app label.

On second thought, though, it may be worth reconsidering if we want the reply for the makemigrations case and the reply for the RunSQL case to always be the same. In the specific example I just gave, we probably don't, and it would be served better if makemigrations implemented the loop instead of calling the no-model version, or at least, gave a hint that this call is asking about the need to run a consistency check, and not about the need to change the database.

I guess I'm replying to Tim's question about going forward with "we should probably be adding a hint to the call in makemigrations". I'm not sure if we should backport that fix to 1.10.x, but I think that is an option too, since this would be a fix to a new feature.

comment:11 Changed 3 years ago by Tim Graham

Has patch: set
Summary: router.allow_migrate without a model/hint, what is it supposed to do? Does it make sense to make that call?Provide makemigrations router.allow_migrate() with a model_name
Triage Stage: UnreviewedAccepted
Type: BugCleanup/optimization

comment:12 Changed 3 years ago by Tim Graham <timograham@…>

Resolution: fixed
Status: newclosed

In cd09524f:

Fixed #27200 -- Provided makemigration's allow_migrate() with model_name.

comment:13 Changed 3 years ago by Tim Graham <timograham@…>

In 91cc5fd:

[1.10.x] Fixed #27200 -- Provided makemigration's allow_migrate() with model_name.

Backport of cd09524f27b83c0ca9dabafa81265e8d8abd252a from master

comment:14 Changed 3 years ago by Joseph Kahn

Resolution: fixed
Status: closednew

There's a bug in the code. I was on vacation, sorry it took so long for me to get back. It was getting all the models rather than the models for each app. It broke the router I use because it was passing invalid combinations for shards since not all shards have the same models.

https://github.com/django/django/pull/7530

comment:19 Changed 3 years ago by Tim Graham

Resolution: fixed
Status: newclosed

I created a new ticket: #27461.

Note: See TracTickets for help on using tickets.
Back to Top