Opened 5 years ago

Closed 11 months ago

Last modified 7 months ago

#30490 closed Bug (wontfix)

migrations unique_index on (app, name).

Reported by: Richard Kojedzinszky Owned by: nobody
Component: Uncategorized Version: dev
Severity: Normal Keywords: migrations parallel run
Cc: Adam Johnson Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I have a Django-based API service. I run it in containers, on Kubernetes. When I update my image, I basically run the migration process and then start the application. With k8s, for example, when I start the deployment with 2 or more replicas, two migration processes actually run in parallel. With the following very simple migration, unfortunately the migration will be applied twice:

from django.db import migrations
from django.db.models import F
import time

def forward(apps, schema_editor):
    TT = apps.get_model('center', 'TestTable')
    time.sleep(10)  # widen the window so the two parallel runs overlap
    TT.objects.all().update(value=F('value') + 1)

def backward(apps, schema_editor):
    TT = apps.get_model('center', 'TestTable')
    TT.objects.all().update(value=F('value') - 1)


class Migration(migrations.Migration):

    dependencies = [
        ('center', '0007_testtable'),
    ]

    operations = [
        migrations.RunPython(forward, backward),
    ]

Even if the migration process is run in a transaction, it will be applied multiple times because the unique index is missing.

Even though my setup may be rare, I think it is really necessary to prevent migrations from being applied more than once. A database unique_together constraint could be used for that.

Change History (12)

comment:1 by Mariusz Felisiak, 5 years ago

Resolution: wontfix
Status: new → closed
Summary: migrations unique_index on (app, name) → migrations unique_index on (app, name).
Version: 1.11 → master

Thanks for the report, however this is rather a support issue, IMO. Adding a unique constraint on (app, name) will not fix your issue, because Django adds the entry to the migrations history table outside the main transaction.

First of all, I would use select_for_update() to lock the rows before the update (to prevent the race condition). Secondly, you should run migrations (on the same database) from only one instance.
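
A minimal sketch of that suggestion, reusing the names from the reporter's example (note that row locking only serializes the concurrent runs; a run that starts after the first one commits would still apply the increment again):

from django.db import transaction
from django.db.models import F

def forward(apps, schema_editor):
    TT = apps.get_model('center', 'TestTable')
    with transaction.atomic():
        # select_for_update() must be evaluated inside a transaction; a
        # concurrent run blocks here until the first one commits.
        list(TT.objects.select_for_update().all())
        TT.objects.all().update(value=F('value') + 1)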

Closing per TicketClosingReasons/UseSupportChannels.

comment:2 by Richard Kojedzinszky, 5 years ago

That is why I wrote that my approach might not be best practice. With automated deployments, I would really like Django itself to prevent a migration from being applied multiple times. That would even help prevent human error. You are right, the whole migration step would have to run in one database transaction for this to work. And as it is now, I rather think it is a bug.

in reply to:  description ; comment:3 by Richard Kojedzinszky, 5 years ago

Resolution: wontfix
Status: closed → new

Unfortunately, given that applying a migration and recording that it has been applied do not run in the same transaction, in a worst-case scenario the information that a migration has been applied can be lost entirely. So, in any case, they should be in the same transaction.


in reply to:  3 comment:4 by Richard Kojedzinszky, 5 years ago

Perhaps a patch along these lines would be a good starting point:

https://github.com/rkojedzinszky/django/commit/12477598fb3711f50ff482328ddcda62934caee6


comment:5 by Simon Charette, 5 years ago

Resolution: wontfix
Status: new → closed

Hey Richard, I understand your desire to get this issue fixed, but it's more complex than wrapping apply_migration and unapply_migration in a transaction.

For example, some backends don't support performing schema alterations within a transaction (e.g. MySQL silently commits the current transaction on such an operation), and some migrations are explicitly marked as non-atomic to prevent long-running transactions (e.g. large back-filling data migrations).

For these reasons, holding a database-level lock for the duration of a migration application is probably not a feasible option. Altering the django_migrations table is also problematic given that it's not tracked by migrations itself, which is another can of worms.
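
For reference, the "explicitly marked as non-atomic" escape hatch mentioned above is the documented atomic attribute on the Migration class; a minimal sketch (the app label, dependency, and backfill function are placeholders):

from django.db import migrations

def backfill(apps, schema_editor):
    # Placeholder for a long-running data migration that should not hold
    # a single transaction (and its locks) for its whole duration.
    pass

class Migration(migrations.Migration):

    # Documented attribute: do not wrap this migration in a transaction.
    atomic = False

    dependencies = [
        ('center', '0008_previous'),  # placeholder dependency
    ]

    operations = [
        migrations.RunPython(backfill, migrations.RunPython.noop),
    ]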

In your particular case I'd suggest you explore using another form of distributed lock (e.g. Redis), namespaced by your deploy hash, to prevent concurrent migrations, but you're probably better off getting help from our support channels or k8s ones, where other users might have already come up with a solution to your problem.

For these reasons I agree with Mariusz's resolution and ask that you follow the triaging guidelines with regard to wontfix tickets. Thanks.

comment:6 by Adam Johnson, 5 years ago

You don't need to use Redis for distributed locking; you can use advisory locking in your database server. These locks can live outside of transactions.

In PostgreSQL they are called advisory locks - https://hashrocket.com/blog/posts/advisory-locks-in-postgres . They are keyed by an integer.

In MySQL/MariaDB they are called user locks - https://mariadb.com/kb/en/library/get_lock/ . They are keyed by a string.

I don't know about SQLite or Oracle; quick internet searches don't reveal anything.

Potentially Django could use these advisory locks.
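
A minimal sketch of what that could look like in a deploy script, assuming PostgreSQL (the lock key and the migrate_with_lock helper are arbitrary choices, not part of Django):

from django.core.management import call_command
from django.db import connection

MIGRATION_LOCK_KEY = 100001  # arbitrary application-chosen integer key

def migrate_with_lock():
    with connection.cursor() as cursor:
        # pg_advisory_lock() blocks until the session-level lock is free,
        # so only one process runs the migrations at a time.
        cursor.execute("SELECT pg_advisory_lock(%s)", [MIGRATION_LOCK_KEY])
        try:
            call_command("migrate")
        finally:
            cursor.execute("SELECT pg_advisory_unlock(%s)", [MIGRATION_LOCK_KEY])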

comment:7 by Adam Johnson, 5 years ago

Cc: Adam Johnson added

comment:8 by Tom Forbes, 5 years ago

The way we solved this with Kubernetes is to create and launch a Job that runs manage.py migrate as part of the deployment pipeline. Once that Job is complete, update the Deployment/ReplicaSet as needed.

As always, you have to be completely sure the migration can run concurrently with your app, but in general running migrations in an init container or as part of the web app pod is really not advised, and adding a hack like a distributed lock will only bring you pain.

comment:9 by Richard Kojedzinszky, 11 months ago

Resolution: wontfix
Status: closed → new

@simon As the migration operations and the recording of the migration now run in the same transaction, the only thing left here is to add the unique key on the records table. Am I right?

comment:10 by Mariusz Felisiak, 11 months ago

Resolution: wontfix
Status: new → closed

I appreciate that you'd like to reopen the ticket, but please follow the triaging guidelines with regard to wontfix tickets and don't reopen closed tickets before reaching a consensus on the forum.

#34610 was a duplicate.

comment:12 by Natalia Bidart, 7 months ago

#34895 is a dupe of this one as well.
