Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#33483 closed Bug (wontfix)

Index name migration instability between database engines

Reported by: Marti Raudsepp Owned by: nobody
Component: Database layer (models, ORM) Version: 4.0
Severity: Normal Keywords: oracle, migrations, indexes
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Marti Raudsepp)

I have been using the django-celery-results project with Django on an Oracle database. I had issues upgrading to 2.1+ versions. (One of those issues was fixed in django-celery-results 2.2.0, the other issue remained: https://github.com/celery/django-celery-results/issues/222)

Whenever running makemigrations with the newer version, it would attempt to create a new migration file to drop all indexes and recreate them with a different name. This happens because 2.1.0 introduced some AddIndex migrations.

I tracked this down to the index name generation logic in Index.set_name_with_model(): https://github.com/django/django/blob/4.0.2/django/db/models/indexes.py#L150-L163

The index name hash depends on model._meta.db_table, which differs depending on the database.

  • PostgreSQL: hash_data=['django_celery_results_taskresult', 'task_name', 'idx'] -> index name django_cele_task_na_08aec9_idx
  • Oracle: hash_data=['django_celery_results_task5a9b', 'task_name', 'idx'] -> index name django_cele_task_na_4bcd60_idx

Looking at DatabaseOperations.max_name_length() methods, this issue potentially affects all supported engines. But it's much more likely to affect Oracle due to the historically low identifier limit of 30 characters.

I think this is a bug in Django -- generally the expectation seems to be that migration files are portable between databases.

Going forward

If the logic in Index.set_name_with_model() were to change -- for example to use un-truncated table name -- then after upgrading to that Django version, Django migrations out of the box would re-create all these indexes. For users with large databases, recreating indexes may cause significant disruption.

So some compatibility logic for legacy index names seems warranted. Although for my personal use case, I wouldn't mind a fix that requires such "flag-day" migrations.

On the other hand, it would be nice to bump Oracle's max_name_length() anyway. The current 30-character limit is already obsolete. Django 4.0 officially only supports Oracle 19c and newer. In Oracle 12.2 and above the maximum object name length is 128 bytes, not 30.

Reproducing

I reduced this down to two clean Django 4.0.2 setups, the only changes are adding INSTALLED_APPS django_celery_results, and configuring DATABASES for Oracle and PostgreSQL respectively.

By running makemigrations on each in turn, they generate new migrations every time.

% python django-celery-results-postgres/manage.py makemigrations
No changes detected
% python django-celery-results-oracle/manage.py makemigrations
Migrations for 'django_celery_results':
  .../django-celery-results/django_celery_results/migrations/0011_remove_groupresult_django_cele_date_cr_bd6c1d_idx_and_more.py
    - Remove index django_cele_date_cr_bd6c1d_idx from groupresult
    - Remove index django_cele_date_do_caae0e_idx from groupresult
    - Remove index django_cele_task_na_08aec9_idx from taskresult
    - Remove index django_cele_status_9b6201_idx from taskresult
    - Remove index django_cele_worker_d54dd8_idx from taskresult
    - Remove index django_cele_date_cr_f04a50_idx from taskresult
    - Remove index django_cele_date_do_f59aad_idx from taskresult
    - Create index django_cele_date_cr_0d69cb_idx on field(s) date_created of model groupresult
    - Create index django_cele_date_do_030540_idx on field(s) date_done of model groupresult
    - Create index django_cele_task_na_4bcd60_idx on field(s) task_name of model taskresult
    - Create index django_cele_status_9c2623_idx on field(s) status of model taskresult
    - Create index django_cele_worker_d87a3d_idx on field(s) worker of model taskresult
    - Create index django_cele_date_cr_2b7193_idx on field(s) date_created of model taskresult
    - Create index django_cele_date_do_c4aa5e_idx on field(s) date_done of model taskresult
% python django-celery-results-postgres/manage.py makemigrations
Migrations for 'django_celery_results':
  .../django-celery-results/django_celery_results/migrations/0012_remove_groupresult_django_cele_date_cr_0d69cb_idx_and_more.py
    - Remove index django_cele_date_cr_0d69cb_idx from groupresult
[...]
    - Create index django_cele_date_do_f59aad_idx on field(s) date_done of model taskresult
% python django-celery-results-oracle/manage.py makemigrations
Migrations for 'django_celery_results':
  .../django-celery-results/django_celery_results/migrations/0013_remove_groupresult_django_cele_date_cr_bd6c1d_idx_and_more.py
    - Remove index django_cele_date_cr_bd6c1d_idx from groupresult
[...]
    - Create index django_cele_date_do_c4aa5e_idx on field(s) date_done of model taskresult

Change History (7)

comment:1 by Marti Raudsepp, 3 years ago

Description: modified (diff)

comment:2 by Marti Raudsepp, 3 years ago

Description: modified (diff)

comment:3 by Marti Raudsepp, 3 years ago

There is a simple work-around for this issue: explicitly specify index names in models, e.g.

class Foo(models.Model):
    ...
    class Meta:
        indexes = [
            models.Index(fields=['date_created'], name='django_cele_date_cr_bd6c1d_idx'),
        ]

Opened a pull request for django-celery-results: https://github.com/celery/django-celery-results/pull/283/files

comment:4 by Marti Raudsepp, 3 years ago

Description: modified (diff)

in reply to:  description comment:5 by Mariusz Felisiak, 3 years ago

Resolution: wontfix
Status: newclosed

Thanks for the report.

I think this is a bug in Django -- generally the expectation seems to be that migration files are portable between databases.

Unfortunately, database backends (even builtin) will never be fully swappable.

On the other hand, it would be nice to bump Oracle's max_name_length() anyway. The current 30-character limit is already obsolete. Django 4.0 officially only supports Oracle 19c and newer. In Oracle 12.2 and above the maximum object name length is 128 bytes, not 30.

I think it is not a feasible option, see #30684. Moreover this can be changed via COMPATIBLE parameter.

If the logic in Index.set_name_with_model() were to change -- for example to use un-truncated table name -- then after upgrading to that Django version, Django migrations out of the box would re-create all these indexes. For users with large databases, recreating indexes may cause significant disruption.

So some compatibility logic for legacy index names seems warranted. Although for my personal use case, I wouldn't mind a fix that requires such "flag-day" migrations.

IMO it's not doable, because we don't have a clean and backward compatible path, basically from the same reason as #30684. I would recommend adding db_table (as you did for django-celery-results) or using a backend subclass to override max_name_length() in your own project.

I really would like to unify this, however we cannot do this without a backward compatible solution.

comment:6 by Marti Raudsepp, 3 years ago

I really would like to unify this, however we cannot do this without a backward compatible solution.

I agree, backwards compatibility is important. Isn't this the right place to come up with options how to do it in a backward-compatible manner?

But you seem to be shutting down the path forward with "it's difficult, therefore we won't fix it".

Unfortunately, database backends (even builtin) will never be fully swappable.

I didn't say "fully swappable", I recognize there are engine-specific features.

But for models that use bare minimum ORM features (tables, columns, indexes), isn't that something to strive for? There are so many Django libraries and add-ons that rely on this assumption. This is one of the most valuable parts of the Django ecosystem. Throwing it out is not something the Django project should do lightly.

Otherwise, the fact that migrations aren't portable between engines should be clearly documented, which it is not as far as I can tell.

comment:7 by Mariusz Felisiak, 3 years ago

Isn't this the right place to come up with options how to do it in a backward-compatible manner?

I cannot imagine any way to achieve this, so if you have any proposition I'm all ears.

Note: See TracTickets for help on using tickets.
Back to Top