Opened 15 months ago
Last modified 2 months ago
#34898 assigned Bug
Adding non-deterministic collations to unique CharFields crashes on PostgreSQL.
Reported by: | Mariusz Felisiak | Owned by: | Tom Carrick |
---|---|---|---|
Component: | Migrations | Version: | 4.2 |
Severity: | Normal | Keywords: | PostgreSQL collation |
Cc: | Jacob Walls, Peter Thomassen, Ülgen Sarıkavak, Yiwei Gao | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Adding non-deterministic collations to unique CharField
s crashes on PostgreSQL. A regression test:
-
tests/schema/tests.py
diff --git a/tests/schema/tests.py b/tests/schema/tests.py index 68b6442794..ba8d9e99ec 100644
a b class SchemaTests(TransactionTestCase): 1402 1402 ) 1403 1403 self.assertIn("field", self.get_uniques(CiCharModel._meta.db_table)) 1404 1404 1405 @unittest.skipUnless(connection.vendor == "postgresql", "PostgreSQL specific") 1406 @skipUnlessDBFeature( 1407 "supports_collation_on_charfield", 1408 "supports_non_deterministic_collations", 1409 ) 1410 def test_add_collation_to_unique_charfield(self): 1411 ci_collation = "case_insensitive" 1412 1413 def drop_collation(): 1414 with connection.cursor() as cursor: 1415 cursor.execute(f"DROP COLLATION IF EXISTS {ci_collation}") 1416 1417 with connection.cursor() as cursor: 1418 cursor.execute( 1419 f"CREATE COLLATION IF NOT EXISTS {ci_collation} (provider = icu, " 1420 f"locale = 'und-u-ks-level2', deterministic = false)" 1421 ) 1422 self.addCleanup(drop_collation) 1423 1424 # Create the table. 1425 with connection.schema_editor() as editor: 1426 editor.create_model(AuthorWithUniqueName) 1427 # Add db_collation. 1428 old_field = AuthorWithUniqueName._meta.get_field("name") 1429 new_field = CharField(max_length=255, unique=True, db_collation=ci_collation) 1430 new_field.model = AuthorWithUniqueName 1431 new_field.set_attributes_from_name("name") 1432 with connection.schema_editor() as editor: 1433 editor.alter_field( 1434 AuthorWithUniqueName, old_field, new_field, strict=True 1435 ) 1436 1437 self.assertEqual( 1438 self.get_column_collation(AuthorWithUniqueName._meta.db_table, "field"), 1439 ci_collation, 1440 ) 1441 self.assertIn("field", self.get_uniques(AuthorWithUniqueName._meta.db_table)) 1442 1405 1443 @isolate_apps("schema") 1406 1444 @unittest.skipUnless(connection.vendor == "postgresql", "PostgreSQL specific") 1407 1445 @skipUnlessDBFeature(
Change History (12)
comment:1 by , 14 months ago
comment:2 by , 14 months ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
follow-up: 4 comment:3 by , 14 months ago
I looked into it a bit.
I think handling this in the schema
module is a really bad idea, as we would be dropping and creating an index or constraint without the user's knowledge. I don't see another way to handle it.
The next option is to detect this when running makemigrations
. I know very little about how makemigrations
works so I may be totally wrong here but it seems like at the moment it doesn't really know anything about the database state. If we were to fix this bug there, we'd have to add a query to determine if the collation is deterministic. It would also need to have some knowledge of Postgres that seems not to be known anywhere else in this part of the code.
So I again come back to this: I think we should document this and not try to fix it in code, and perhaps also further discourage use of unique=True
and db_index=True
.
comment:4 by , 13 months ago
Replying to Tom Carrick:
I think handling this in the
schema
module is a really bad idea, as we would be dropping and creating an index or constraint without the user's knowledge. I don't see another way to handle it.
I don't think it's that bad. We've already don't create varchar_pattern_ops
/text_pattern_ops
indexes when non-deterministic db_collation
is set, so removing them before adding a collation would be somehow consistent with the current behavior.
follow-up: 6 comment:5 by , 13 months ago
I agree that it's consistent. What I don't think is a good idea though, is dropping and recreating the entire index. This could take a fairly long time on some databases, and it can be quite surprising that a change to the collation would cause another operation that takes some time.
I can have a look though. Maybe we can at least do it concurrently.
comment:6 by , 13 months ago
Replying to Tom Carrick:
I agree that it's consistent. What I don't think is a good idea though, is dropping and recreating the entire index. This could take a fairly long time on some databases, and it can be quite surprising that a change to the collation would cause another operation that takes some time.
I can have a look though. Maybe we can at least do it concurrently.
Can we recreate them? For now, we completely skip varchar_pattern_ops
/text_pattern_ops
indexes when non-deterministic db_collation
is set. Is it not enough to remove them before adding a collation?
comment:7 by , 13 months ago
The opposite case works fine because there's no index yet. The user is adding an index, so it's expected that the index gets created, and of course we need it without varchar_pattern_ops
.
In this case though, the user already has a varchar_pattern_ops
index. Postgres offers no way to alter an existing index, it's drop and recreate only, so we need to drop the index, change the collation, then add the index again but without varchar_patterm_ops
. My concern here is not that it's difficult, it's that we're performing potentially expensive operations without informing the user about them (as they don't show up in the migration file, or anywhere else). Ideally the user should know they can't use varchar_pattern_ops
with a non-deterministic collation, but db_index
and unique
are quite opaque about what is actually happening, relative to the more explicit Meta.indexes
and Meta.constraints
, so it's hard for them to know (unless it's very explicitly documented somewhere) what the consequences are of changing the collation in this scenario.
comment:8 by , 13 months ago
Ah, right. TBH, the current NotSupportedError: nondeterministic collations are not supported for operator class "varchar_pattern_ops"
error is not so bad. Maybe it's enough to add a warning in docs (as you suggested).
comment:9 by , 8 months ago
Cc: | added |
---|
comment:10 by , 4 months ago
Cc: | added |
---|
This is a regression caused by the commit which removes django.contrib.postgres.fields.CIText/CICharField/CIEmailField/CITextField (04eb1b4567c96ccb167c16a95ca12c336b0c791b).
Using the code immediately before that commit (i.e., using 6e4e5523a8f40b63f3e74889266a7d638f6364dc), the ALTER statement setting the collation is preceded by DROP INDEX IF EXISTS "desecapi_user_email_fa6ced42_like"
.
When applying the next commit (04eb1b4567c96ccb167c16a95ca12c336b0c791b), this DROP INDEX
statement is gone, and the ALTER statement then causes nondeterministic collations are not supported for operator class "varchar_pattern_ops"
.
We used to have a unique CIEmailField. When that was deprecated, we created a migration that turns this into a standard EmailField, but a case-insensitive collation; the uniqueness properties were not changed. See: https://github.com/desec-io/desec-stack/blob/41ee90eecb4f54d141956b33066ae9ef69442120/api/desecapi/migrations/0031_alter_user_email.py#L24
As a result, it's not possible to initialize the database when running on Django 5.1.
So, I don't think this is merely a "don't do this, it's not supported" case; it is in fact a regression from 5.0 to 5.1.
How do you advise to proceed, both with this bug generally, and with our Django upgrade in particular? Thanks!
comment:11 by , 3 months ago
Cc: | added |
---|
comment:12 by , 2 months ago
Cc: | added |
---|
Interesting. I looked into it and it is indeed only this case that breaks.
If you add them both at once, it's fine. If you add the collation, then the unique, it's fine. But if it's unique, then you try to add the collation, boom.
As you might expect, this also happens for
db_index
.However, it doesn't happen when using
Meta
indexes
/constraints
unless you haveopclasses=["varchar_pattern_ops"]
, and if you remove this in the same migration as adding the collation, this also seems to work fine (I'm not sure if by luck of because Django first removes the constrain, then does other stuff, then adds the constraint purposefully).Given that
db_index
is discouraged in the docs... and I'd make the same argument for doing the same forunique=True
, maybe a note in the docs about this is enough?I didn't look very hard to find a solution, but I think it should be possible by adding stuff to
django.db.backends.postgresql.schema.DatabaseSchemaEditor._alter_field
. I'll take a look.