#35194 closed Bug (needsinfo)
Postgres 16.2 with _iexact leads to IndeterminateCollation
| Reported by: | Aldalen | Owned by: | nobody | 
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 5.0 | 
| Severity: | Normal | Keywords: | |
| Cc: | Simon Charette, David Sanders | Triage Stage: | Unreviewed | 
| Has patch: | no | Needs documentation: | no | 
| Needs tests: | no | Patch needs improvement: | no | 
| Easy pickings: | no | UI/UX: | no | 
Description (last modified by )
There is something up with how "_iexact" GeneratedField expressions are turned into SQL for postgres 16.2.  This does not occur with 16.1.  This leads to a "psycopg2.errors.IndeterminateCollation: could not determine which collation to use for upper() function" error.   This was caught in our CI/CD env when I revved it to 16.2.  Existing / old migrations failed.   This is a reduced examples from our code, with exception paths redacted where necessary.
Model Definition: 
class TestModel(models.Model):
     test = models.CharField(max_length=254, blank=True)
     test_gen = models.GeneratedField(expression=models.Q(test__iexact='yes'), output_field=models.BooleanField(), db_persist=True)
Migration content:
migrations.AddField(
            model_name='testmodel',
            name='test_gen',
            field=models.GeneratedField(db_persist=True, expression=models.Q(("test__iexact",'yes')), output_field=models.BooleanField()),
        )
Error: 
  Applying module.migration_name...Traceback (most recent call last):
  File "/opt/venv/lib/python3.11/site-packages/django/db/backends/utils.py", line 103, in _execute
    return self.cursor.execute(sql)
           ^^^^^^^^^^^^^^^^^^^^^^^^
psycopg2.errors.IndeterminateCollation: could not determine which collation to use for upper() function
HINT:  Use the COLLATE clause to set the collation explicitly.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/builds/[redacted]/manage.py", line 15, in <module>
    execute_from_command_line(sys.argv)
  File "/opt/[redacted]/lib/python3.11/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/opt/[redacted]/lib/python3.11/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/opt/[redacted]/lib/python3.11/site-packages/django/core/management/base.py", line 413, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/opt/[redacted]/lib/python3.11/site-packages/django/core/management/base.py", line 459, in execute
    output = self.handle(*args, **options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/[redacted]/lib/python3.11/site-packages/django/core/management/base.py", line 107, in wrapper
    res = handle_func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/[redacted]/lib/python3.11/site-packages/django/core/management/commands/migrate.py", line 356, in handle
    post_migrate_state = executor.migrate(
                         ^^^^^^^^^^^^^^^^^
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/migrations/executor.py", line 135, in migrate
    state = self._migrate_all_forwards(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/migrations/executor.py", line 167, in _migrate_all_forwards
    state = self.apply_migration(
            ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/migrations/executor.py", line 252, in apply_migration
    state = migration.apply(state, schema_editor)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/migrations/migration.py", line 132, in apply
    operation.database_forwards(
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/migrations/operations/fields.py", line 108, in database_forwards
    schema_editor.add_field(
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/base/schema.py", line 750, in add_field
    self.execute(sql, params)
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/postgresql/schema.py", line 48, in execute
    return super().execute(sql, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/base/schema.py", line 201, in execute
    cursor.execute(sql, params)
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/utils.py", line 122, in execute
    return super().execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/utils.py", line 79, in execute
    return self._execute_with_wrappers(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/utils.py", line 92, in _execute_with_wrappers
    return executor(sql, params, many, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/utils.py", line 100, in _execute
    with self.db.wrap_database_errors:
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/utils.py", line 103, in _execute
    return self.cursor.execute(sql)
           ^^^^^^^^^^^^^^^^^^^^^^^^
django.db.utils.ProgrammingError: could not determine which collation to use for upper() function
HINT:  Use the COLLATE clause to set the collation explicitly.
      Change History (14)
comment:2 by , 21 months ago
| Description: | modified (diff) | 
|---|
comment:3 by , 21 months ago
| Component: | contrib.postgres → Database layer (models, ORM) | 
|---|---|
| Owner: | set to | 
| Resolution: | → needsinfo | 
| Status: | new → closed | 
| Summary: | Postgres 16.2 with _iexact leads to psycopg2.errors.IndeterminateCollation → Postgres 16.2 with _iexact leads to IndeterminateCollation | 
Thanks for the report, this looks like a bug in PostgreSQL. I couldn't find anything related with this change in PostgreSQL 16.2 release notes. I've checked a few ways to define that kind of expression in the GeneratedField and it seems to be related to calling UPPER() on strings:
models.GeneratedField(expression=models.Q(test__iexact='yes'), output_field=models.BooleanField(), db_persist=True)crashes 💥- SQL:
"test_gen" boolean GENERATED ALWAYS AS (UPPER("test"::text) = UPPER('yes')) STORED
 
- SQL:
 models.GeneratedField(expression=lookups.Exact(Upper("test"), "YES"), output_field=models.BooleanField(), db_persist=True)works ✅- SQL: 
"test_gen" boolean GENERATED ALWAYS AS (UPPER("test") = ('YES')) STORED
 
- SQL: 
 
Removing ::text doesn't change anything, it also crashes 💥:
"test_gen" boolean GENERATED ALWAYS AS (UPPER("test") = UPPER('yes')) STORED
Please reopen the ticket if you provide details that this is an intentional behavior change in PostgreSQL.
comment:4 by , 19 months ago
| Cc: | added | 
|---|---|
| Severity: | Normal → Release blocker | 
| Triage Stage: | Unreviewed → Accepted | 
Re-opening as a release blocker for 5.0 at it's a bug in a new feature (GeneratedField) after some sleuthing on #35368.
The culprit is this change Postgres >= 12.18, 13.14, 14.11, 15.6, 16.2 released on 2024-02-08.
Fix function volatility checking for GENERATED and DEFAULT expressions (Tom Lane)
These places could fail to detect insertion of a volatile function default-argument expression, or decide that a polymorphic function is volatile although it is actually immutable on the datatype of interest. This could lead to improperly rejecting or accepting a GENERATED clause, or to mistakenly applying the constant-default-value optimization in ALTER TABLE ADD COLUMN.
In essence the problem seems similar to #34955 as UPPER can return different value depending on the collation and thus is not immutable per-se?
I've tried to come up with a workaround but I'm not sure what should be done. The following doesn't work either
UPPER("text" COLLATE "C") COLLATE "C" = UPPER('test' COLLATE "C") COLLATE "C"
so it's possible there might be a bug on the Postgres side as well? In all cases keeping this ticket open should bring visibility to the issue.
A work around in the mean time is likely to use explicit collations
comment:5 by , 19 months ago
| Resolution: | needsinfo | 
|---|---|
| Status: | closed → new | 
comment:8 by , 19 months ago
One option I have found (could be a bad idea) is to revert some of #3575. This was an optimisation where ILIKE was removed in preference of using UPPER(field) LIKE UPPER('blah'). 
If we use ILIKE I no longer get an error here. I guess the question is, the change of #3575 was implemented many years ago and the performance of Postgres in this case may have moved on. 
Looks a bit like:
- 
      
django/db/backends/postgresql/base.py
diff --git a/django/db/backends/postgresql/base.py b/django/db/backends/postgresql/base.py index e97ab6aa89..4e3f7b3658 100644
a b class DatabaseWrapper(BaseDatabaseWrapper): 154 154 "exact": "= %s", 155 155 "iexact": "= UPPER(%s)", 156 156 "contains": "LIKE %s", 157 "icontains": " LIKE UPPER(%s)",157 "icontains": "ILIKE %s", 158 158 "regex": "~ %s", 159 159 "iregex": "~* %s", 160 160 "gt": "> %s", … … class DatabaseWrapper(BaseDatabaseWrapper): 163 163 "lte": "<= %s", 164 164 "startswith": "LIKE %s", 165 165 "endswith": "LIKE %s", 166 "istartswith": " LIKE UPPER(%s)",167 "iendswith": " LIKE UPPER(%s)",166 "istartswith": "ILIKE %s", 167 "iendswith": "ILIKE %s", 168 168 } 169 169 170 170 # The patterns below are used to generate SQL pattern lookup clauses when  - 
      
django/db/backends/postgresql/operations.py
diff --git a/django/db/backends/postgresql/operations.py b/django/db/backends/postgresql/operations.py index 4b179ca83f..af2463b1d6 100644
a b class DatabaseOperations(BaseDatabaseOperations): 172 172 else: 173 173 lookup = "%s::text" 174 174 175 # Use UPPER(x) for case-insensitive lookups; it's faster.176 if lookup_type in ("iexact", "icontains", "istartswith", "iendswith"):177 lookup = "UPPER(%s)" % lookup178 179 175 return lookup 180 176 181 177 def no_limit_value(self):  - 
      
tests/schema/tests.py
diff --git a/tests/schema/tests.py b/tests/schema/tests.py index 3a2947cf43..182e3486e0 100644
a b class SchemaTests(TransactionTestCase): 913 913 editor.create_model(GeneratedFieldContainsModel) 914 914 915 915 field = GeneratedField( 916 expression=Q(text__ contains="foo"),916 expression=Q(text__icontains="FOO"), 917 917 db_persist=True, 918 918 output_field=BooleanField(), 919 919 )  
comment:10 by , 19 months ago
It is effectively a solution but I'm not convinced this will do more good than harm.
#3575 has been merged 16 years ago this means in between now and then thousands of projects were created and added a functional index on UPPER("col") to have i(exact|contains|startwith) make use of it and the moment they upgrade to a minor version of 5.0 their database will start running slow queries as their indices will be unsuables.
On the other hand we have a bug in a newly introduced feature for a very particular use case that might be affecting only a few users (must use generated field, must be on a latest version of Postgres, must use i(exact|contains|startwith).
I appreciate the intent to solve this issue but I think we need to dig deeper to truly understand why this is happening before jumping to conclusion as there are no true urgency to get things right . The release blocker assignment is self-imposed and nothing prevents us from deferring a solution to this problem to a future 5.0 release if we can't understand why this is happening before the May release as for all we know if might be a bug in Postgres itself.
I tried reaching out on libera.chat#postgres IRC to get an answer but no one could answer me there (first time this happens) so I was planing to reach out to their mailing list this week but I might run out of time so if someone feels comfortable doing so please do.
To summarize I think we should understand why this is happening before taking any potential harmful action for the sake of marking this ticket resolved. For all we know many other functions and lookups could be affected and this is just the tip of the iceberg.
comment:11 by , 19 months ago
| Cc: | added | 
|---|
David, I think you have successfully engaged with the PostgreSQL team/devs in the past, leading to productive conversations. Would you have some availability to reach out to them again to seek their assistance in debugging this specific issue we're encountering with PostgreSQL >= 12.18, 13.14, 14.11, 15.6, 16.2?
comment:12 by , 19 months ago
| Has patch: | unset | 
|---|
comment:13 by , 18 months ago
| Resolution: | → needsinfo | 
|---|---|
| Severity: | Release blocker → Normal | 
| Status: | new → closed | 
| Triage Stage: | Accepted → Unreviewed | 
It is currently not clear whether the bug is on Postgres side or not. Until this is clear, we can keep the ticket in the "needsinfo" state as it best represents it's current status.
Reopening the ticket did bring it to the attention of others and has helped us look into this a little more, but even in a closed state, the ticket is still "visible"/searchable. 
We can reopen once we better understand the root of the problem and it's clear that Django needs to implement changes.
comment:14 by , 14 months ago
We are also running into this issue after upgrading from Postgres 14.10 to 14.11, iexact and icontains in generated fields were replaced with similar iregex queries as a workaround.
This was tested with django 5.0.2 and latest psycopg2 2.9.9. Changing to
_icontainssolved the issue, but we have quite a few with_iexact....