Opened 9 months ago

Closed 7 months ago

Last modified 2 months ago

#35194 closed Bug (needsinfo)

Postgres 16.2 with _iexact leads to IndeterminateCollation

Reported by: Aldalen Owned by: nobody
Component: Database layer (models, ORM) Version: 5.0
Severity: Normal Keywords:
Cc: Simon Charette, David Sanders Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Aldalen)

There is something up with how "_iexact" GeneratedField expressions are turned into SQL for postgres 16.2. This does not occur with 16.1. This leads to a "psycopg2.errors.IndeterminateCollation: could not determine which collation to use for upper() function" error. This was caught in our CI/CD env when I revved it to 16.2. Existing / old migrations failed. This is a reduced examples from our code, with exception paths redacted where necessary.

Model Definition:

class TestModel(models.Model):
     test = models.CharField(max_length=254, blank=True)
     test_gen = models.GeneratedField(expression=models.Q(test__iexact='yes'), output_field=models.BooleanField(), db_persist=True)

Migration content:

migrations.AddField(
            model_name='testmodel',
            name='test_gen',
            field=models.GeneratedField(db_persist=True, expression=models.Q(("test__iexact",'yes')), output_field=models.BooleanField()),
        )

Error:

  Applying module.migration_name...Traceback (most recent call last):
  File "/opt/venv/lib/python3.11/site-packages/django/db/backends/utils.py", line 103, in _execute
    return self.cursor.execute(sql)
           ^^^^^^^^^^^^^^^^^^^^^^^^
psycopg2.errors.IndeterminateCollation: could not determine which collation to use for upper() function
HINT:  Use the COLLATE clause to set the collation explicitly.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/builds/[redacted]/manage.py", line 15, in <module>
    execute_from_command_line(sys.argv)
  File "/opt/[redacted]/lib/python3.11/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/opt/[redacted]/lib/python3.11/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/opt/[redacted]/lib/python3.11/site-packages/django/core/management/base.py", line 413, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/opt/[redacted]/lib/python3.11/site-packages/django/core/management/base.py", line 459, in execute
    output = self.handle(*args, **options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/[redacted]/lib/python3.11/site-packages/django/core/management/base.py", line 107, in wrapper
    res = handle_func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/[redacted]/lib/python3.11/site-packages/django/core/management/commands/migrate.py", line 356, in handle
    post_migrate_state = executor.migrate(
                         ^^^^^^^^^^^^^^^^^
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/migrations/executor.py", line 135, in migrate
    state = self._migrate_all_forwards(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/migrations/executor.py", line 167, in _migrate_all_forwards
    state = self.apply_migration(
            ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/migrations/executor.py", line 252, in apply_migration
    state = migration.apply(state, schema_editor)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/migrations/migration.py", line 132, in apply
    operation.database_forwards(
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/migrations/operations/fields.py", line 108, in database_forwards
    schema_editor.add_field(
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/base/schema.py", line 750, in add_field
    self.execute(sql, params)
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/postgresql/schema.py", line 48, in execute
    return super().execute(sql, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/base/schema.py", line 201, in execute
    cursor.execute(sql, params)
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/utils.py", line 122, in execute
    return super().execute(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/utils.py", line 79, in execute
    return self._execute_with_wrappers(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/utils.py", line 92, in _execute_with_wrappers
    return executor(sql, params, many, context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/utils.py", line 100, in _execute
    with self.db.wrap_database_errors:
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/utils.py", line 103, in _execute
    return self.cursor.execute(sql)
           ^^^^^^^^^^^^^^^^^^^^^^^^
django.db.utils.ProgrammingError: could not determine which collation to use for upper() function
HINT:  Use the COLLATE clause to set the collation explicitly.

Change History (14)

comment:1 by Aldalen, 9 months ago

This was tested with django 5.0.2 and latest psycopg2 2.9.9. Changing to _icontains solved the issue, but we have quite a few with _iexact....

Last edited 9 months ago by Aldalen (previous) (diff)

comment:2 by Aldalen, 9 months ago

Description: modified (diff)

comment:3 by Mariusz Felisiak, 9 months ago

Component: contrib.postgresDatabase layer (models, ORM)
Owner: set to nobody
Resolution: needsinfo
Status: newclosed
Summary: Postgres 16.2 with _iexact leads to psycopg2.errors.IndeterminateCollationPostgres 16.2 with _iexact leads to IndeterminateCollation

Thanks for the report, this looks like a bug in PostgreSQL. I couldn't find anything related with this change in PostgreSQL 16.2 release notes. I've checked a few ways to define that kind of expression in the GeneratedField and it seems to be related to calling UPPER() on strings:

  • models.GeneratedField(expression=models.Q(test__iexact='yes'), output_field=models.BooleanField(), db_persist=True) crashes 💥
    • SQL:
      "test_gen" boolean GENERATED ALWAYS AS (UPPER("test"::text) = UPPER('yes')) STORED
      
  • models.GeneratedField(expression=lookups.Exact(Upper("test"), "YES"), output_field=models.BooleanField(), db_persist=True) works ✅
    • SQL:
      "test_gen" boolean GENERATED ALWAYS AS (UPPER("test") = ('YES')) STORED
      

Removing ::text doesn't change anything, it also crashes 💥:

"test_gen" boolean GENERATED ALWAYS AS (UPPER("test") = UPPER('yes')) STORED

Please reopen the ticket if you provide details that this is an intentional behavior change in PostgreSQL.

comment:4 by Simon Charette, 7 months ago

Cc: Simon Charette added
Severity: NormalRelease blocker
Triage Stage: UnreviewedAccepted

Re-opening as a release blocker for 5.0 at it's a bug in a new feature (GeneratedField) after some sleuthing on #35368.

The culprit is this change Postgres >= 12.18, 13.14, 14.11, 15.6, 16.2 released on 2024-02-08.

Fix function volatility checking for GENERATED and DEFAULT expressions (Tom Lane)

These places could fail to detect insertion of a volatile function default-argument expression, or decide that a polymorphic function is volatile although it is actually immutable on the datatype of interest. This could lead to improperly rejecting or accepting a GENERATED clause, or to mistakenly applying the constant-default-value optimization in ALTER TABLE ADD COLUMN.

In essence the problem seems similar to #34955 as UPPER can return different value depending on the collation and thus is not immutable per-se?

I've tried to come up with a workaround but I'm not sure what should be done. The following doesn't work either

UPPER("text" COLLATE "C") COLLATE "C" = UPPER('test' COLLATE "C") COLLATE "C"

so it's possible there might be a bug on the Postgres side as well? In all cases keeping this ticket open should bring visibility to the issue.

A work around in the mean time is likely to use explicit collations

Version 0, edited 7 months ago by Simon Charette (next)

comment:5 by Sarah Boyce, 7 months ago

Resolution: needsinfo
Status: closednew

comment:6 by Sarah Boyce <42296566+sarahboyce@…>, 7 months ago

In 73b62a21:

Refs #35194 -- Adjusted a generated field test to work on Postgres 15.6+.

Postgres >= 12.18, 13.14, 14.11, 15.6, 16.2 changed the way the immutability
of generated and default expressions is detected in postgres/postgres@743ddaf.

The adjusted test semantic is presereved by switching from icontains to
contains as both make use of a % literal which requires proper escaping.

Refs #35336.

Thanks bcail for the report.

comment:7 by Sarah Boyce <42296566+sarahboyce@…>, 7 months ago

In 5d95a1c:

[5.0.x] Refs #35194 -- Adjusted a generated field test to work on Postgres 15.6+.

Postgres >= 12.18, 13.14, 14.11, 15.6, 16.2 changed the way the immutability
of generated and default expressions is detected in postgres/postgres@743ddaf.

The adjusted test semantic is presereved by switching from icontains to
contains as both make use of a % literal which requires proper escaping.

Refs #35336.

Thanks bcail for the report.

Backport of 73b62a21265c4a417004d64d13a896469e2558f3 from main.

comment:8 by Sarah Boyce, 7 months ago

One option I have found (could be a bad idea) is to revert some of #3575. This was an optimisation where ILIKE was removed in preference of using UPPER(field) LIKE UPPER('blah').
If we use ILIKE I no longer get an error here. I guess the question is, the change of #3575 was implemented many years ago and the performance of Postgres in this case may have moved on.

Looks a bit like:

  • django/db/backends/postgresql/base.py

    diff --git a/django/db/backends/postgresql/base.py b/django/db/backends/postgresql/base.py
    index e97ab6aa89..4e3f7b3658 100644
    a b class DatabaseWrapper(BaseDatabaseWrapper):  
    154154        "exact": "= %s",
    155155        "iexact": "= UPPER(%s)",
    156156        "contains": "LIKE %s",
    157         "icontains": "LIKE UPPER(%s)",
     157        "icontains": "ILIKE %s",
    158158        "regex": "~ %s",
    159159        "iregex": "~* %s",
    160160        "gt": "> %s",
    class DatabaseWrapper(BaseDatabaseWrapper):  
    163163        "lte": "<= %s",
    164164        "startswith": "LIKE %s",
    165165        "endswith": "LIKE %s",
    166         "istartswith": "LIKE UPPER(%s)",
    167         "iendswith": "LIKE UPPER(%s)",
     166        "istartswith": "ILIKE %s",
     167        "iendswith": "ILIKE %s",
    168168    }
    169169
    170170    # The patterns below are used to generate SQL pattern lookup clauses when
  • django/db/backends/postgresql/operations.py

    diff --git a/django/db/backends/postgresql/operations.py b/django/db/backends/postgresql/operations.py
    index 4b179ca83f..af2463b1d6 100644
    a b class DatabaseOperations(BaseDatabaseOperations):  
    172172            else:
    173173                lookup = "%s::text"
    174174
    175         # Use UPPER(x) for case-insensitive lookups; it's faster.
    176         if lookup_type in ("iexact", "icontains", "istartswith", "iendswith"):
    177             lookup = "UPPER(%s)" % lookup
    178 
    179175        return lookup
    180176
    181177    def no_limit_value(self):
  • tests/schema/tests.py

    diff --git a/tests/schema/tests.py b/tests/schema/tests.py
    index 3a2947cf43..182e3486e0 100644
    a b class SchemaTests(TransactionTestCase):  
    913913            editor.create_model(GeneratedFieldContainsModel)
    914914
    915915        field = GeneratedField(
    916             expression=Q(text__contains="foo"),
     916            expression=Q(text__icontains="FOO"),
    917917            db_persist=True,
    918918            output_field=BooleanField(),
    919919        )
Last edited 7 months ago by Sarah Boyce (previous) (diff)

comment:9 by Sarah Boyce, 7 months ago

Has patch: set

It's an idea, we might get a better idea 👍

comment:10 by Simon Charette, 7 months ago

It is effectively a solution but I'm not convinced this will do more good than harm.

#3575 has been merged 16 years ago this means in between now and then thousands of projects were created and added a functional index on UPPER("col") to have i(exact|contains|startwith) make use of it and the moment they upgrade to a minor version of 5.0 their database will start running slow queries as their indices will be unsuables.

On the other hand we have a bug in a newly introduced feature for a very particular use case that might be affecting only a few users (must use generated field, must be on a latest version of Postgres, must use i(exact|contains|startwith).

I appreciate the intent to solve this issue but I think we need to dig deeper to truly understand why this is happening before jumping to conclusion as there are no true urgency to get things right . The release blocker assignment is self-imposed and nothing prevents us from deferring a solution to this problem to a future 5.0 release if we can't understand why this is happening before the May release as for all we know if might be a bug in Postgres itself.

I tried reaching out on libera.chat#postgres IRC to get an answer but no one could answer me there (first time this happens) so I was planing to reach out to their mailing list this week but I might run out of time so if someone feels comfortable doing so please do.

To summarize I think we should understand why this is happening before taking any potential harmful action for the sake of marking this ticket resolved. For all we know many other functions and lookups could be affected and this is just the tip of the iceberg.

Last edited 7 months ago by Simon Charette (previous) (diff)

comment:11 by Natalia Bidart, 7 months ago

Cc: David Sanders added

David, I think you have successfully engaged with the PostgreSQL team/devs in the past, leading to productive conversations. Would you have some availability to reach out to them again to seek their assistance in debugging this specific issue we're encountering with PostgreSQL >= 12.18, 13.14, 14.11, 15.6, 16.2?

comment:12 by Sarah Boyce, 7 months ago

Has patch: unset

comment:13 by Sarah Boyce, 7 months ago

Resolution: needsinfo
Severity: Release blockerNormal
Status: newclosed
Triage Stage: AcceptedUnreviewed

It is currently not clear whether the bug is on Postgres side or not. Until this is clear, we can keep the ticket in the "needsinfo" state as it best represents it's current status.
Reopening the ticket did bring it to the attention of others and has helped us look into this a little more, but even in a closed state, the ticket is still "visible"/searchable.
We can reopen once we better understand the root of the problem and it's clear that Django needs to implement changes.

comment:14 by Adrian Garcia, 2 months ago

We are also running into this issue after upgrading from Postgres 14.10 to 14.11, iexact and icontains in generated fields were replaced with similar iregex queries as a workaround.

Note: See TracTickets for help on using tickets.
Back to Top