Opened 7 months ago

Closed 7 months ago

Last modified 3 months ago

#35336 closed Bug (fixed)

Adding GeneratedField fails with ProgrammingError when using When on CharField

Reported by: Adrian Garcia Owned by: Simon Charette
Component: Database layer (models, ORM) Version: 5.0
Severity: Release blocker Keywords: postgres, generatedfield, contains
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Forgive me if I am doing something incorrectly, but I cannot for the life of me add a GeneratedField to one of my models. To replicate this error, I created this test model:

from django.db import models

class TestModel(models.Model):
    description = models.TextField()

Running makemigrations/migrate, then modifying the model to add this contains_heck field:

from django.db import models

class TestModel(models.Model):
    description = models.TextField()
    contains_heck = models.GeneratedField(
        expression=models.Case(
            models.When(description__icontains="HECK", then=models.Value(True)),
            default=models.Value(False),
            output_field=models.BooleanField(),
        ),
        output_field=models.BooleanField(),
        db_persist=True,
    )

Which generates this migration:

# Generated by Django 5.0.3 on 2024-03-27 20:34

from django.db import migrations, models


class Migration(migrations.Migration):
    dependencies = [
        ("core", "00001_initial"),
    ]

    operations = [
        migrations.AddField(
            model_name="testmodel",
            name="contains_heck",
            field=models.GeneratedField(
                db_persist=True,
                expression=models.Case(
                    models.When(description__icontains="HECK", then=models.Value(True)), default=models.Value(False), output_field=models.BooleanField()
                ),
                output_field=models.BooleanField(),
            ),
        ),
    ]

And after running python manage.py migrate I get the following error:

Operations to perform:
  Apply all migrations: admin, auth, consumer, contenttypes, core, db, sessions
Running migrations:
  Applying core.0002_testmodel_contains_heck...Traceback (most recent call last):
  File "/opt/project/app/manage.py", line 24, in <module>
    main()
  File "/opt/project/app/manage.py", line 20, in main
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.11/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.11/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.11/site-packages/django/core/management/base.py", line 413, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.11/site-packages/django/core/management/base.py", line 459, in execute
    output = self.handle(*args, **options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/core/management/base.py", line 107, in wrapper
    res = handle_func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/core/management/commands/migrate.py", line 356, in handle
    post_migrate_state = executor.migrate(
                         ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/migrations/executor.py", line 135, in migrate
    state = self._migrate_all_forwards(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/migrations/executor.py", line 167, in _migrate_all_forwards
    state = self.apply_migration(
            ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/migrations/executor.py", line 252, in apply_migration
    state = migration.apply(state, schema_editor)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/migrations/migration.py", line 132, in apply
    operation.database_forwards(
  File "/usr/local/lib/python3.11/site-packages/django/db/migrations/operations/fields.py", line 108, in database_forwards
    schema_editor.add_field(
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/base/schema.py", line 750, in add_field
    self.execute(sql, params)
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/postgresql/schema.py", line 46, in execute
    sql = self.connection.ops.compose_sql(str(sql), params)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/postgresql/operations.py", line 195, in compose_sql
    return mogrify(sql, params, self.connection)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/django/db/backends/postgresql/psycopg_any.py", line 22, in mogrify
    return ClientCursor(cursor.connection).mogrify(sql, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/psycopg/client_cursor.py", line 40, in mogrify
    pgq = self._convert_query(query, params)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/psycopg/client_cursor.py", line 79, in _convert_query
    pgq.convert(query, params)
  File "/usr/local/lib/python3.11/site-packages/psycopg/_queries.py", line 208, in convert
    (self.template, self._order, self._parts) = f(bquery, self._encoding)
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/psycopg/_queries.py", line 242, in _query2pg_client_nocache
    parts = _split_query(query, encoding, collapse_double_percent=False)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/psycopg/_queries.py", line 388, in _split_query
    raise e.ProgrammingError(
psycopg.ProgrammingError: only '%s', '%b', '%t' are allowed as placeholders, got '%H'

I took a look with the debugger and I think either Django is incorrectly passing the values to psycopg, or psycopg isn't splitting this SQL statement correctly. I say this because when _split_query tries to split the query, it ends up matching and consuming the first character of each Value:

[
    ('ALTER TABLE "core_testmodel" ADD COLUMN "contains_heck" boolean GENERATED ALWAYS AS (CASE WHEN UPPER("description"::text) LIKE UPPER(\'', "<re.Match object; span=(140, 142), match=b'%H'>"),
    ("ECK", '<re.Match object; span=(145, 147), match=b"%\'">'),
    (") THEN true ELSE false END) STORED", "None"),
]

Switching to models.When(description__icontains=models.Value("HECK"), ...) yields a similar error except it's complaining about %' instead of %H`.

If the contains_heck field is created at the same time as the rest of the model, everything works in either the description__icontains="HECK" or description__icontains=models.Value("HECK") configuration.

Versions:

  • Postgres 14.9
  • Python 3.11.2
  • Django 5.0.3
  • psycopg[binary] 3.1.18

Change History (17)

comment:1 by Simon Charette, 7 months ago

Component: MigrationsDatabase layer (models, ORM)
Keywords: contains added; _split_query removed
Owner: changed from nobody to Simon Charette
Severity: NormalRelease blocker
Status: newassigned
Triage Stage: UnreviewedAccepted

This can be reproduced on psycopg2 and psycopg>=3 and is reminiscent of the series of issues we had to fix with Constraint's usage of __contains lookup which require proper % escaping (#30408, #34553, #32369)

  • tests/schema/tests.py

    diff --git a/tests/schema/tests.py b/tests/schema/tests.py
    index f8e314d270..3b2d7bf03c 100644
    a b class Meta:  
    886886        with connection.schema_editor() as editor:
    887887            editor.create_model(GeneratedFieldOutputFieldModel)
    888888
     889    @isolate_apps("schema")
     890    @skipUnlessDBFeature("supports_stored_generated_columns")
     891    def test_add_generated_field_contains(self):
     892        class GeneratedFieldContainsModel(Model):
     893            text = TextField()
     894
     895            class Meta:
     896                app_label = "schema"
     897
     898        field = GeneratedField(
     899            expression=Q(text__icontains="FOO"),
     900            db_persist=True,
     901            output_field=BooleanField(),
     902        )
     903        field.contribute_to_class(GeneratedFieldContainsModel, "some_field")
     904
     905        with connection.schema_editor() as editor:
     906            editor.create_model(GeneratedFieldContainsModel)
     907            editor.add_field(GeneratedFieldContainsModel, field)
     908
     909
    889910    @isolate_apps("schema")
    890911    def test_add_auto_field(self):
    891912        class AddAutoFieldModel(Model):

comment:2 by Adrian Garcia, 7 months ago

Thanks for looking into this, I'm glad to know that it's a bug and not me doing something incorrectly. I'll use iregex as a workaround until 5.0.4.

Out of curiosity, why does this issue only present itself when adding a field? Does Django handle SQL generation differently for table creation vs field addition?

comment:3 by Simon Charette, 7 months ago

Has patch: set
Patch needs improvement: set

Out of curiosity, why does this issue only present itself when adding a field? Does Django handle SQL generation differently for table creation vs field addition?

I'm still laying down the details in this MR but the short answer is yes.

The DBAPI Cursor.execute method can be used in two ways when dealing with parameters. Either (sql: str, params: []) which will delegate parametrization to the library or (sql: str, params: None) which won't attempt to parametrize the SQL. This is an important difference and requires special care when dealing with % literals in parameters as it's the symbol use for parametrization placeholders by most libraries.

Well some backend simply don't support parametrization in some DDL statements such as GENERATED columns declaration, in other words you can't do

cursor.execute(
    "ALTER TABLE invoice ADD COLUMN discount_display text GENERATED ALWAYS discount || %s;",
    ["%"]
)

So you must do

cursor.execute(
    "ALTER TABLE invoice ADD COLUMN discount_display text GENERATED ALWAYS AS discount || '%';", None
)

Well in the case of SchemaEditor.create_model we liberally do one or the other and in your case, because you don't have any other field with a db_default assuming your use Postgres, it does the right thing.

In the case of SchemaEditor.add_field however we always do params: list, which forces parametrization, and thus

cursor.execute(
    "ALTER TABLE testmodel ADD COLUMN contains_heck text GENERATED ALWAYS AS UPPER(description) LIKE '%HECK%';", []
)

Crashes as there is an attempt to interpolate the two % while params is empty.

The schema migration has historically use parametrizable DDL in the past when possible but this has become increasingly complex as more and more features were added (indexes, constraints, db_default, GeneratedField) to a point where I don't think we can continue doing so but the tricky part is that we can't make such a change in a bug fix release without risking to introduce further problems. This is made even worst by the fact that objects as Index and Constraint subclasses return already interpolated SQL (their as_sql methods return "CHECK UPPER(description) LIKE '%HECK%'" instead of ("CHECK UPPER(description) LIKE %s", "%HECK%") so even if we wanted to use it entirely to the backend until the last minute by using quote_value ourselves we can't as by that point we have a mixture of interpolated and non-interpolated SQL.

comment:4 by Simon Charette, 7 months ago

Patch needs improvement: unset

comment:5 by Adrian Garcia, 7 months ago

Thanks for the rundown, it really helped me understand what was going on when I followed the execution with a debugger. The ORM is easily my favorite part of Django and it's no surprise that it is complicated as hell, are there plans for a more unified SQL translator in a future release? I'd be interested in following the progress of that.

I tested the fix and it works great, might not even need to use the workaround if 5.0.4 gets released in the next few weeks. Thanks for the speedy patch.

comment:6 by Simon Charette, 7 months ago

are there plans for a more unified SQL translator in a future release

I think the many regressions we've run into in the past releases in escaping DDL (the subset of SQL for defining objects) in the schema editor (the component behind migrations) warrants spending time having a less convoluted approach.

I think that we should experiment with having backends denote whether or not they support parametrized DDL (the ability to call Cursor.execute(ddl: str, params: tuple) over having the schema editor do the % formatting itself), make sure all parts of the scheme editor return (str, tuple) and never simply str, and then adjust SchemaEditor.execute to do cursor.execute(ddl, params) when connection.features.support_parametrized_dll and do cursor.execute(ddl % map(self.quote_value, params), None) otherwise.

comment:7 by GitHub <noreply@…>, 7 months ago

Resolution: fixed
Status: assignedclosed

In 888b9042:

Fixed #35336 -- Addressed crash when adding a GeneratedField with % literals.

A longer term solution is likely to have a better separation of parametrized
DDL altogether to handle checks, constraints, defaults, and generated fields
but such a change would require a significant refactor that isn't suitable
for a backport.

Thanks Adrian Garcia for the report.

comment:8 by Natalia <124304+nessita@…>, 7 months ago

In fead2dd:

[5.0.x] Fixed #35336 -- Addressed crash when adding a GeneratedField with % literals.

A longer term solution is likely to have a better separation of parametrized
DDL altogether to handle checks, constraints, defaults, and generated fields
but such a change would require a significant refactor that isn't suitable
for a backport.

Thanks Adrian Garcia for the report.

Backport of 888b9042b3598bab6557c62de82505eec9ea62ed from main

comment:9 by Natalia Bidart, 7 months ago

Triage Stage: AcceptedReady for checkin

comment:10 by Natalia Bidart, 7 months ago

Adrian, the release is planned for tomorrow. Thank you for your help!

comment:11 by Adrian Garcia, 7 months ago

Woohoo, thanks again everyone for all the effort. You managed to have 5.0.4 released before my PR was merged lol.

comment:12 by GitHub <noreply@…>, 7 months ago

In d859c0b:

[5.0.x] Refs #35336 -- Fixed SchemaTests.test_add_generated_field_contains() test on PostgreSQL.

Concat() in Django 5.0 is not immutable on PostgreSQL and cannot be used
in GeneratedField, see 6364b6ee1071381eb3a23ba6b821fc0d6f0fce75.

comment:13 by Sarah Boyce <42296566+sarahboyce@…>, 7 months ago

In 73b62a21:

Refs #35194 -- Adjusted a generated field test to work on Postgres 15.6+.

Postgres >= 12.18, 13.14, 14.11, 15.6, 16.2 changed the way the immutability
of generated and default expressions is detected in postgres/postgres@743ddaf.

The adjusted test semantic is presereved by switching from icontains to
contains as both make use of a % literal which requires proper escaping.

Refs #35336.

Thanks bcail for the report.

comment:14 by Sarah Boyce <42296566+sarahboyce@…>, 7 months ago

In 5d95a1c:

[5.0.x] Refs #35194 -- Adjusted a generated field test to work on Postgres 15.6+.

Postgres >= 12.18, 13.14, 14.11, 15.6, 16.2 changed the way the immutability
of generated and default expressions is detected in postgres/postgres@743ddaf.

The adjusted test semantic is presereved by switching from icontains to
contains as both make use of a % literal which requires proper escaping.

Refs #35336.

Thanks bcail for the report.

Backport of 73b62a21265c4a417004d64d13a896469e2558f3 from main.

comment:15 by Sarah Boyce <42296566+sarahboyce@…>, 3 months ago

In f359990e:

Fixed #35625 -- Fixed a crash when adding a field with db_default and check constraint.

This is the exact same issue as refs #30408 but for creating a model with a
constraint containing % escapes instead of column addition. All of these issues
stem from a lack of SQL and parameters separation from the BaseConstraint DDL
generating methods preventing them from being mixed with other parts of the
schema alteration logic that do make use of parametrization on some backends
(e.g. Postgres, MySQL for DEFAULT).

Prior to the addition of Field.db_default and GeneratedField in 5.0
parametrization of DDL was never exercised on model creation so this is
effectively a bug with db_default as the GeneratedField case was addressed by
refs #35336.

Thanks Julien Chaumont for the report and Mariusz Felisiak for the review.

comment:16 by Sarah Boyce <42296566+sarahboyce@…>, 3 months ago

In d8116bf7:

[5.1.x] Fixed #35625 -- Fixed a crash when adding a field with db_default and check constraint.

This is the exact same issue as refs #30408 but for creating a model with a
constraint containing % escapes instead of column addition. All of these issues
stem from a lack of SQL and parameters separation from the BaseConstraint DDL
generating methods preventing them from being mixed with other parts of the
schema alteration logic that do make use of parametrization on some backends
(e.g. Postgres, MySQL for DEFAULT).

Prior to the addition of Field.db_default and GeneratedField in 5.0
parametrization of DDL was never exercised on model creation so this is
effectively a bug with db_default as the GeneratedField case was addressed by
refs #35336.

Thanks Julien Chaumont for the report and Mariusz Felisiak for the review.

Backport of f359990e4909db8722820849d61a6f5724338723 from main.

comment:17 by Sarah Boyce <42296566+sarahboyce@…>, 3 months ago

In 41d8ef18:

[5.0.x] Fixed #35625 -- Fixed a crash when adding a field with db_default and check constraint.

This is the exact same issue as refs #30408 but for creating a model with a
constraint containing % escapes instead of column addition. All of these issues
stem from a lack of SQL and parameters separation from the BaseConstraint DDL
generating methods preventing them from being mixed with other parts of the
schema alteration logic that do make use of parametrization on some backends
(e.g. Postgres, MySQL for DEFAULT).

Prior to the addition of Field.db_default and GeneratedField in 5.0
parametrization of DDL was never exercised on model creation so this is
effectively a bug with db_default as the GeneratedField case was addressed by
refs #35336.

Thanks Julien Chaumont for the report and Mariusz Felisiak for the review.

Backport of f359990e4909db8722820849d61a6f5724338723 from main.

Note: See TracTickets for help on using tickets.
Back to Top