Opened 2 hours ago

Last modified 33 minutes ago

#36748 new Bug

Postgres UNNEST optimisation of bulk_create cannot handle fields with `get_placeholder`.

Reported by: Chris Wesseling Owned by:
Component: Database layer (models, ORM) Version: 5.2
Severity: Release blocker Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Simon Charette)

In order to allow fields to define a different placeholder than "%s" in https://github.com/django/django/blob/97acd4d2f92eef8c285bac070d437bf0fd52e071/django/db/models/sql/compiler.py#L1769 BaseSQLInsertCompiler.assemble_as_sql does:

        get_placeholders = [getattr(field, "get_placeholder", None) for field in fields]

and calls those functions with (value, compiler, connection) to generate the placeholders.

These alternative placeholders are not always UNNESTable. e.g. "%s::{}" for arrays and and ranges, which we tried to work-around by filtering

            # Fields that don't use standard internal types might not be
            # unnest'able (e.g. array and geometry types are known to be
            # problematic).
            or any(
                (field.target_field if field.is_relation else field).get_internal_type()
                not in self.connection.data_types
                for field in fields
            )

but more general since get_placeholder can be a function of value, we can't UNNEST forall fields, value_rows, as it breaks the "one placeholder fits all values" assumption.

Therefore if any field in fields has get_placeholder, we should fallback to the default, unoptimised implementation.

In fact the addition of

            # Field.get_placeholder takes value as an argument, therefore the
            # resulting placeholder might be dependent on the value.
            # in UNNEST requires a single placeholder to "fit all values" in 
            # the array.
            or any(hasattr(field, "get_placeholder") for field in fields)

before the mentioned internal_type filter that's already present, handles all cases in the test suite and never hits the internal_type case, because all test examples also have a get_placeholder. I don't know if this will always be the case, so I suggest keeping the present filter too.

I bumped into this regression when upgrading to 5.2 and the call to a Postgres extension function in such a placeholder, but which has a db_type that is present in self.connection.data_types, didn't happen anymore for bulk inserts.

NB This bug can go unnoticed: bulk_create might succeed, but with different data written to disk than what is expected. And subsequent reads and seeks will have the wrong results.

Change History (5)

comment:1 by Chris Wesseling, 2 hours ago

Has patch: set

comment:2 by Chris Wesseling, 40 minutes ago

Description: modified (diff)

comment:3 by Simon Charette, 39 minutes ago

Description: modified (diff)
Severity: NormalRelease blocker
Triage Stage: UnreviewedAccepted

Thanks for the excellent report and patch Chris!

The only part missing is a release note for 5.2.9 as this is a bug in a newly introduced feature and thus still qualifies for backport.

I don't know if this will always be the case, so I suggest keeping the present filter too.

I also think it's worth keeping both as well as they are cheap to perform and could cover other cases in the wild.

Small note to me that this relates to #36727 but the latter won't fix issue as, just like you said, get_placeholder can be a function of value. This only means we'll need even more logic to handle the presence of either get_placeholder or get_placeholder_sql.

comment:4 by Simon Charette, 38 minutes ago

Description: modified (diff)

in reply to:  3 comment:5 by Chris Wesseling, 33 minutes ago

Replying to Simon Charette:

Thanks for the excellent report and patch Chris!

The only part missing is a release note for 5.2.9 as this is a bug in a newly introduced feature and thus still qualifies for backport.

Thanks for the quick response.
Should I just write one for 5.2.9.txt or 6.0.txt too?

Note: See TracTickets for help on using tickets.
Back to Top