Opened 2 hours ago
Last modified 33 minutes ago
#36748 new Bug
Postgres UNNEST optimisation of bulk_create cannot handle fields with `get_placeholder`.
| Reported by: | Chris Wesseling | Owned by: | |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 5.2 |
| Severity: | Release blocker | Keywords: | |
| Cc: | | Triage Stage: | Accepted |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
To allow fields to define a placeholder other than "%s", BaseSQLInsertCompiler.assemble_as_sql (https://github.com/django/django/blob/97acd4d2f92eef8c285bac070d437bf0fd52e071/django/db/models/sql/compiler.py#L1769) does:
get_placeholders = [getattr(field, "get_placeholder", None) for field in fields]
and calls those functions with (value, compiler, connection) to generate the placeholders.
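For illustration, here is a minimal sketch (not the actual Django code) of the mechanism described above: the compiler looks up an optional get_placeholder on each field and calls it with (value, compiler, connection). The field classes below are hypothetical stand-ins.

```python
# Minimal sketch of per-field placeholder resolution, simplified from
# assemble_as_sql. The field classes are hypothetical stand-ins.

class PlainField:
    """A field without get_placeholder; the default "%s" is used."""

class TextArrayField:
    """A field whose placeholder casts the bind parameter, e.g. for arrays."""

    def get_placeholder(self, value, compiler, connection):
        # The placeholder may depend on the value being inserted.
        return "%s::text[]"

def placeholders_for(fields, row, compiler=None, connection=None):
    # Mirrors: getattr(field, "get_placeholder", None) for field in fields
    get_placeholders = [getattr(f, "get_placeholder", None) for f in fields]
    return [
        "%s" if gp is None else gp(value, compiler, connection)
        for gp, value in zip(get_placeholders, row)
    ]

print(placeholders_for([PlainField(), TextArrayField()], ["a title", ["x", "y"]]))
# -> ['%s', '%s::text[]']
```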
These alternative placeholders are not always UNNESTable, e.g. "%s::{}" for arrays and ranges, which we tried to work around by filtering:
# Fields that don't use standard internal types might not be
# unnest'able (e.g. array and geometry types are known to be
# problematic).
or any(
(field.target_field if field.is_relation else field).get_internal_type()
not in self.connection.data_types
for field in fields
)
But more generally, since get_placeholder can be a function of the value, we can't UNNEST across all fields and value_rows: it breaks the "one placeholder fits all values" assumption.
Therefore, if any field in fields has get_placeholder, we should fall back to the default, unoptimised implementation.
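To make the broken assumption concrete, here is a sketch (SQL strings only, Postgres-style syntax assumed) contrasting the two INSERT shapes. With VALUES, every row carries its own placeholder list, so a value-dependent placeholder can vary per row; with UNNEST, each column is bound once as a single array parameter, so one placeholder has to fit every value in that column.

```python
# Sketch of the two INSERT shapes. Only SQL strings are built here; no
# database is involved.

def values_sql(table, columns, placeholder_rows):
    cols = ", ".join(columns)
    rows = ", ".join("(" + ", ".join(row) + ")" for row in placeholder_rows)
    return f"INSERT INTO {table} ({cols}) VALUES {rows}"

def unnest_sql(table, columns):
    cols = ", ".join(columns)
    arrays = ", ".join(["%s"] * len(columns))  # one placeholder per column
    return f"INSERT INTO {table} ({cols}) SELECT * FROM UNNEST({arrays})"

# Per-row placeholders are expressible with VALUES...
print(values_sql("t", ["r"], [["%s::int4range"], ["%s::daterange"]]))
# ...but UNNEST leaves no room for them: the whole column is one parameter.
print(unnest_sql("t", ["r"]))
```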
In fact the addition of
# Field.get_placeholder takes value as an argument, therefore the
# resulting placeholder might be dependent on the value.
# in UNNEST requires a single placeholder to "fit all values" in
# the array.
or any(hasattr(field, "get_placeholder") for field in fields)
before the already-present internal_type filter handles all cases in the test suite, and the internal_type case is never hit, because all test examples that reach it also have a get_placeholder. I don't know whether that will always be the case, so I suggest keeping the present filter too.
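A sketch of how the two guards could compose (unnest_eligible is a hypothetical helper name, not actual Django API; the field and connection classes below are dummies):

```python
# Hypothetical helper combining both guards: skip the UNNEST optimisation
# when any field defines get_placeholder, or when a field's internal type
# is unknown to the backend's data_types map.

class FakeConnection:
    data_types = {"IntegerField": "integer"}  # minimal stand-in

class IntegerField:
    is_relation = False

    def get_internal_type(self):
        return "IntegerField"

class RangeField(IntegerField):
    def get_placeholder(self, value, compiler, connection):
        return "%s::int4range"  # value-dependent in general

def unnest_eligible(fields, connection):
    if any(hasattr(field, "get_placeholder") for field in fields):
        # get_placeholder takes the value as an argument, so the resulting
        # placeholder may vary per value; UNNEST needs one placeholder to
        # fit all values in the column's array.
        return False
    if any(
        (field.target_field if field.is_relation else field).get_internal_type()
        not in connection.data_types
        for field in fields
    ):
        # Non-standard internal types (e.g. arrays, geometry) are known to
        # be problematic for UNNEST as well.
        return False
    return True

print(unnest_eligible([IntegerField()], FakeConnection()))               # -> True
print(unnest_eligible([IntegerField(), RangeField()], FakeConnection())) # -> False
```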
I bumped into this regression when upgrading to 5.2: one of my fields uses such a placeholder to call a Postgres extension function, but its db_type is present in self.connection.data_types, so the call no longer happened for bulk inserts.
NB This bug can go unnoticed: bulk_create might succeed, but with different data written to disk than expected, and subsequent reads and seeks will return wrong results.
Change History (5)
comment:1 by , 2 hours ago
| Has patch: | set |
|---|---|
comment:2 by , 40 minutes ago
| Description: | modified (diff) |
|---|---|
follow-up: 5 comment:3 by , 39 minutes ago
| Description: | modified (diff) |
|---|---|
| Severity: | Normal → Release blocker |
| Triage Stage: | Unreviewed → Accepted |
comment:4 by , 38 minutes ago
| Description: | modified (diff) |
|---|---|
comment:5 by , 33 minutes ago
Replying to Simon Charette:
Thanks for the excellent report and patch Chris!
The only part missing is a release note for 5.2.9 as this is a bug in a newly introduced feature and thus still qualifies for backport.
Thanks for the quick response.
Should I just write one for 5.2.9.txt or 6.0.txt too?
I also think it's worth keeping both as well as they are cheap to perform and could cover other cases in the wild.
Small note to self: this relates to #36727, but the latter won't fix this issue since, just like you said, get_placeholder can be a function of value. It only means we'll need even more logic to handle the presence of either get_placeholder or get_placeholder_sql.