#34955 closed Cleanup/optimization (fixed)
Make Concat() use the database operator `||` on PostgreSQL.
Reported by: | Paolo Melchiorre | Owned by: | Simon Charette |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | field, database, generated, output_field |
Cc: | Simon Charette, David Wobrock | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The django.db.models.functions.Concat
database ORM function in the PostgreSQL database backend use the CONCAT
SQL function.
Unfortunately, the CONCAT
function in PostgreSQL is not IMMUTABLE
and cannot be used in functional indexes or generated fields.
For example, if you use Concat("first_name", Value(" "), "last_name")
in a GeneratedField
you get the following error:
psycopg.errors.InvalidObjectDefinition: generation expression is not immutable
I propose to make the string concatenation operator ||
available in the PostgreSQL database backend in addition to the already available string concatenation function CONCAT
.
Change History (18)
follow-up: 2 comment:1 by , 13 months ago
comment:2 by , 13 months ago
Type: | New feature → Bug |
---|
Replying to Mariusz Felisiak:
I'm not convinced, there are other built-in functions that could be implemented differently and currently cannot be used in
GeneratedField
s
The string concatenation operator ||
is not an exotic function specific to PostgreSQL, but an operator already used in other database backends, so your concern about the possible increase in requests to add new database functions to databases does not apply.
I would add that I would have liked to open this issue since the introduction of functional indexes in Django 3.2, but I never did so due to lack of time.
I decided to open this issue now after seeing that the choice to use the non-immutable concatenation function CONCAT
instead of the immutable concatenation operator ||
in the PostgreSQL backend also affects the case of GeneratedFields
.
I'm pessimistic about adding multiple new functions for such cases. This may open a can of worms. The current thread is to keep Django a core framework, not providing every utility which might be useful.
Among other things, the string concatenation example is the most used to explain Generated Fields (e.g fly.io, djangochat.com, paulox.net, ...), but, due to this bug in the PostgreSQL backend database, it only works on a subset of Django backend databases, unlike the current thread of keeping Django a core framework, as you reiterated.
follow-up: 6 comment:3 by , 13 months ago
Type: | Bug → New feature |
---|
... databases does not apply.
I don't understand why TBH. Do you want to say that it doesn't apply because it's PostgreSQL and we should treat it specially? I would argue with that.
This is definitively not a bug. Concat()
is implemented this way from the very beginning and database limitations for GeneratedField
are documented.
follow-up: 5 comment:4 by , 13 months ago
Moreover, there is no easy way to implement Concat()
on PostgreSQL such that it uses ||
, is immutable, and work the same way as now, because it uses COALESCE()
which is also considered mutable by PostgreSQL. I understand that you want to use new goodies on PostgreSQL but it's a database limitation, not a Django fault.
comment:5 by , 13 months ago
Replying to Mariusz Felisiak:
... I understand that you want to use new goodies on PostgreSQL ...
This isn't about me wanting to play with the latest Django features with PostgreSQL, I solved the problem in my code some time ago by writing a function that uses ||
instead of CONCAT
.
The problem, however, remains non-expert users to whom we reiterate that the generated fields are compatible with all supported databases, and then the first example we do, uses strings concatenations, that we already know will not work with at least one of these databases out-of-the-box.
Moreover, there is no easy way to implement
Concat()
on PostgreSQL such that it uses||
, is immutable, and work the same way as now, because it usesCOALESCE()
which is also considered mutable by PostgreSQL ... it's a database limitation, not a Django fault.
On the contrary, in PostgreSQL, you can concatenate strings into a generated column without any problem.
See for example:
CREATE TABLE "samples_fullnames" ( "id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, "first_name" varchar(150) NOT NULL, "last_name" varchar(150) NOT NULL, "full_name" text GENERATED ALWAYS AS ( "first_name"::text || ' '::text || "last_name"::text ) STORED );
But also: (mimicking the SQL code generated by the ORM for SQLite)
CREATE TABLE "samples_names" ( "id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, "first_name" varchar(150) NOT NULL, "last_name" varchar(150) NOT NULL, "full_name" text GENERATED ALWAYS AS ( COALESCE(("first_name")::text, '') || COALESCE(COALESCE((' ')::text, '') || COALESCE(("last_name")::text, ''), '') ) STORED );
If there is a limitation I would say it lies in how the Django ORM generates the SQL code for PostgreSQL.
CREATE TABLE "samples_user" ( "id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, "first_name" varchar(150) NOT NULL, "last_name" varchar(150) NOT NULL, "full_name" text GENERATED ALWAYS AS ( CONCAT(("first_name")::text, (CONCAT((' ')::text, ("last_name")::text))::text) ) STORED );
comment:6 by , 13 months ago
Replying to Mariusz Felisiak:
... databases does not apply.
I don't understand why TBH.
You quoted my sentence so much that I had to go and reread it :-D
Do you want to say that it doesn't apply because it's PostgreSQL and we should treat it specially? I would argue with that.
My intention was not to ask for special treatment for PostgreSQL, but on the contrary to ask that it not be the only one treated differently in this particular case.
I tried to explain that the ||
operator is not a function, but an operator, and that among other things it has already been used in other supported backend databases, so requesting that it ALSO be used in PostgreSQL doesn't seem right to me. a special request.
This is definitively not a bug.
Concat()
is implemented this way from the very beginning and database limitations forGeneratedField
are documented.
I don't know the reasons for using CONCAT
instead of ||
, but I reiterate that this choice was already a problem for functional indexes, and now generated fields have been added. It therefore seems to me that that choice can at least be reevaluated.
Note: I realize that explaining myself in writing in English could be a limitation of mine, and in case what I write is misunderstood or arrogant, I reiterate here that this is not my intention. :-)
comment:7 by , 13 months ago
Throwing some fuel on the fire with a few fyi points
- Firstly just want to make it clear that pg marking concat() not immutable is a feature (by design) not a bug :D (it's "stable" but not "immutable" because settings can still affect the output)
- Apparently
||
is ANSI (TIL) - Even though it is ANSI you still have to set mode
PIPES_AS_CONCAT
[1] on MySQL Concat()
doesn't actually useCoalesce()
for pg because it doesn't need it, NULLs are ignored withCONCAT()
on pg [2]- Unfortunately we can't change
Concat()
to use||
because on pg it _does_ evaluate NULLs. If it were to use||
then we _would_ need to make use ofCoalesce()
to make it consistent.
I'm kinda hoping though that we do get a (separate) immutable concat for pg because concatenating strings seems to be a common desire, at least for me :D
Lastly can I just say that (somewhat jokingly but semi-serious)... we may be missing the perfect opportunity to make use of the __floordiv__()
operator method [3] ;D
GeneratedField(expression=F('first_name') // F('last_name'))
[1] https://dev.mysql.com/doc/refman/8.0/en/sql-mode.html#sqlmode_pipes_as_concat
[2] https://www.postgresql.org/docs/current/functions-string.html
[3] https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types
Edit: I mean NOT immutable.
follow-up: 9 comment:8 by , 13 months ago
If casting to text data-type fixes the issue then I'd consider documenting this workaround instead of adding a PostgreSQL-specific function (where text casting would still be required). For example, adding the following sentence:
"In order to make an IMMUTABLE
expression MUTABLE
on PostgreSQL, you can wrap the expression with Cast()
, e.g. Cast(Concat(F("first_name"), Value(" "), F("last_name")), TextField())
."
in the warning about IMMUTABLE
functions on PostgreSQL.
comment:9 by , 13 months ago
Replying to Mariusz Felisiak:
If casting to text data-type fixes the issue then I'd consider documenting this workaround instead of adding a PostgreSQL-specific function (where text casting would still be required). For example, adding the following sentence:
"In order to make an
IMMUTABLE
expressionMUTABLE
on PostgreSQL, you can wrap the expression withCast()
, e.g.Cast(Concat(F("first_name"), Value(" "), F("last_name")), TextField())
."
in the warning about
IMMUTABLE
functions on PostgreSQL.
Unfortunately the SQL code generate by the Django ORM is still broken in PostgreSQL.
I wrote the class as susggested by Mariusz:
class Person(models.Model): first_name = models.CharField() last_name = models.CharField() full_name = models.GeneratedField( expression=Cast( Concat("first_name", V(" "), "last_name"), models.TextField(), ), db_persist=True, output_field=models.TextField(), )
But the migrations generate this SQL code:
CREATE TABLE "samples_person" ( "id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, "first_name" varchar NOT NULL, "last_name" varchar NOT NULL, "full_name" text GENERATED ALWAYS AS ( ( CONCAT( ("first_name")::text, (CONCAT((' ')::text, ("last_name")::text))::text ) )::text ) STORED );
that generated this error message psycopg.errors.InvalidObjectDefinition: generation expression is not immutable
comment:10 by , 13 months ago
I need to clarify an incorrect assumption I was making in my earlier comment.
COALESCE() is acceptable for use in generated fields & function indexes. This is because it's not a "real" function, it's an expression.
This means that Paolo's proposal to make Concat()
on Postgres more like it does on SQLite an acceptable change keeping behaviour consistent.
follow-up: 12 comment:11 by , 13 months ago
Cc: | added |
---|
Unfortunately we can't change Concat() to use
||
because on pg it _does_ evaluate NULLs. If it were to use||
then we _would_ need to make use of Coalesce() to make it consistent.
That's the crux of the issue and why we had to go back and forth on how it was implemented on Postgres when adding support for functional index and constraints. There are few tickets related to this such one such as #29582 and #30385.
the migrations generate this SQL code (code snipped with many unnecessary
::text
)
This is a side effect of 09ffc5c1212d4ced58b708cbbf3dfbfb77b782ca (#33308) to accommodate support for psycopg>3
server-side bindings. See 779cd28acb1f7eb06f629c0ea4ded99b5ebb670a (#34840) which removed many of them but some still remain.
Based on #30385 I think that the best way forward here is to stop using CONCAT
entirely and use a strategy where it relies on ||
with Coalesce
and Cast
appropriately when dealing with expressions that are nullable and/or non-text. The challenge here is that we can't trust .null
as the our output field resolving strategy doesn't carry null
affinity. If we want to make sure Concat
maintains is previous behaviour on Postgres we must wrap every source expression in Coalesce
.
Normally when we change the SQL generated by an expression it leaves all index generated with the previous implementation unusable and forces users to re-create them but in this case it wasn't possible to even create such index as Concat
is not IMMUTABLE
so I don't think that's an issue.
comment:12 by , 13 months ago
Replying to Simon Charette:
Unfortunately we can't change Concat() to use
||
because on pg it _does_ evaluate NULLs. If it were to use||
then we _would_ need to make use of Coalesce() to make it consistent.
That's the crux of the issue and why we had to go back and forth on how it was implemented on Postgres when adding support for functional index and constraints. There are few tickets related to this such one such as #29582 and #30385.
Thanks, for pointing us to these old issues.
the migrations generate this SQL code (code snipped with many unnecessary
::text
)
This is a side effect of 09ffc5c1212d4ced58b708cbbf3dfbfb77b782ca (#33308) to accommodate support for
psycopg>3
server-side bindings. See 779cd28acb1f7eb06f629c0ea4ded99b5ebb670a (#34840) which removed many of them but some still remain.
Are you suggesting opening an issue to remove all the remaining unnecessary CAST
?
Based on #30385 I think that the best way forward here is to stop using
CONCAT
entirely and use a strategy where it relies on||
withCoalesce
andCast
appropriately when dealing with expressions that are nullable and/or non-text. The challenge here is that we can't trust.null
as the our output field resolving strategy doesn't carrynull
affinity. If we want to make sureConcat
maintains is previous behaviour on Postgres we must wrap every source expression inCoalesce
.
I understand that GeneratedField
is only the last one affected by the fact that CONCAT
is not IMMUTABLE
I agree with the plan to replace CONCAT
everywhere with ||
given how many problems it would solve.
Normally when we change the SQL generated by an expression it leaves all index generated with the previous implementation unusable and forces users to re-create them but in this case it wasn't possible to even create such index as
Concat
is notIMMUTABLE
so I don't think that's an issue.
Better this way I would say, there will be no indexes that the user needs to re-create.
comment:13 by , 13 months ago
Summary: | Make available the string concatenation operator `||` for PostgreSQL → Make Concat use the database operator `||` on PostgreSQL. |
---|---|
Triage Stage: | Unreviewed → Accepted |
Type: | New feature → Cleanup/optimization |
comment:14 by , 13 months ago
Summary: | Make Concat use the database operator `||` on PostgreSQL. → Make Concat() use the database operator `||` on PostgreSQL. |
---|
comment:15 by , 13 months ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:16 by , 13 months ago
Has patch: | set |
---|
comment:17 by , 13 months ago
Cc: | added |
---|
I'm not convinced, there are other built-in functions that could be implemented differently and currently cannot be used in
GeneratedField
s, e.g.SHA512
,SHA384
,SHA224
,SHA1
, orMD5
on Oracle. I'm pessimistic about adding multiple new functions for such cases. This may open a can of worms. The current thread is to keep Django a core framework, not providing every utility which might be useful.