Opened 2 months ago
Closed 2 weeks ago
#35778 closed Cleanup/optimization (fixed)
Use native JSONObject on Postgres 16+ with server side bindings
Reported by: | john-parton | Owned by: | john-parton |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Simon Charette, Sarah Boyce, Mariusz Felisiak, Sage Abdullah | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
JSONObject on Postgres 16 with server side bindings recently resulted in a crash. The most recent fix is to fallback to the use of jsonb_build_object on postgres 16 when using server side bindings.
See https://code.djangoproject.com/ticket/35734
And https://github.com/django/django/pull/18549
It is possible to use the native JSONObject with server side bindings, but it requires a little bit of use of cast
.
See https://github.com/django/django/commit/0f53d48115ba0295cefea33512dc146caad39443
There are two minor issues:
- Should Postgres 16 *without* server-side bindings use "cast" even though it's not strictly necessary? It it desirable or preferable to keep the generated SQL the same when toggling the server-side binding feature? I mentioned digging through logs as one example where it might matter.
- Use of both cast and native json will require at least a minor change to escaping. This is because we use the double-colon operator to cast and the native json syntax uses a single colon to separate key-value pairs. This creates a parsing ambiguity which results in a syntax error (on at least one version of postgres). For solutions, they're all pretty similar
Options for minor issue 2:
- Update the
as_native
function to wrap the keys in parenthesis, effectively resolving the ambiguity. (This does raise yet another question, a question within a question: should we go ahead and wrap the keys in parenthesis on ALL backends? I think Oracle doesn't necessary require that for example.) - Update the Cast function to always wrap values in parenthesis in all contexts. This seems like overkill.
- Change postgres from using the double-colon operator to the CAST(x AS type) syntax. This also seems like overkill, and results in sql being generated that is less postgres-y, if that makes sense.
Change History (20)
comment:1 by , 2 months ago
Description: | modified (diff) |
---|
comment:2 by , 2 months ago
comment:3 by , 2 months ago
In order to help facilitate discussion, I went ahead and opened a pull request with my preferred changes: https://github.com/django/django/pull/18622
When there's consensus on the the two issues:
- Whether to cast keys on postgres 16+ without server side bindings; and
- The best way to ensure that keys are escaped to avoid the colon/double-colon parsing error
I'll update the PR and we can proceed.
comment:4 by , 2 months ago
Has patch: | set |
---|
comment:5 by , 2 months ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:6 by , 8 weeks ago
Cc: | added |
---|---|
Triage Stage: | Unreviewed → Accepted |
Accepting following the referenced PR conversation. I'm unclear about the best solution given the presented alternatives. Sarah, Simon, Mariusz: would you have a recommendation?
comment:7 by , 7 weeks ago
I'm of opinion that we should try to generate the same SQL with and without server-side bindings enabled. That should put us in a better place for eventual support for prepared statements (#20516).
As for the parenthesis wrapping I think that another option that wasn't discussed is to use a Func(expression, template="CAST(%(expressions)s AS text)")
instead of Cast
in this particular case to avoid the parsing ambiguity in as_postgres
while leaving Cast.as_postgres
alone.
comment:8 by , 7 weeks ago
Cc: | added |
---|
comment:9 by , 7 weeks ago
Good info with prepared statements. That meshes with my intuition.
Regarding Func(expression, template="CAST(%(expressions)s AS text)")
, that has a bit of code smell to me, abstraction inversion.
I sort of question the wisdom of using the ::
operation on postgres at all. The comment https://github.com/django/django/blob/6765b6adf924c1bc8792a4a454d5a788c1abc98e/django/db/models/functions/comparison.py#L52-L61 suggests it makes the SQL "more readable", which is true in the context that you're more familiar with postgres than any RDMS. I'd argue that the Cast(...) function is possibly "more readable" than the double colon in the case that the user is more familiar with other databases or just SQL in general, and *relevant to this issue* it is more readable in the case that there are colons used elsewhere in the expression, i.e. JSON_OBJECT. (It's so unreadable with the double colons that even the parser can't "read" it, haha)
I think an argument can be made to just delete those 10 lines and have the output SQL closer to standards. It seems like part of the underlying philosophy of moving postgres from JSONB_BUILD_OBJECT to JSON_OBJECT is to settle on standards, so in that sense, reworking Cast() is in line with the overall spirit of the project.
Additional Note: If you can delete ten lines of code from the codebase and all the tests still pass, that indicates to me the possibility of some kind of defect or design problem, although not necessarily an error.
comment:10 by , 7 weeks ago
Pros in favor or removing ::
in favor of CAST
- Less code
- Generated SQL on Postgres will match other RDMS output in many cases
- It's more explicit
- "Standardization"
Cons
- It's not _strictly_ required to solve this specific issue
- Users more familiar with Postgres likely would find the double-colon notation more readable in most circumstances
- Perhaps there's a goal to make the generated SQL look as if it were written by an experienced programmer writing in that specific dialect. "Idiomatic" postgres?
comment:11 by , 7 weeks ago
Another con of changing ::
at this point is that will change the SQL generated in any usage Cast
in expression indices which might cause them not to be usable anymore when users upgrade their version of Django.
For example, say that you have a model of the form
class People(models.Model): phone_number = models.IntegerField() class Meta: indexes = [ Index(name="phone_number_str", Cast("phone_number", models.TextField())) ]
Then the resulting index will be on (phone_number)::text
and queries of the form filter(phone_number__startswith="123")
will generate WHERE (phone_number)::text LIKE '123%'
will be able to use the index.
If we change Cast
to use CAST
instead of ::
I'm not entirely confident that the Postgres planer will be smart enough to use the pre-existing (phone_number)::text
index for queries of the form CAST(phone_number AS text) LIKE '123%'
.
From past experience the expression has to exactly match for the planner to consider the index as potential candidate so unless (expr)::type
and CAST(expr AS type)
result in the same AST I doubt we can safely change it without forcing users to rebuild all of their indices making use of Cast
which seems way beyond the scope of this ticket.
comment:12 by , 7 weeks ago
Darn. That's a pretty massive con, I would agree. Let's say that's a non-option then. Thank you for your insight.
So with respect to my open PR, it sounds like we should use the double-colon as well, in the (probably unlikely case) that the field in question has an index on it. Based on what you said, if we use "CAST(%(expression)s)" it might result in an index a user defined with the expectation it be used, going unused due to the text not being exactly the same in the query. Make sense?
Quick edit: As I typed this out, I realized I'm not sure when Postgres would even _use_ an index in the context of JSONObject. That seems extremely unlikely actually.
comment:13 by , 7 weeks ago
In the (probably unlikely) case that a user has created an index using JSONObject
, isn't it also possible that migrating from JSONB_BUILD_OBJECT
to JSON_OBJECT
might break those indexes in much the same way as your Cast example?
class People(models.Model): phone_number = models.IntegerField() class Meta: indexes = [ Index(name="phone_number_obj", JSONObject(phone_number="phone_number")) ]
If the migration was run a while ago, the index would use the JSONB_BUILD_OBJECT function, but now queries will use the JSON_OBJECT function.
Is it worth noting SOMEWHERE in a release note or something? It seems really unlikely to me that someone would index their data this way.
comment:14 by , 7 weeks ago
Changing to JSONB_BUILD_OBJECT
would also be affected but as you've pointed out it's unlikely and was less unlikely than usage of Cast
. I think it might be worth a release notes yes, IIRC we did it when we changed how Concat
behaved.
comment:15 by , 7 weeks ago
I was going to take a crack at adding a release note, modeled after the release note you referenced for Concat, but I'm not sure I've found the correct release note. Which one did you have in mind?
comment:16 by , 5 weeks ago
I believe this is pretty much ready to go. If we need a release note, I would appreciate a little help.
comment:17 by , 5 weeks ago
Thank you John, this is indeed in the review queue and we'll get to this as soon as we can.
comment:18 by , 3 weeks ago
Type: | New feature → Cleanup/optimization |
---|
comment:19 by , 3 weeks ago
Triage Stage: | Accepted → Ready for checkin |
---|
My general opinion/gut feeling is:
But I'm happy to be swayed.