Opened 4 months ago

Last modified 4 months ago

#29482 new Cleanup/optimization

simplify KeyTransform to always use the 'nested_operator'

Reported by: David De Sousa Owned by: nobody
Component: contrib.postgres Version: 2.0
Severity: Normal Keywords:
Cc: James Howe Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Currently, the KeyTransform and KeyTextTransform classes in the JSONB Field lookups are making a distinction in the way the query is being generated based on the depth of the key inside the JSON, using a different operator depending on if the key is in the root of the json or not.

This is an issue specially if you create indexes based on jsonb traversals, for some reason if you create an index with the path operators (#> or #>>) and then query with the "regular" operators (-> or ->>) the indexes are not always used.

I've tested removing the use of the operator and basically just leaving the return in https://github.com/django/django/blob/2.0.6/django/contrib/postgres/fields/jsonb.py#L105 as the only return in the function and it works for me, I'd be happy to create a PR but I don't know if that distinction is there for a reason.

Change History (8)

comment:1 Changed 4 months ago by Simon Charette

From my understanding the -> and ->> operators are used instead of their # variant when the key lookup depth is 1 for readability purposes.

Is there a reason you can't create your non-nested attribute lookups using the -> and ->> operators instead? Are you suggesting we favor the # syntax because a functional index on data#>'{a,b}' would be used when doing a data#>'{a}' lookup?

I'm asking because changing it at this point is likely to break any previously created functional index created using these operators for the same reason you are asking them to be changed here. I assume this is something that would eventually be fixed in PostgreSQL.

comment:2 Changed 4 months ago by Matthew Schinckel

Does it need something to do with array elements vs object keys? Or does the postgres nested operator already handle that?

If you had a top-level (and non-nested) JSON array in a field, you'd probably want to stick to a simpler index/operator, I think.

I haven't looked into this, it's just thoughts.

comment:3 in reply to:  1 Changed 4 months ago by David De Sousa

Replying to Simon Charette:

From my understanding the -> and ->> operators are used instead of their # variant when the key lookup depth is 1 for readability purposes.

Is there a reason you can't create your non-nested attribute lookups using the -> and ->> operators instead? Are you suggesting we favor the # syntax because a functional index on data#>'{a,b}' would be used when doing a data#>'{a}' lookup?

No, that won't happen AFAIK

I'm asking because changing it at this point is likely to break any previously created functional index created using these operators for the same reason you are asking them to be changed here. I assume this is something that would eventually be fixed in PostgreSQL.

It would be awesome if PostgreSQL dealt with this, but I asked in the #postgres IRC and was told there's no way for the query planner to do this, although I got confirmation the # and - operators are equivalent.

No, the issue is that django uses both operators based on if it's a root key or a nested one, I agree a change here would potentially break previously created indexes using the -> or --> operators, so developers should take this into consideration when building indexes, it should at least be documented, I found out the hard way.

comment:4 in reply to:  2 Changed 4 months ago by David De Sousa

Replying to Matthew Schinckel:

Does it need something to do with array elements vs object keys? Or does the postgres nested operator already handle that?

#>> is a path operator, so it will work also with array indexes, #>> '{2}' would give you the 3rd element of the root array.

If you had a top-level (and non-nested) JSON array in a field, you'd probably want to stick to a simpler index/operator, I think.

As I said before I could agree with this, although I'd vouch for simplicity in the code, using only the path operator makes the code a lot simpler IMHO

comment:5 Changed 4 months ago by James Howe

The suggested change doesn't solve the problem, it just creates a new one.
The problem is that an index will only be used if the JSON operators are the same in both the index definition and the query.
The four path traversal operators (->, ->>, #>, #>>) are all incompatible.

Standardizing on a single operator for all queries would make things easier to deal with, but people may still have indices they are unable to use.
Allowing the operator to be controlled may be a quick workaround.
Ideally, the Index classes should support JSON paths (and partial indexes), so the definition can be standardized to the same operators.

Last edited 4 months ago by James Howe (previous) (diff)

comment:6 Changed 4 months ago by James Howe

Cc: James Howe added

comment:7 Changed 4 months ago by David De Sousa

I agree, the ideal solution would be to support JSON paths and partial indexes in the Index class, just be aware that -> and ->> return different types, so instead of using the four operators, django could use two, either -> and ->> or #> and #>>, depending on the user's desire.

Although if support is added to the Index class, this discussion is irrelevant since the index can be created with the same logic the KeyTransform classes use, using the -> and ->> operators when dealing with root-level keys and the #> and #>> operators when traversing, this is what I'm doing in a management command that handles the index creation for my use case after I found out about this behavior.

comment:8 Changed 4 months ago by Tim Graham

Component: Uncategorizedcontrib.postgres
Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization

I'm not sure what the way forward is, but it sounds like Django can do something, whether it's a behavior change or some documentation.

Note: See TracTickets for help on using tickets.
Back to Top