Opened 6 years ago

Closed 2 years ago

#29482 closed Cleanup/optimization (wontfix)

simplify KeyTransform to always use the 'nested_operator'

Reported by: David De Sousa Owned by: nobody
Component: contrib.postgres Version: 2.0
Severity: Normal Keywords:
Cc: James Howe Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Currently, the KeyTransform and KeyTextTransform classes in the JSONB Field lookups are making a distinction in the way the query is being generated based on the depth of the key inside the JSON, using a different operator depending on if the key is in the root of the json or not.

This is an issue specially if you create indexes based on jsonb traversals, for some reason if you create an index with the path operators (#> or #>>) and then query with the "regular" operators (-> or ->>) the indexes are not always used.

I've tested removing the use of the operator and basically just leaving the return in https://github.com/django/django/blob/2.0.6/django/contrib/postgres/fields/jsonb.py#L105 as the only return in the function and it works for me, I'd be happy to create a PR but I don't know if that distinction is there for a reason.

Change History (9)

comment:1 by Simon Charette, 6 years ago

From my understanding the -> and ->> operators are used instead of their # variant when the key lookup depth is 1 for readability purposes.

Is there a reason you can't create your non-nested attribute lookups using the -> and ->> operators instead? Are you suggesting we favor the # syntax because a functional index on data#>'{a,b}' would be used when doing a data#>'{a}' lookup?

I'm asking because changing it at this point is likely to break any previously created functional index created using these operators for the same reason you are asking them to be changed here. I assume this is something that would eventually be fixed in PostgreSQL.

comment:2 by Matthew Schinckel, 6 years ago

Does it need something to do with array elements vs object keys? Or does the postgres nested operator already handle that?

If you had a top-level (and non-nested) JSON array in a field, you'd probably want to stick to a simpler index/operator, I think.

I haven't looked into this, it's just thoughts.

in reply to:  1 comment:3 by David De Sousa, 6 years ago

Replying to Simon Charette:

From my understanding the -> and ->> operators are used instead of their # variant when the key lookup depth is 1 for readability purposes.

Is there a reason you can't create your non-nested attribute lookups using the -> and ->> operators instead? Are you suggesting we favor the # syntax because a functional index on data#>'{a,b}' would be used when doing a data#>'{a}' lookup?

No, that won't happen AFAIK

I'm asking because changing it at this point is likely to break any previously created functional index created using these operators for the same reason you are asking them to be changed here. I assume this is something that would eventually be fixed in PostgreSQL.

It would be awesome if PostgreSQL dealt with this, but I asked in the #postgres IRC and was told there's no way for the query planner to do this, although I got confirmation the # and - operators are equivalent.

No, the issue is that django uses both operators based on if it's a root key or a nested one, I agree a change here would potentially break previously created indexes using the -> or --> operators, so developers should take this into consideration when building indexes, it should at least be documented, I found out the hard way.

in reply to:  2 comment:4 by David De Sousa, 6 years ago

Replying to Matthew Schinckel:

Does it need something to do with array elements vs object keys? Or does the postgres nested operator already handle that?

#>> is a path operator, so it will work also with array indexes, #>> '{2}' would give you the 3rd element of the root array.

If you had a top-level (and non-nested) JSON array in a field, you'd probably want to stick to a simpler index/operator, I think.

As I said before I could agree with this, although I'd vouch for simplicity in the code, using only the path operator makes the code a lot simpler IMHO

comment:5 by James Howe, 6 years ago

The suggested change doesn't solve the problem, it just creates a new one.
The problem is that an index will only be used if the JSON operators are the same in both the index definition and the query.
The four path traversal operators (->, ->>, #>, #>>) are all incompatible.

Standardizing on a single operator for all queries would make things easier to deal with, but people may still have indices they are unable to use.
Allowing the operator to be controlled may be a quick workaround.
Ideally, the Index classes should support JSON paths (and partial indexes), so the definition can be standardized to the same operators.

Last edited 6 years ago by James Howe (previous) (diff)

comment:6 by James Howe, 6 years ago

Cc: James Howe added

comment:7 by David De Sousa, 6 years ago

I agree, the ideal solution would be to support JSON paths and partial indexes in the Index class, just be aware that -> and ->> return different types, so instead of using the four operators, django could use two, either -> and ->> or #> and #>>, depending on the user's desire.

Although if support is added to the Index class, this discussion is irrelevant since the index can be created with the same logic the KeyTransform classes use, using the -> and ->> operators when dealing with root-level keys and the #> and #>> operators when traversing, this is what I'm doing in a management command that handles the index creation for my use case after I found out about this behavior.

comment:8 by Tim Graham, 6 years ago

Component: Uncategorizedcontrib.postgres
Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization

I'm not sure what the way forward is, but it sounds like Django can do something, whether it's a behavior change or some documentation.

in reply to:  7 comment:9 by Mariusz Felisiak, 2 years ago

Resolution: wontfix
Status: newclosed
Triage Stage: AcceptedUnreviewed

Replying to David De Sousa:

Although if support is added to the Index class, this discussion is irrelevant since the index can be created with the same logic the KeyTransform classes use, using the -> and ->> operators when dealing with root-level keys and the #> and #>> operators when traversing, this is what I'm doing in a management command that handles the index creation for my use case after I found out about this behavior.

Index() supports creating functional and partial indexes based on JSONField key transforms (see tests), so as far as I'm aware this can be closed.

Closing as "wontfix", because we will not change the current operators.

Note: See TracTickets for help on using tickets.
Back to Top