Opened 3 months ago

Last modified 6 days ago

#36494 assigned Cleanup/optimization

Various failures in JSONField lookups when using expressions in right-hand side

Reported by: Jacob Walls Owned by: Jacob Walls
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Simon Charette, Sage Abdullah, Mariusz Felisiak Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Jacob Walls)

After #35972, I worked on a test to verify support for JSONField lookups against right-hand side subqueries. Support is mostly complete on Postgres, but largely incomplete on other backends.

I'm suggesting we close this ticket (in separate PRs) with a combination of adding support where possible (perhaps for __range?), with failing at the python layer with more appropriate exceptions, or possibly adding a feature flag to guard some failures. I agree with the sentiment that we cannot maintain endless feature flags for database quirks or insulate users from every bug in their database vendor, but some of these failures seem to me to be on the Django side.

Postgres:

  1. __range (source):
    django.db.utils.ProgrammingError: the query has 2 placeholders but 3 parameters were passed
    
  2. __has_keys or __has_any_keys:
      File "/django/source/django/db/models/fields/json.py", line 278, in get_prep_lookup
        return [str(item) for item in self.rhs]
                                      ^^^^^^^^
    TypeError: 'Subquery' object is not iterable
    

SQLite & MySQL:

  1. __range: same as Postgres
  2. __has_keys or __has_any_keys: same as Postgres
  3. __has_key:
      File "/django/source/django/db/models/fields/json.py", line 186, in compile_json_path_final_key
        return ".%s" % json.dumps(key_transform)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^
    TypeError: Object of type Subquery is not JSON serializable
    
  4. __gt, __gte, __lt, __lte:
      File "/django/source/django/db/models/fields/json.py", line 610, in process_rhs
        rhs_params = [json.loads(value) for value in rhs_params]
                      ^^^^^^^^^^^^^^^^^
    json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
    
  5. __exact:
    sqlite3.OperationalError: malformed JSON
    

Additional failure on MySQL:

  1. __in:
    django.db.utils.NotSupportedError: (1235, "This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'")
    

Oracle:

  1. __range: same as others (database exception)
  2. __has_keys or __has_any_keys: same as others
  3. __has_key: same as SQLite & MySQL
  4. Certain text lookups (__icontains, __startswith, __istartswith, __endswith, __iendswith): return wrong results, they appear to return any results having the key on the left-hand side regardless of the contents of the right-hand side, refer to the modifications commented out in the linked test.

This is all pretty niche, but if the context helps, I could envision realistic use cases for __has_keys=queryset, as I work on a project that uses stringified UUIDs identifying other entities as JSON keys.

Change History (4)

comment:1 by Natalia Bidart, 3 months ago

Cc: Simon Charette Sage Abdullah Mariusz Felisiak added
Resolution: wontfix
Status: newclosed

Hello Jacob, thank you for taking the time to create this report and test all DB backends.
I can't find evidence in our docs nor in the code base that this is a supported feature, to me this qualifies as a new feature request, and while I agree it feels "natural", I fear this should go thru the new feature request process.

I'm adding others as cc since I'm happy to be overruled on this one to be Accepted instead wontfix.

comment:2 by Jacob Walls, 6 days ago

Description: modified (diff)
Resolution: wontfix
Status: closednew
Summary: Various failures for JSONField lookups and right-hand side subqueriesVarious failures in JSONField lookups when using expressions in right-hand side

Simon suggested bracketing Subquery for the moment and just focusing on expression support generally.

With just F()...

    def test_lookups_using_expression(self):
        contains_lookups = {"contains", "contained_by"}
        lookups_not_matching = {"gt", "lt", "has_key"}
        text_lookups = {
            "has_key",
            "iexact",
            "icontains",
            "startswith",
            "istartswith",
            "endswith",
            "iendswith",
            "regex",
            "iregex",
        }
        for lookup in JSONField.get_lookups():
            with self.subTest(lookup=lookup):
                if lookup == "isnull":
                    continue  # not allowed, rhs must be a literal boolean.
                if (
                    lookup in contains_lookups
                    and not connection.features.supports_json_field_contains
                ):
                    continue
                if lookup in text_lookups:
                    rhs = KT("value__bar")
                else:
                    rhs = F("value__bar")
                qs = NullableJSONModel.objects.filter(**{f"value__bar__{lookup}": rhs})
                if lookup in lookups_not_matching:
                    self.assertIs(qs.exists(), False)
                else:
                    # might need tweaking
                    self.assertQuerySetEqual(qs, NullableJSONModel.objects.filter(value__has_key="bar"))

... we see three classes of failures:

  1. gt/gte/lt/lte
  File "/Users/jwalls/django/django/db/models/fields/json.py", line 614, in process_rhs
    rhs_params = [json.loads(value) for value in rhs_params]
                  ~~~~~~~~~~^^^^^^^
  File "/Users/jwalls/cpython/Lib/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/Users/jwalls/cpython/Lib/json/decoder.py", line 345, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jwalls/cpython/Lib/json/decoder.py", line 363, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
  1. range
    django.db.utils.OperationalError: near "AND": syntax error
    
  1. has_keys/has_any_keys
      File "/Users/jwalls/django/django/db/models/fields/json.py", line 278, in get_prep_lookup
        return [str(item) for item in self.rhs]
                                      ^^^^^^^^
    TypeError: 'KeyTransform' object is not iterable
    

I'm suggesting we close this ticket (in separate PRs) with a combination of adding support where possible (perhaps for range?), with failing at the python layer with more appropriate exceptions,

I think we should invite PRs per lookup to either raise a better exception (case 2 above is especially poor, sending malformed SQL) or add support if the effort is not massive. (At Django on the Med 2025 I made a small experiment with adding has_any_keys support on PostgreSQL: seemed slightly promising?)

All of the commits would probably be "Refs ...", and then I imagine closing this ticket manually once we decide the remaining edge cases are too difficult to implement.

Going through the new features features process *could* help ensure that "reviewer resources" are equitably competed for in this case, but I can't think of many other reasons to go through that process. I think in the absence of any glaring warning in the docs, my expectation is that all of JSONField's registered lookups work with expressions. I doubt we would add a release note.

comment:3 by Jacob Walls, 6 days ago

Owner: set to Jacob Walls
Status: newassigned

comment:4 by Natalia Bidart, 6 days ago

Triage Stage: UnreviewedAccepted
Type: BugCleanup/optimization

Thank you for the extra details, accepting on the basis to "invite PRs per lookup to either raise a better exception or add support if the effort is not massive".

Note: See TracTickets for help on using tickets.
Back to Top