Opened 5 years ago

Last modified 2 weeks ago

#32213 assigned Bug

QuerySet.values()/.values_list() on KeyTransforms return wrong values for double-quoted strings on SQLite and Oracle

Reported by: Sage Abdullah Owned by: VIZZARD-X
Component: Database layer (models, ORM) Version: 3.1
Severity: Normal Keywords: sqlite oracle json
Cc: Carlton Gibson, bcail Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Suppose we have a JSONField with the value {'key': '"value"'}. If .values()/.values_list() are called on a jsonfield__key lookup, they return 'value' on SQLite and Oracle. On other backends, they return '"value"', which is the correct behavior.

Change History (13)

comment:1 by Mariusz Felisiak, 5 years ago

Cc: Carlton Gibson added
Keywords: sqlite oracle json added
Severity: Release blockerNormal
Summary: QuerySet .values()/.values_list() on KeyTransforms return wrong values for double-quoted strings on SQLite and OracleQuerySet.values()/.values_list() on KeyTransforms return wrong values for double-quoted strings on SQLite and Oracle
Triage Stage: UnreviewedAccepted

Thanks, I'm not sure about severity because it's niche I don't believe that many users (if anyone) are affected by this issue. We can bump it when evaluating a patch.

comment:2 by Mariusz Felisiak, 4 years ago

Owner: Sage Abdullah removed
Status: assignednew

comment:3 by Simon Charette, 3 years ago

I'm also not sure this is event fixable since JSONField is schema less so the ORM has not way to know it should use ->> over -> for the specified key.

I think this ticket is a good candidate for wontfix consideration once KeyTextTransform is documented (#15956) as this issue can be solved by doing values_list(KeyTextTransform("key", "json_field")).

comment:4 by bcail, 3 years ago

Cc: bcail added

I'm working on a test for this case, and having trouble reproducing it.

diff --git a/tests/model_fields/test_jsonfield.py b/tests/model_fields/test_jsonfield.py
index 2c32d8a4ea..ecc2b9d0d5 100644
--- a/tests/model_fields/test_jsonfield.py
+++ b/tests/model_fields/test_jsonfield.py
@@ -611,6 +611,11 @@ class TestQuerying(TestCase):
                     [obj],
                 )
 
+    def test_value(self):
+        obj = NullableJSONModel.objects.create(value={"key": '"value"'})
+        self.assertSequenceEqual(NullableJSONModel.objects.filter(value__key='"value"').values_list("value", flat=True), [{'key': '"value"'}])
+        self.assertSequenceEqual(NullableJSONModel.objects.filter(Q(value__has_key="key")).values_list("value", flat=True), [{'key': '"value"'}])
+
     @skipUnlessDBFeature("supports_json_field_contains")
     def test_contains(self):
         tests = [

This test passes on both SQLite and Postgresql for me. What do I need to change to trigger the wrong behavior?

in reply to:  4 comment:5 by Mariusz Felisiak, 3 years ago

Replying to bcail:

This test passes on both SQLite and Postgresql for me. What do I need to change to trigger the wrong behavior?

This ticket is about using key transforms in the values()/values_list():

# SQLite
>>> NullableJSONModel.objects.filter(value__key='"value"').values_list("value__key", flat=True)
['value']
# PostgreSQL
>>> NullableJSONModel.objects.filter(value__key='"value"').values_list("value__key", flat=True)
['"value"']

comment:6 by bcail, 3 years ago

Thanks! That works for me, to show the issue.

comment:7 by bcail, 3 years ago

Here are some tests that show the issues with various strings in a JSONField for sqlite (they pass on postgresql):

+    def test_str_with_quotes(self):
+        value = {"key": "test_value", "key2": '"value"'}
+        obj = NullableJSONModel.objects.create(value=value)
+
+        self.assertSequenceEqual(NullableJSONModel.objects.filter(value__key='test_value').values("value__key2"), [{"value__key2": '"value"'}])
+        self.assertSequenceEqual(NullableJSONModel.objects.filter(value__key='test_value').values_list("value__key2"), [('"value"',)])
+
+    def test_str_with_null_false_true(self):
+        value = {"key": "test_value", "key2": "null", "key3": "false", "key4": "true"}
+        obj = NullableJSONModel.objects.create(value=value)
+
+        self.assertSequenceEqual(NullableJSONModel.objects.filter(value__key='test_value').values_list("value__key2"), [('null',)])
+        self.assertSequenceEqual(NullableJSONModel.objects.filter(value__key='test_value').values("value__key3"), [{"value__key3": 'false'}])
+        self.assertSequenceEqual(NullableJSONModel.objects.filter(value__key='test_value').values("value__key4"), [{"value__key4": 'true'}])
+
+    def test_str_with_list_dict(self):
+        value = {"key": "test_value", "key2": "[]", "key3": "{}"}
+        obj = NullableJSONModel.objects.create(value=value)
+
+        self.assertSequenceEqual(NullableJSONModel.objects.filter(value__key='test_value').values_list("value__key2"), [("[]",)])
+        self.assertSequenceEqual(NullableJSONModel.objects.filter(value__key='test_value').values("value__key3"), [{"value__key3": "{}"}])

This code change gets the str_with_quotes test passing, but not the other two:

diff --git a/django/db/models/fields/json.py b/django/db/models/fields/json.py
index 7296fe42bc..6277ce9935 100644
--- a/django/db/models/fields/json.py
+++ b/django/db/models/fields/json.py
@@ -82,8 +82,11 @@ class JSONField(CheckFieldDefaultMixin, Field):
             return value
         # Some backends (SQLite at least) extract non-string values in their
         # SQL datatypes.
-        if isinstance(expression, KeyTransform) and not isinstance(value, str):
-            return value
+        if connection.vendor == "sqlite" and isinstance(expression, KeyTransform):
+            if not isinstance(value, str):
+                return value
+            if value not in ['true', 'false', 'null'] and not value.startswith('[') and not value.startswith('{'):
+                return value
         try:
             return json.loads(value, cls=self.decoder)
         except json.JSONDecodeError:

I think that once sqlite 3.38.0 is in widespread use, and ticket #33548 is implemented (using -> and ->> in sqlite), that may take care of this issue.

comment:8 by VIZZARD-X, 5 weeks ago

Has patch: set

comment:9 by VIZZARD-X, 5 weeks ago

I have submitted a Pull Request with the fix and regression tests:
https://github.com/django/django/pull/20487

The fix addresses the issue where SQLite KeyTransform lookups returned unquoted raw strings, causing incorrect json.loads parsing.

Last edited 5 weeks ago by VIZZARD-X (previous) (diff)

comment:10 by JaeHyuckSa, 5 weeks ago

Owner: set to VIZZARD-X
Status: newassigned

comment:11 by Simon Charette, 5 weeks ago

Needs tests: set
Patch needs improvement: set

Left some comments for improvements on the PR mainly

  1. The solution should use KeyTransform.select_format and not override JSONField.from_db_value with KeyTransform specific logic
  2. Existing testing setup (models, test classes) should be used instead of a per-ticket model and test module and test case class

comment:12 by VIZZARD-X, 3 weeks ago

I've updated the PR to use KeyTransform.select_format as suggested, ensuring the fix is isolated to the SQLite backend without modifying JSONField.from_db_value.
Also added regression tests in tests/model_fields/test_jsonfield.py rather than using per-ticket model which was the case previously

Last edited 3 weeks ago by VIZZARD-X (previous) (diff)

comment:13 by VIZZARD-X, 2 weeks ago

Needs tests: unset
Patch needs improvement: unset

I've focused this patch specifically on the SQLite regression. Unchecking Needs tests and Patch needs improvement as the SQLite resolution is looked into. The Oracle aspect mentioned in the ticket likely requires a different implementation and wondering whether it could be addressed separately in a separate ticket?

Note: See TracTickets for help on using tickets.
Back to Top