Opened 17 months ago
Last modified 3 weeks ago
#35381 assigned New feature
Provide `JSONNull` expression to represent JSON `null` value
Reported by: | Olivier Tabone | Owned by: | Clifford Gama |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | David Sanders, Simon Charette, Mariusz Felisiak, David Wobrock, Sage Abdullah, Clifford Gama, Adam Johnson | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
From Simon Charette in comment:3, in reference to the issues between SQL's NULLL
and JSON's null
:
In order to finally solve this null ambiguity problem we should:
- Introduce a
JSONNull
expression to disambiguate between the two. It would also allow the creation of model instances with a JSONnull
which is convoluted today as it requiresmodels.Value(None, models.JSONField())
to be used. - Deprecate
filter(jsonfield=None)
meaning JSONnull
by requiringJSONNull
to be used instead. Should we only do this at the top level to still allowjsonfield__key=None
to filter against null keys? An alternative would be to introduce a__jsonnull
lookup.
The good news is that Django makes it very hard to store JSON null
in the first place since #34754 so this kind of constraints should be rarely needed.
Old description
Regression is still present in 5.0 and main branch
given a model defined as follow
from django.db import models class JSONFieldModel(models.Model): data = models.JSONField(null=True, blank=True, default=None) class Meta: required_db_features = {"supports_json_field"} constraints = [ models.CheckConstraint( check=~models.Q(data=models.Value(None, models.JSONField())), name="json_data_cant_be_json_null", ), ]
the following test works in django 4.1.13, fails in later version
from django.test import TestCase, skipUnlessDBFeature from .models import JSONFieldModel class CheckConstraintTests(TestCase): @skipUnlessDBFeature("supports_json_field") def test_full_clean_on_null_value(self): instance = JSONFieldModel.objects.create(data=None) # data = SQL Null value instance.full_clean()
tests/runtests.py json_null_tests ====================================================================== ERROR: test_full_clean_on_null_value (json_null_tests.tests.CheckConstraintTests.test_full_clean_on_null_value) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/olivier/dev/django.dev/olibook/django/django/test/testcases.py", line 1428, in skip_wrapper return test_func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/olivier/dev/django.dev/olibook/django/tests/json_null_tests/tests.py", line 13, in test_full_clean_on_null_value instance.full_clean() File "/Users/olivier/dev/django.dev/olibook/django/django/db/models/base.py", line 1504, in full_clean raise ValidationError(errors) django.core.exceptions.ValidationError: {'__all__': ['Constraint “json_data_cant_be_json_null” is violated.']} ----------------------------------------------------------------------
Attached a patch that will create run json_null_tests folder in django's tests
directories. Test can be run with
tests/runtests.py json_null_tests
NOTE : in django 5.1 the CheckConstraint.check
parameter as been renamed to CheckConstraint.condition
.
I ran a git bisect on this test case:
8fcb9f1f106cf60d953d88aeaa412cc625c60029 is bad
e14d08cd894e9d91cb5d9f44ba7532c1a223f458 is good
git bisect run tests/runtests.py json_null_tests
5c23d9f0c32f166c81ecb6f3f01d5077a6084318 is identified as first bad commit
the issue appears with both sqlite and postgres backends
A few words on what we I'm trying to achieve (some context on the regression):
The json_data_cant_be_json_null
aims to prevent having JsonModel.data = None
(SQL Null) and JSONModel.data = Value(None, JSONField())
(JSON null) values in the database. These leads to unwanted behaviors when filtering on the null value, and I settled on representing the absence of value with SQL Null and not JSON null.
The app was running in django 4.1 and I'm upgrading it to later django version. That's how the issue appeared.
Attachments (1)
Change History (27)
by , 17 months ago
Attachment: | json_null_tests.patch added |
---|
comment:1 by , 17 months ago
Cc: | added |
---|---|
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Bug |
comment:3 by , 17 months ago
I've looked into it as well and #35167 only provides a partial solution here IMO as it would require the Q.check
logic to also pass around for_save
to activate the code path it introduces.
I'm starting to believe that in order to finally solve this null ambiguity problem we should
- Introduce a
JSONNull
expression to disambigutate between the two. It would also allow the creation of model instances with a JSONnull
which is convoluated today as it requiresmodels.Value(None, models.JSONField())
to be used. - Deprecate
filter(jsonfield=None)
meaning JSONnull
by requiringJSONNull
to be used instead. Should we only do this at the top level to still allowjsonfield__key=None
to filter againstnull
keys? An alternative would be to introduce a__jsonnull
lookup.
The good news is that Django makes it very hard to store JSON null in the first place since #34754 so this kind of constraints should be rarely needed.
comment:4 by , 17 months ago
Thank you Simon. In my view, and in an ideal situation, we shouldn't allow "anywhere" to use None
as the json's null
. I think we should reserve None
only to represent SQL's NULL
, and have a representation for the (string) json's null
(I understand this matches the new JSONNull
expression you are proposing).
Should we only do this at the top level to still allow
jsonfield__key=None
to filter against null keys?
I would say no, because it then gets very confusing.
An alternative would be to introduce a
__jsonnull
lookup.
Would this lookup allow further lookups to be chained?
comment:5 by , 17 months ago
I think we should reserve None only to represent SQL's NULL, and have a representation for the (string) json's null (I understand this matches the new JSONNull expression you are proposing).
Yes it matches the JSONNull
proposal. If we take away the ability to use None
for filtering against JSON null
we should provide an alternative.
The deprecation could look like
- Introduce
JSONNull
- Make
filter(jsonfield=None)
raise a deprecation warning pointing at either usingfilter(jsonfield__isnull=True)
orfilter(jsonfield=JSONNull)
- At the end of the deprecation period switch
filter(jsonfield=None)
to meanfilter(jsonfield__isnull=True)
like on all other fields
It leaves the problem of having JSON null
not surviving a round trip to the database as both SQL NULL
and json.loads("null")
are turned into Python None
but that's a different issue that can be addressed with a specialized decoder
if users require it.
Would this lookup allow further lookups to be chained?
It could or not, what gets returned from a lookup can be prevent further chaining if we want.
follow-up: 7 comment:6 by , 13 months ago
Hello, if you're okay with it, I would like to start working on this ticket and submit a PR to address the issue.
comment:7 by , 13 months ago
Cc: | added |
---|
Replying to Mohammad Salehi:
Hello, if you're okay with it, I would like to start working on this ticket and submit a PR to address the issue.
Of course! Feel free to assign the ticket to you and open a patch.
Bear in mind the suggested approach from the discussions above :)
comment:8 by , 12 months ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:9 by , 5 months ago
Owner: | changed from | to
---|
comment:10 by , 4 months ago
Cc: | added |
---|
In the context of investigating #36418, I have re-reviewed the different tickets that we have related to confusing behaviours of SQL's NULL
vs JSON's string null
, and I think this ticket should be edited in summary and description to match the New Feature described in comment:5 (or alternatively, close this one as invalid and open a new ticket).
comment:11 by , 4 months ago
Cc: | added |
---|
comment:12 by , 3 months ago
Cc: | added |
---|
comment:13 by , 3 months ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:14 by , 3 months ago
Description: | modified (diff) |
---|---|
Summary: | Regression on json null value constraints in django 4.2 → Provide `JSONNull` expression to represent JSON `null` value |
Type: | Bug → New feature |
Version: | 5.0 → dev |
comment:15 by , 2 months ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:16 by , 2 months ago
#36508 was a dupe. We'll want to highlight this deprecation prominently in the release notes, since unlike top-level JSON nulls, JSON nulls in key/value pairs can be expected to appear more frequently.
comment:17 by , 2 months ago
Another edge case of the dual interpretation of None
when storing and querying JSONField (SQL NULL
on persisting and filter(json_field=None)
on filtering) that came up at work today is how it breaks get_or_create
expectations that the filter and creation value will match.
It means that doing
Foo.objects.get_or_create(bar=123, json_field=None)
will perform
SELECT * FROM foo WHERE bar = 123 AND json_field = 'null'::jsonb INSERT INTO foo (bar, json_field) VALUES (123, NULL)
using JSONNull()
(or it's closet equivalent today being Value(None, JSONField())
) would allows for JSON null
to be used in both cases (querying and storing) but there are no ways to specify that SQL NULL
must be used in both cases which is a supporting point for deprecating filter(json_field=None)
meaning JSON null
. Here is an attempt at getting it IS NULL
to be used for comparison and persistence.
follow-up: 24 comment:18 by , 3 weeks ago
I’d like clarification on whether we also want to deprecate filter(data__key=None)
.
As I understand it, the ambiguity motivating deprecation at the top level does not exist inside a JSON document. A key’s value can be JSON null
, but it cannot be SQL NULL
. For that reason, I think .filter(data__key=None)
should continue to match JSON null
.
We could still allow .filter(data__key=JSONNull())
for users who want explicitness, without deprecating None
usage in this case, especially as Jacob pointed out, this case is more common. Additionally, on the creation side, supporting nested JSON null like {"key": JSONNull()} in inserts/updates would require a custom encoder, which isn’t the case for JSONNull()
at the top level. Since users can already override encoders/decoders, we can’t make that work universally. We could bypass this problem by introducing and promoting (in the docs) the use of __is_jsonnull
lookup for querying and JSONNull() for creating top-level JSON null
.
If we don’t deprecate using None
in KeyTransform
lookups to mean JSON null, then ticket #36508 would have to be reopened, as it is surprising that (jsonfield__key__iexact=None)
is translated to an __isnull
lookup when jsonfield__key=None
means "does key have a value of JSON null
."
comment:19 by , 3 weeks ago
A wrinkle is that __exact=None
is pretty clearly documented to be synonymous with __isnull=True
, so by not deprecating we'd be choosing to memorialize something I found as a user to be a "gotcha".
Additionally, on the creation side, supporting nested JSON null like {"key": JSONNull()} in inserts/updates would require a custom encoder, which isn’t the case for JSONNull() at the top level.
Would you mind fleshing this out slightly, to make sure I'm catching your drift?
comment:20 by , 3 weeks ago
A wrinkle is that
__exact=None
is pretty clearly documented to be synonymous with__isnull=True
, so by not deprecating we'd be choosing to memorialize something I found as a user to be a "gotcha".
Makes sense.
Would you mind fleshing this out slightly, to make sure I'm catching your drift?
I meant that doing Dog(data={"name": JSONNull()}).save()}
would probably require us to build a custom encoder since json.dumps()
will not recognise the JSONNull()
expression.
comment:21 by , 3 weeks ago
I meant that doing Dog(data={"name": JSONNull()}).save()} would probably require us to build a custom encoder since json.dumps() will not recognise the JSONNull() expression.
Thanks. Simon discussed this concern for top-level JSON null
in comment:5:
It leaves the problem of having JSON null not surviving a round trip to the database as both SQL NULL and json.loads("null") are turned into Python None but that's a different issue that can be addressed with a specialized decoder if users require it.
... which I think is an acceptable trade-off for top-level JSON null.
For nulls in key/value pairs, as in your example, I think it's still an acceptable trade-off, as long as we are not *removing* the ability of users to store python None
in key/value pairs. The plan I see from the above comments is just to deprecate & change the behavior of filtering on None
in key/value pairs. (EDIT: not really, see comment:22 and comment:23 for updates.)
comment:22 by , 3 weeks ago
I share your position Clifford and Jacob.
To me the ambiguity of what to do with None
only exists for top level values of JSONField(null=True)
; there's no ambiguity for key values as it always mean JSON null
.
To make it clear I also think that filter(data__key=None)
should continue to match JSON null
and I'm not opposed to supporting filter(data__key=JSONNull())
while I don't see much benefits to it.
comment:23 by , 3 weeks ago
Oh this clarifies things for me. I'll edit my recent comments to avoid suggesting there was consensus to change the behavior of filtering on None
in key values, since it sounds like that was only proposed in comment:4, and later as Clifford points out, in the triage decision on #36508, but now we have multiple people with misgivings about doing that.
Having just left a project that managed most of its data in JSONFields, I can offer that auditing the behavior of key=var lookups everywhere to figure out what's intended when var
is possibly None
would be a daunting expense.
Re: the "gotcha" of exact=None
and isnull=True
not being invariant, we can just doc it clearly.
comment:24 by , 3 weeks ago
Thanks all, I was just about to post what Simon said.
Replying to Clifford Gama:
If we don’t deprecate using
None
inKeyTransform
lookups to mean JSON null, then ticket #36508 would have to be reopened, as it is surprising that(jsonfield__key__iexact=None)
is translated to an__isnull
lookup whenjsonfield__key=None
means "does key have a value of JSONnull
."
I agree, that ticket is a separate issue and I think it should be reopened. I'll leave a comment there for more details.
comment:26 by , 3 weeks ago
Has patch: | set |
---|
I think this is related to #34754, and I originally thought that this was another consequence of what Simon said there: I think the fundamental problem here is that:
but then I ran the proposed test with
--debug-sql
and noticed that the check code is casting the value in the database to::JSONB
independently of whether the value isNone
or not:So I kept digging and bisected the "bad" commit to 5c23d9f0c32f166c81ecb6f3f01d5077a6084318, and something like this makes the test pass:
django/db/models/fields/json.py
hasattr(value, "as_sql"):it makes the test
model_fields.test_jsonfield.TestSaveLoad.test_json_null_different_from_sql_null
fails for the check on the query:So further investigation is needed to come up with an acceptable solution.