#34840 closed Bug (fixed)
Django 4.2 casts text fields when testing IS NULL, preventing use of partial indexes
Reported by: | Alex Vandiver | Owned by: | Mariusz Felisiak |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 4.2 |
Severity: | Release blocker | Keywords: | |
Cc: | Florian Apolloner, Mariusz Felisiak, Simon Charette | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The Zulip project has a model with unique constraints which are expressed as two non-overlapping partial indexes:
class UserCount(models.Model): user = models.ForeignKey(UserProfile, on_delete=models.CASCADE) realm = models.ForeignKey(Realm, on_delete=models.CASCADE) property = models.CharField(max_length=32) subgroup = models.CharField(max_length=16, null=True) end_time = models.DateTimeField() value = models.BigIntegerField() class Meta: constraints = [ UniqueConstraint( fields=["user", "property", "subgroup", "end_time"], condition=Q(subgroup__isnull=False), name="unique_user_count", ), UniqueConstraint( fields=["user", "property", "end_time"], condition=Q(subgroup__isnull=True), name="unique_user_count_null_subgroup", ), ]
However, since commit 09ffc5c1212d4ced58b708cbbf3dfbfb77b782ca in Django 4.2, a query of the form:
UserCount.objects.get( property="messages_read::hour", subgroup=None, end_time=timezone_now(), user_id=user_profile.id, realm_id=realm.id, )
...generates this SQL:
SELECT "analytics_usercount"."id", "analytics_usercount"."property", "analytics_usercount"."subgroup", "analytics_usercount"."end_time", "analytics_usercount"."value", "analytics_usercount"."user_id", "analytics_usercount"."realm_id" FROM "analytics_usercount" WHERE ("analytics_usercount"."end_time" = '2023-09-13T19:16:34.195355+00:00'::timestamptz AND "analytics_usercount"."property" = 'messages_read::hour' AND "analytics_usercount"."realm_id" = 4715 AND "analytics_usercount"."subgroup"::text IS NULL AND "analytics_usercount"."user_id" = 428054) LIMIT 21
The cast of "analytics_usercount"."subgroup"::text IS NULL
causes PostgreSQL to not be able to use the unique partial index:
Limit (cost=48.30..49.42 rows=1 width=61) -> Bitmap Heap Scan on analytics_usercount (cost=48.30..49.42 rows=1 width=61) Recheck Cond: (((property)::text = 'messages_read::hour'::text) AND (realm_id = 4715) AND (end_time = '2023-09-13 19:00:00+00'::timestamp with time zone) AND (user_id = 428054)) Filter: ((subgroup)::text IS NULL) -> BitmapAnd (cost=48.30..48.30 rows=1 width=0) -> Bitmap Index Scan on analytics_usercount_property_591dbec1_idx (cost=0.00..4.88 rows=158 width=0) Index Cond: (((property)::text = 'messages_read::hour'::text) AND (realm_id = 4715) AND (end_time = '2023-09-13 19:00:00+00'::timestamp with time zone)) -> Bitmap Index Scan on analytics_usercount_e8701ad4 (cost=0.00..43.17 rows=3626 width=0) Index Cond: (user_id = 428054)
Dropping the explicit cast causes it to use the index:
Limit (cost=0.57..2.80 rows=1 width=61) -> Index Scan using unique_user_count_null_subgroup on analytics_usercount (cost=0.57..2.80 rows=1 width=61) Index Cond: ((user_id = 428054) AND ((property)::text = 'messages_read::hour'::text) AND (end_time = '2023-09-13 19:00:00+00'::timestamp with time zone)) Filter: (realm_id = 4715)
...and improving the query runtime significantly.
It's not clear to me from 09ffc5c1212d4ced58b708cbbf3dfbfb77b782ca what about psycopg3 requires this cast.
Change History (31)
follow-up: 2 comment:1 by , 15 months ago
Severity: | Normal → Release blocker |
---|---|
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Bug |
comment:2 by , 14 months ago
Yes, the indexes as created on 4.2 do have the cast in them. The diff of our schema across the 4.2 upgrade is:
$ diff -U0 <(rg -U '(?s)CREATE[^;]*;' before.sql) <(rg -U '(?s)CREATE[^;]*;' after.sql) --- /dev/fd/63 2023-09-13 12:55:09.595790167 -0700 +++ /dev/fd/62 2023-09-13 12:55:09.595790167 -0700 @@ -1095,4 +1095,4 @@ -CREATE UNIQUE INDEX unique_installation_count ON zulip.analytics_installationcount USING btree (property, subgroup, end_time) WHERE (subgroup IS NOT NULL); -CREATE UNIQUE INDEX unique_installation_count_null_subgroup ON zulip.analytics_installationcount USING btree (property, end_time) WHERE (subgroup IS NULL); -CREATE UNIQUE INDEX unique_realm_count ON zulip.analytics_realmcount USING btree (realm_id, property, subgroup, end_time) WHERE (subgroup IS NOT NULL); -CREATE UNIQUE INDEX unique_realm_count_null_subgroup ON zulip.analytics_realmcount USING btree (realm_id, property, end_time) WHERE (subgroup IS NULL); +CREATE UNIQUE INDEX unique_installation_count ON zulip.analytics_installationcount USING btree (property, subgroup, end_time) WHERE ((subgroup)::text IS NOT NULL); +CREATE UNIQUE INDEX unique_installation_count_null_subgroup ON zulip.analytics_installationcount USING btree (property, end_time) WHERE ((subgroup)::text IS NULL); +CREATE UNIQUE INDEX unique_realm_count ON zulip.analytics_realmcount USING btree (realm_id, property, subgroup, end_time) WHERE ((subgroup)::text IS NOT NULL); +CREATE UNIQUE INDEX unique_realm_count_null_subgroup ON zulip.analytics_realmcount USING btree (realm_id, property, end_time) WHERE ((subgroup)::text IS NULL); @@ -1100,4 +1100,4 @@ -CREATE UNIQUE INDEX unique_stream_count ON zulip.analytics_streamcount USING btree (stream_id, property, subgroup, end_time) WHERE (subgroup IS NOT NULL); -CREATE UNIQUE INDEX unique_stream_count_null_subgroup ON zulip.analytics_streamcount USING btree (stream_id, property, end_time) WHERE (subgroup IS NULL); -CREATE UNIQUE INDEX unique_user_count ON zulip.analytics_usercount USING btree (user_id, property, subgroup, end_time) WHERE (subgroup IS NOT NULL); -CREATE UNIQUE INDEX unique_user_count_null_subgroup ON zulip.analytics_usercount USING btree (user_id, property, end_time) WHERE (subgroup IS NULL); +CREATE UNIQUE INDEX unique_stream_count ON zulip.analytics_streamcount USING btree (stream_id, property, subgroup, end_time) WHERE ((subgroup)::text IS NOT NULL); +CREATE UNIQUE INDEX unique_stream_count_null_subgroup ON zulip.analytics_streamcount USING btree (stream_id, property, end_time) WHERE ((subgroup)::text IS NULL); +CREATE UNIQUE INDEX unique_user_count ON zulip.analytics_usercount USING btree (user_id, property, subgroup, end_time) WHERE ((subgroup)::text IS NOT NULL); +CREATE UNIQUE INDEX unique_user_count_null_subgroup ON zulip.analytics_usercount USING btree (user_id, property, end_time) WHERE ((subgroup)::text IS NULL);
So if the application's schema was inserted with 4.2, then its partial indexes work with the current code. If it was inserted with Django < 4.2, then the indexes don't work with >= 4.2.
This also means that there's a bit of a pickle -- you can't just strip off the ::text
in the query in some later 4.2 bugfix release, because that will also make any 4.2-inserted indexes with ::text
not work with the newly ::text
-less query.
Replying to Simon Charette:
We should try to avoid casts if possible (IIRC they are only required for server-side bindings).
Out of curiosity, if you try re-creating your
UniqueConstraint
on 4.2, are they created with casts? By that I'm in no mean implying that a solution to this problem is to force the re-creation of constraints on each releases but I want to have an idea of the scope of the problem.
If that's the case it means that reverting to no cast in a minor release might break Postgres query planer the other way around as constraint created on 4.2 include
::text
cast but queries on a future version removing them don't.
follow-up: 4 comment:3 by , 14 months ago
Cc: | added |
---|
This also means that there's a bit of a pickle -- you can't just strip off the ::text in the query in some later 4.2 bugfix release, because that will also make any 4.2-inserted indexes with ::text not work with the newly ::text-less query.
Right, that's what I feared. I think we should favour reverting to the previous behaviour if possible though as there are likely to be more constraints created prior to 4.2 in the wild than after even at this point. This is definitely something that should be mentioned in the release notes though.
Any thoughts here Florian and Mariusz?
comment:4 by , 14 months ago
Replying to Simon Charette:
Any thoughts here Florian and Mariusz?
I'd prefer to avoid casting, however, it is necessary when server-side binding is enabled.
follow-up: 6 comment:5 by , 14 months ago
Cc: | added |
---|
What do you think of making the casting conditional to the enablement of the server_side_binding
option in this case?
That would contain the issue to the already existing buckets or problems that come with enabling this option that we're already aware of (e.g. grouping by parameterized annotation).
Happy to prepare a patch if you believe that it's worth pursuing.
comment:6 by , 14 months ago
Replying to Simon Charette:
What do you think of making the casting conditional to the enablement of the
server_side_binding
option in this case?
Yeah, I was thinking the same thing. This is probably the only reasonable option we have.
follow-up: 10 comment:8 by , 14 months ago
Replying to Florian Apolloner:
This doesn't just affect
::text
but all casts?
As far as I'm aware, from the lookup perspective it affects mainly __isnull
.
comment:9 by , 14 months ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Hey everyone,
I'm excited to tackle this one! The issue (#34840) looks like a great challenge, and I'd love to help resolve it. 😊
I'll start working on a solution and keep you posted on my progress. If you have any insights or suggestions along the way, please feel free to share them.while this is the first time I'm working on something like this so I hope I can do it right as much as possible!
Thanks for the opportunity!
comment:10 by , 14 months ago
Replying to Mariusz Felisiak:
Replying to Florian Apolloner:
This doesn't just affect
::text
but all casts?
As far as I'm aware, from the lookup perspective it affects mainly
__isnull
.
Oh right, and that is mostly a result of handling text columns as unknown
oids instead of text
which has/had it's own share of problems.
Replying to Simon Charette:
What do you think of making the casting conditional to the enablement of the
server_side_binding
option in this case?
Thinking about this more I am not sure this is a good idea; after all people should be free to change that setting and Django should behave the same. I feel like we are running into limitations of our ORM here, do we really need the ::text
cast here for server side bindings? Isn't postgresql able to properly deduce the types? After all, in the query as well as in the index we reference the column subgroup
which has a well defined type (or am I mistaken here). We should only need type casts if we involve bind parameters.
Though that begs the next question: Assume you generate an index for a text column named x
. Would a query with .filter(x__startswith='test')
use the index if the index is created on x
but the query uses x::text like 'test%'
?
EDIT:// The next
question might be a non issue generate x like 'test%'::text
-- Nevertheless it feels a bit unfortunate that postgresql cannot use the index with the cast.
comment:11 by , 14 months ago
Running the tests with server side param binding enabled and having removed the cast for IS NULL
results in three failures:
====================================================================== ERROR: test_model_validation_with_condition (constraints.tests.UniqueConstraintTests.test_model_validation_with_condition) Partial unique constraints are not ignored by ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.11/unittest/case.py", line 57, in testPartExecutor yield File "/usr/lib64/python3.11/unittest/case.py", line 623, in run self._callTestMethod(testMethod) ^^^^^^^^^^^^^^^^^ File "/usr/lib64/python3.11/unittest/case.py", line 579, in _callTestMethod if method() is not None: ^^^^^^^^^^^^^^^^^ File "/home/florian/sources/django.git/django/test/testcases.py", line 1425, in skip_wrapper return test_func(*args, **kwargs) ^^^^^^^^^^^^^^^^^ File "/home/florian/sources/django.git/tests/constraints/tests.py", line 804, in test_model_validation_with_condition ).validate_constraints() ^^^^^^^^^^^^^^^^^ File "/home/florian/sources/django.git/django/db/models/base.py", line 1497, in validate_constraints raise ValidationError(errors) ^^^^^^^^^^^^^^^^^ django.core.exceptions.ValidationError: {'__all__': ['Constraint “name_without_color_uniq” is violated.']} ====================================================================== ERROR: test_validate_condition (constraints.tests.UniqueConstraintTests.test_validate_condition) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.11/unittest/case.py", line 57, in testPartExecutor yield File "/usr/lib64/python3.11/unittest/case.py", line 623, in run self._callTestMethod(testMethod) ^^^^^^^^^^^^^^^^^ File "/usr/lib64/python3.11/unittest/case.py", line 579, in _callTestMethod if method() is not None: ^^^^^^^^^^^^^^^^^ File "/home/florian/sources/django.git/django/test/testcases.py", line 1425, in skip_wrapper return test_func(*args, **kwargs) ^^^^^^^^^^^^^^^^^ File "/home/florian/sources/django.git/tests/constraints/tests.py", line 879, in test_validate_condition constraint.validate( ^^^^^^^^^^^^^^^^^ File "/home/florian/sources/django.git/django/db/models/constraints.py", line 461, in validate raise ValidationError( ^^^^^^^^^^^^^^^^^ django.core.exceptions.ValidationError: ['Constraint “name_without_color_uniq” is violated.'] ====================================================================== ERROR: test_validate_expression_condition (constraints.tests.UniqueConstraintTests.test_validate_expression_condition) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.11/unittest/case.py", line 57, in testPartExecutor yield File "/usr/lib64/python3.11/unittest/case.py", line 623, in run self._callTestMethod(testMethod) ^^^^^^^^^^^^^^^^^ File "/usr/lib64/python3.11/unittest/case.py", line 579, in _callTestMethod if method() is not None: ^^^^^^^^^^^^^^^^^ File "/home/florian/sources/django.git/tests/constraints/tests.py", line 965, in test_validate_expression_condition constraint.validate( ^^^^^^^^^^^^^^^^^ File "/home/florian/sources/django.git/django/db/models/constraints.py", line 461, in validate raise ValidationError( ^^^^^^^^^^^^^^^^^ django.core.exceptions.ValidationError: ['Constraint “name_lower_without_color_uniq” is violated.'] ---------------------------------------------------------------------- Ran 16695 tests in 201.401s
comment:12 by , 14 months ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:13 by , 14 months ago
fix: Correct value format for CharArrayModel in test_isnull_lookup_cast
In the test_isnull_lookup_cast method, the CharArrayModel instance was created with an incorrect value format for the 'field' attribute. This caused a PostgreSQL error when trying to create the instance.
This commit fixes the issue by using a list to represent an array and encloses the value properly in curly braces, ensuring the correct format for the 'field' attribute.i made some test check the status of it.
comment:14 by , 14 months ago
Triage Stage: | Ready for checkin → Accepted |
---|
Please don't see tickets with your own merge requests to "ready for checkin". Additionally, simply removing the cast is not the correct fix (at the very least it is most likely possible to just drop the if etc…)
comment:15 by , 14 months ago
Regarding my comment https://code.djangoproject.com/ticket/34840#comment:11 -- The validation error is wrong there, it is actually a database area that is masked (https://github.com/django/django/pull/15659#pullrequestreview-1629960210). I wonder if we could teach the ORM to only add casts when using bind params, there is no reason to add those for column refs. That is unless our testsuite doesn't miss some further important edge cases :)
At the very least we will need instructions (release notes, maybe a blog post) for people on how to check whether they are affected and how to fix it.
comment:16 by , 14 months ago
Owner: | changed from | to
---|
I wonder if we could extend Simon's solution for #34754 (3434dbd39d373df7193ad006b970c09c1a909ea3). The following patch passes all tests (with and without server side binding parameters):
-
django/db/backends/postgresql/operations.py
diff --git a/django/db/backends/postgresql/operations.py b/django/db/backends/postgresql/operations.py index 93eb982f4b..e536681914 100644
a b class DatabaseOperations(BaseDatabaseOperations): 155 155 def lookup_cast(self, lookup_type, internal_type=None): 156 156 lookup = "%s" 157 157 158 if lookup_type == "isnull" and internal_type in (159 "CharField",160 "EmailField",161 "TextField",162 ):163 return "%s::text"164 165 158 # Cast text lookups to text to allow things like filter(x__contains=4) 166 159 if lookup_type in ( 167 160 "iexact", -
django/db/models/lookups.py
diff --git a/django/db/models/lookups.py b/django/db/models/lookups.py index 896d3eee44..eb887f5548 100644
a b class IsNull(BuiltinLookup): 624 624 raise ValueError( 625 625 "The QuerySet value for an isnull lookup must be True or False." 626 626 ) 627 if isinstance(self.lhs, Value) and self.lhs.value is None:627 if isinstance(self.lhs, Value): 628 628 if self.rhs: 629 raise FullResultSet 629 if self.lhs.value is None: 630 raise FullResultSet 631 else: 632 raise EmptyResultSet 630 633 else: 631 raise EmptyResultSet 634 if self.lhs.value is None: 635 raise EmptyResultSet 636 else: 637 raise FullResultSet 632 638 sql, params = self.process_lhs(compiler, connection) 633 639 if self.rhs: 634 640 return "%s IS NULL" % sql, params -
django/db/models/query_utils.py
diff --git a/django/db/models/query_utils.py b/django/db/models/query_utils.py index 4f3358eb8d..830a7fdb6f 100644
a b import inspect 10 10 import logging 11 11 from collections import namedtuple 12 12 13 from django.core.exceptions import FieldError13 from django.core.exceptions import EmptyResultSet, FieldError 14 14 from django.db import DEFAULT_DB_ALIAS, DatabaseError, connections 15 15 from django.db.models.constants import LOOKUP_SEP 16 16 from django.utils import tree … … class Q(tree.Node): 140 140 except DatabaseError as e: 141 141 logger.warning("Got a database error calling check() on %r: %s", self, e) 142 142 return True 143 except EmptyResultSet: 144 return False 143 145 144 146 def deconstruct(self): 145 147 path = "%s.%s" % (self.__class__.__module__, self.__class__.__name__)
comment:18 by , 14 months ago
That is some creative thinking Mariusz. So just so I get this right: Django "optimizes" away part of the asked query if it can determine the outcome of the clause/lookup/etc? If that is the case: wow, never realized that :)
follow-up: 20 comment:19 by , 14 months ago
Florian, yeah that's the gist of it.
This approach has existed for a while in the ORM to avoid performing a query for filter(somefield__in=[])
as SQL doesn't allow "column" IN ()
anyway and we've leaned on it recently to prevent some full table scan on Postgres (#27397) and issues with NULL
handling when dealing with JSONField
(#34754).
I think it makes sense to use this approach in cases where Python input poses type ambiguity at the database level like it did with the IN
operator, after all we were forced to add ::text
cast because Postgres doesn't know what the type of %s IS NULL
should be.
In all cases the adjustments to Q.check
to handle EmptyResultSet
should be added as other lookups could generate it, the same can be said with FullResultSet
.
I think we should not let the discussion about the silencing of DatabaseError
die off though and consider a deprecation path to raise the awareness of the issue. Maybe a possible alternative solution that doesn't warrant expression introspection (e.g. special casing RawSQL
) could be a Constraint
kwarg that allows disabling model level validation entirely? That could be a way to silence the warning during the deprecation period.
comment:20 by , 14 months ago
Replying to Simon Charette:
This approach has existed for a while in the ORM to avoid performing a query for
filter(somefield__in=[])
as SQL doesn't allow"column" IN ()
anyway and we've leaned on it recently to prevent some full table scan on Postgres (#27397) and issues withNULL
handling when dealing withJSONField
(#34754).
I think it makes sense to use this approach in cases where Python input poses type ambiguity at the database level like it did with the
IN
operator, after all we were forced to add::text
cast because Postgres doesn't know what the type of%s IS NULL
should be.
Makes sense. And as we see we only need the ::text
cast in very special situations, a .filter(field__isnull=True)
won't require it…
In all cases the adjustments to
Q.check
to handleEmptyResultSet
should be added as other lookups could generate it, the same can be said aboutFullResultSet
.
ACK, and we need tests for both cases. I see that the WhereNode
properly propagates EmptyResultSet
/FullResultSet
-- I wonder if there are other cases where we don't do that yet (but I guess we will need to find them on a case by case basis).
I think we should not let the discussion about the silencing of
DatabaseError
die off though and consider a deprecation path to raise the awareness of the issue. Maybe a possible alternative solution that doesn't warrant expression introspection (e.g. special casingRawSQL
) could be aConstraint
kwarg that allows disabling model level validation entirely? That could be a way to silence the warning during the deprecation period.
+1. Though that should probably be a separate PR/ticket?
As for the indexing issues for people which already created indexes on 4.2 -- we should probably (aside from the release notes) also mention it explicitly in a blog post? This is something that can be very hard to find.
comment:21 by , 14 months ago
Replying to Mariusz Felisiak:
Alex, does it work for you?
My parse of it is that it'll drop the ::text
cast in the query, which means that the partial index, as created pre-4.2, work as expected -- so yup!
(For the meantime, we've manually recreated the indexes with the cast in them in production, so we get their utility -- but we'll happily swap those back to the cast-less variety when we upgrade to a Django with this fix)
We should try to avoid casts if possible (IIRC they are only required for server-side bindings).
Out of curiosity, if you try re-creating your
UniqueConstraint
on 4.2, are they created with casts? By that I'm in no mean implying that a solution to this problem is to force the re-creation of constraints on each releases but I want to have an idea of the scope of the problem.If that's the case it means that reverting to no cast in a minor release might break Postgres query planer the other way around as constraint created on 4.2 include
::text
cast but queries on a future version removing them don't.