#35539 closed Bug (invalid)
SearchVector GinIndex raises IMMUTABLE error
Reported by: | Alastair D'Silva | Owned by: | |
---|---|---|---|
Component: | contrib.postgres | Version: | 5.0 |
Severity: | Normal | Keywords: | |
Cc: | Mariusz Felisiak, Simon Charette | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I have a Document model with the following subclass:
class Meta: indexes = [ GinIndex(SearchVector('search_vector'), name='Document search vector'), # Create a GIN index on the search vector ]
It creates the following migration:
migrations.AddIndex( model_name='document', index=django.contrib.postgres.indexes.GinIndex(django.contrib.postgres.search.SearchVector('search_vector'), name='Document search vector'), ),
which in turn attempts to create the following index:
CREATE INDEX "Document search vector" ON "documents_document" USING gin ((to_tsvector(COALESCE(("search_vector")::text, ''))))
This fails with:
ERROR: functions in index expression must be marked IMMUTABLE
If I update contrib/postgres/search.py as follows:
91:
function = ""
and 116-128:
# clone.set_source_expressions( # [ # Coalesce( # ( # expression # if isinstance(expression.output_field, (CharField, TextField)) # else Cast(expression, TextField()) # ), # Value(""), # ) # for expression in clone.get_source_expressions() # ] # )
I can successfully create the index, and searches using SearchRank successfully use it.
Change History (5)
comment:1 by , 6 months ago
Cc: | added |
---|---|
Summary: | contrib/postgres/search.py COALESCE breaks GIN Index creation on PostgreSQL 15 → SearchVector GinIndex raises IMMUTABLE error |
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 6 months ago
Has patch: | unset |
---|
comment:3 by , 6 months ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Blaming the surrounding code leads to #30488 and #30385 which have plenty of context. It also has ties to #34955.
The summary is that SearchVector
was designed to support any field and expression thrown at it even the ones that are not of textual type and / or are nullable. This encouraged the usage of Postgres CONCAT
function over the ||
operator as the former properly deals with both
psql (16.3, server 15.6 (Debian 15.6-1.pgdg120+2)) Type "help" for help. django=# SELECT CONCAT('foo', NULL, 1); concat -------- foo1 (1 row) django=# SELECT 'foo' || NULL || 1; ?column? ---------- (1 row)
The problem with CONCAT
however is that it produces non-IMMUTABLE
output (the resulting data can change based on some connection settings that relate to collation for example) and thus is not suitable in index creation which creates a pickle about how NULL
-able values should be dealt with.
With all that said this ticket has little to do with this complexity. If you comment out the code reported to solve the issue by the reporter your test still fails Sarah and the reason is simple: to_tsvector
is not IMMUTABLE
unless a regconfig
is specified which is covered at length in the PostgreSQL documentation and included in the Django docs as well in the example.
Notice that the 2-argument version of
to_tsvector
is used. Only text search functions that specify a configuration name can be used in expression indexes (Section 11.7). This is because the index contents must be unaffected bydefault_text_search_config
. If they were affected, the index contents might be inconsistent because different entries could containtsvectors
that were created with different text search configurations, and there would be no way to guess which was which. It would be impossible to dump and restore such an index correctly.
So for the same reason Concat
cannot be used in indices because it relies on alterable global configuration SearchVector
cannot be used in indices without specifying a config
because they do not use the same stopwords and stoplists
django=# SELECT to_tsvector('english', 'The Web framework for perfectionists with deadlines.'); to_tsvector ----------------------------------------------------- 'deadlin':7 'framework':3 'perfectionist':5 'web':2 (1 row) django=# SELECT to_tsvector('french', 'The Web framework for perfectionists with deadlines.'); to_tsvector ------------------------------------------------------------------------------ 'deadlin':7 'for':4 'framework':3 'perfectionist':5 'the':1 'web':2 'with':6 (1 row)
I'm not sure how the reporter got that removing these particular lines of code would help but in its current form the report appears to me to be invalid.
comment:4 by , 6 months ago
Having the NULL propagate doesn't seem to big deal when searching, it still means that items that don't have a populated search vector won't be found.
I have the following patch which is working for me, but does need a bit more effort, in particular, the bit which removes to_tsvector if we already have a ts_vector (you can't call to_tsvector on a ts_vector).
--- search.py.orig 2024-07-09 15:10:14.092901831 +1000 +++ search.py 2024-07-09 09:46:15.962716177 +1000 @@ -4,6 +4,7 @@ Field, FloatField, Func, + JSONField, Lookup, TextField, Value, @@ -113,19 +114,26 @@ def as_sql(self, compiler, connection, function=None, template=None): clone = self.copy() - clone.set_source_expressions( - [ - Coalesce( + + new_expressions = [] + for expression in clone.get_source_expressions(): + if isinstance(expression.output_field, SearchVectorField): + new_expressions.append(expression) + function = '' + elif isinstance(expression.output_field, JSONField): + new_expressions.append(Coalesce(expression, Value("{}"))), + else: + new_expressions.append(Coalesce( ( expression if isinstance(expression.output_field, (CharField, TextField)) else Cast(expression, TextField()) ), Value(""), - ) - for expression in clone.get_source_expressions() - ] - ) + )) + + clone.set_source_expressions(new_expressions) + config_sql = None config_params = [] if template is None:
Here is my usage:
class Document(models.Model): title = models.CharField(max_length=255) metadata = models.JSONField(null=True) uploaded_at = models.DateTimeField(auto_now_add=True) tags = models.ManyToManyField(Tag, related_name='documents') type_fields = models.JSONField(default=dict) search_vector = SearchVectorField(null=True) def save(self, *args, **kwargs): super().save(*args, **kwargs) # Save the instance first self.update_search_vector() def update_search_vector(self): self.search_vector = SearchVector('title', 'metadata') Document.objects.filter(pk=self.pk).update(search_vector=self.search_vector) class Meta: indexes = [ GinIndex(SearchVector('search_vector'), name='Document search'), ]
comment:5 by , 6 months ago
You don't need this patch at all of the save()
shenanigans. All you are doing here is circumventing the fact that must specify a language configuration for ts_vector
to be immutable by running the update(search_vector=self.search_vector)
that will use the session configured default_text_search_config
.
You can simply do
class Document(models.Model): title = models.CharField(max_length=255) metadata = models.JSONField(null=True) class Meta: indexes = [ GinIndex( SearchVector('title', Coalesce('metadata', Value('')), config='english'), name='document_search', ), ]
Or use a GeneratedField
if you want to materialize the search vector
class Document(models.Model): title = models.CharField(max_length=255) metadata = models.JSONField(null=True) search_vector = GeneratedField( SearchVector('title', Coalesce('metadata', Value('')), config='english') output_field=SearchVectorField() ) class Meta: indexes = [ GinIndex( 'search_vector', name='search_vector_idx' ), ]
The key part here is that a config
must be specified if you want to use SearchVector
in an index or generated field to make it IMMUTABLE
Thank you for the report, was able to replicate the error 👍
I believe this is a valid bug, I've never used these before myself but looks like it's being used as documented
Here is my "rough" test if it gives anyone a starting point
tests/postgres_tests/test_indexes.py