Opened 7 years ago
Closed 7 years ago
#28128 closed Uncategorized (duplicate)
Fulltext search very slow with annotate(SearchVector)
Reported by: | Gavin Wahl | Owned by: | |
---|---|---|---|
Component: | contrib.postgres | Version: | 1.11 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I have this model:
class Bug(models.Model): # .. fields ... # This field is managed by Django, adding stuff to it that should be # included in the fulltext search. fulltext = models.TextField()
The Bug
table has many rows, so I need to add an index on the fulltext column for fulltext search:
CREATE INDEX ON buggy_bug USING GIN (to_tsvector('english'::regconfig, COALESCE("fulltext", '')));
Note that postgres requires the english
config argument to to_tsvector
for it be allowed to be used in an index.
Now, at query time, the expression Django uses must exactly match the expression in the functional index. By default, filter(fulltext__search='foo')
leaves out the config argument to to_tsvector
, so my index isn't used. The documentation shows us how to fix this, by annotating a search field:
Bug.objects.annotate( search=SearchVector('fulltext', config='english') ).filter( search='foo' )
Now the expressions match and the index is used. However, the query is still crazy slow. This is the query Django does:
SELECT "buggy_bug"."id", to_tsvector('english'::regconfig, COALESCE("buggy_bug"."fulltext", '')) AS "search" FROM "buggy_bug" WHERE to_tsvector('english'::regconfig, COALESCE("buggy_bug"."fulltext", '')) @@ (plainto_tsquery('english'::regconfig, 'foo')) = true;
Which takes 2500 milliseconds on my data, and EXPLAIN confirms it is using the index. With some playing around with it, I find that removing the annotated search column gives a huge speedup:
SELECT "buggy_bug"."id" FROM "buggy_bug" WHERE to_tsvector('english'::regconfig, COALESCE("buggy_bug"."fulltext", '')) @@ (plainto_tsquery('english'::regconfig, 'foo')) = true;
This query only takes 11 milliseconds.
What's the point of selecting the result of to_tsvector? We don't actually use it in Django, we just had to do it to specify the config to use for to_tsvector.
There needs to be a way to specify the config for to_tsvector in order to use an index without a huge slowdown.
I think this is a duplicate of #27719.