Opened 7 years ago

Closed 7 years ago

#28128 closed Uncategorized (duplicate)

Fulltext search very slow with annotate(SearchVector)

Reported by: Gavin Wahl Owned by:
Component: contrib.postgres Version: 1.11
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I have this model:

class Bug(models.Model):
    # .. fields ...

    # This field is managed by Django, adding stuff to it that should be
    # included in the fulltext search.
    fulltext = models.TextField()

The Bug table has many rows, so I need to add an index on the fulltext column for fulltext search:

CREATE INDEX ON buggy_bug USING GIN (to_tsvector('english'::regconfig, COALESCE("fulltext", '')));

Note that postgres requires the english config argument to to_tsvector for it be allowed to be used in an index.

Now, at query time, the expression Django uses must exactly match the expression in the functional index. By default, filter(fulltext__search='foo') leaves out the config argument to to_tsvector, so my index isn't used. The documentation shows us how to fix this, by annotating a search field:

Bug.objects.annotate(
    search=SearchVector('fulltext', config='english')
).filter(
    search='foo'
)

Now the expressions match and the index is used. However, the query is still crazy slow. This is the query Django does:

SELECT "buggy_bug"."id",
       to_tsvector('english'::regconfig, COALESCE("buggy_bug"."fulltext", '')) AS "search"
FROM "buggy_bug"
WHERE to_tsvector('english'::regconfig, COALESCE("buggy_bug"."fulltext", '')) @@ (plainto_tsquery('english'::regconfig, 'foo')) = true;

Which takes 2500 milliseconds on my data, and EXPLAIN confirms it is using the index. With some playing around with it, I find that removing the annotated search column gives a huge speedup:

SELECT "buggy_bug"."id"
FROM "buggy_bug"
WHERE to_tsvector('english'::regconfig, COALESCE("buggy_bug"."fulltext", '')) @@ (plainto_tsquery('english'::regconfig, 'foo')) = true;

This query only takes 11 milliseconds.

What's the point of selecting the result of to_tsvector? We don't actually use it in Django, we just had to do it to specify the config to use for to_tsvector.

There needs to be a way to specify the config for to_tsvector in order to use an index without a huge slowdown.

Change History (1)

comment:1 by Simon Charette, 7 years ago

Resolution: duplicate
Status: newclosed

I think this is a duplicate of #27719.

Note: See TracTickets for help on using tickets.
Back to Top