Opened 5 years ago
Last modified 4 years ago
#32492 closed New feature
Add django API for Postgres pg_trgm word_similarity, fuzzy full-text search. — at Initial Version
| Reported by: | Taneli | Owned by: | nobody |
|---|---|---|---|
| Component: | contrib.postgres | Version: | dev |
| Severity: | Normal | Keywords: | |
| Cc: | Paolo Melchiorre | Triage Stage: | Ready for checkin |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
Use case: filter queryset by an inexact substring. For example, retrieve an object with the following headline ' Dogecoin is following bitcoin in its dramatic rise' when searching for 'dogge'.
Currently it is possible to filter a queryset on the basis of trigram similarity between the search string and full text stored in a column.
Author.objects.annotate(similarity=TrigramSimilarity('name', test),).filter(similarity__gt=0.3)
This is a wrapper around the similarity function of the pg_trgm extension. While it allows comparing full strings, i.e. searching for 'doge' would find 'dogs' or 'dogge' it is useless for fuzzy searching of substrings.
SELECT similarity('dogge', 'doge');
---------
0.57
SELECT similarity('dogge', 'dogecoin is following bitcoin');
------------
0.1
word_similarity does take into account the word boundaries
SELECT word_similarity('doge', 'dogecoin is following bitcoin');
--------------
0.5
Adding a django API to word_similarity would allow for better fuzzy fulltext search without a need to use either raw SQL or external tools like elasticsearch.