Opened 8 years ago
Closed 3 years ago
#29084 closed Cleanup/optimization (fixed)
Skip postgres_tests's SimpleSearchTest's that require 'english_stem' configuration if the machine's configuration is different
| Reported by: | Дилян Палаузов | Owned by: | Pablo Nicolas Estevez |
|---|---|---|---|
| Component: | contrib.postgres | Version: | 1.11 |
| Severity: | Normal | Keywords: | tests postgresql search |
| Cc: | | Triage Stage: | Ready for checkin |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
tests/postgres_tests/test_search.py:class SimpleSearchTest contains:
    def test_non_exact_match(self):
        searched = Line.objects.filter(dialogue__search='hearts')
        self.assertSequenceEqual(searched, [self.verse2])
    def test_search_two_terms(self):
        searched = Line.objects.filter(dialogue__search='heart bowel')
        self.assertSequenceEqual(searched, [self.verse2])
The first test calls:
SELECT to_tsvector('His head smashed in and his heart cut out, ') @@ plainto_tsquery('hearts')
which is false. In particular:
SELECT to_tsvector('His head smashed in and his heart cut out, '); returns 'and':5 'cut':8 'head':2 'heart':7 'his':1,6 'in':4 'out':9 'smashed':3 and SELECT plainto_tsquery('hearts'); returns 'hearts'.
However this works:
SELECT to_tsvector('english', 'His head smashed in and his heart cut out, ') @@ plainto_tsquery('english', 'hearts');
as SELECT to_tsvector('english', 'His head smashed in and his heart cut out, ') returns 'cut':8 'head':2 'heart':7 'smash':3 and SELECT plainto_tsquery('english', 'hearts'); returns 'heart'.
The second test calls:
SELECT to_tsvector(COALESCE('His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--')) @@ plainto_tsquery('heart bowel');
which is again false. In particular:
SELECT COALESCE('His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--'); returns His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--,
SELECT to_tsvector(COALESCE('His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--')); returns 'and':5,10,14,18,22,27 'bottom':24 'bowels':16 'burned':25 'cut':8 'head':2 'heart':7 'his':1,6,11,15,19,23,28 'in':4 'liver':12 'nostrils':20 'off':26 'out':9 'removed':13 'ripped':21 'smashed':3 'unplugged':17 and
SELECT plainto_tsquery('heart bowel'); returns 'heart' & 'bowel'.
Here again 'english' helps:
SELECT to_tsvector('english', COALESCE('His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--')); returns 'bottom':24 'bowel':16 'burn':25 'cut':8 'head':2 'heart':7 'liver':12 'nostril':20 'remov':13 'rip':21 'smash':3 'unplug':17 and
SELECT to_tsvector('english', COALESCE('His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--')) @@ (plainto_tsquery('heart bowel'));
is true.
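The difference between the two configurations can be illustrated without a database. The following is only a toy model, not PostgreSQL's implementation: `tokens`, `toy_stem` and `matches` are hypothetical names, and `toy_stem` is a crude plural stripper standing in for the Snowball stemmer behind english_stem (it also ignores stop-word removal). It shows why an exact-lexeme comparison, as with simple, fails for 'hearts' while a stemming comparison succeeds.

```python
import re

LINE = "His head smashed in and his heart cut out, "


def tokens(text):
    """Lowercase the text and split it into words (rough stand-in for 'simple')."""
    return set(re.findall(r"[a-z]+", text.lower()))


def toy_stem(word):
    """Hypothetical plural stripper; NOT the Snowball algorithm used by english_stem."""
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


def matches(document, query, normalize=lambda word: word):
    """Return True if every normalized query word occurs in the normalized document."""
    doc_lexemes = {normalize(w) for w in tokens(document)}
    query_lexemes = {normalize(w) for w in tokens(query)}
    return query_lexemes <= doc_lexemes


if __name__ == "__main__":
    print(matches(LINE, "hearts"))            # exact lexemes, like 'simple'
    print(matches(LINE, "hearts", toy_stem))  # with the toy stemmer
```

With identity normalization 'hearts' is compared literally against 'heart' and does not match; once both sides are reduced to a common stem they do.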
Change History (12)
comment:1 by , 8 years ago
comment:2 by , 8 years ago
That is exactly what I am saying. These are my search configurations:
psql \dF
List of text search configurations
Schema | Name | Description
------------+------------+---------------------------------------
pg_catalog | danish | configuration for danish language
pg_catalog | dutch | configuration for dutch language
pg_catalog | english | configuration for english language
pg_catalog | finnish | configuration for finnish language
pg_catalog | french | configuration for french language
pg_catalog | german | configuration for german language
pg_catalog | hungarian | configuration for hungarian language
pg_catalog | italian | configuration for italian language
pg_catalog | norwegian | configuration for norwegian language
pg_catalog | portuguese | configuration for portuguese language
pg_catalog | romanian | configuration for romanian language
pg_catalog | russian | configuration for russian language
pg_catalog | simple | simple configuration
pg_catalog | spanish | configuration for spanish language
pg_catalog | swedish | configuration for swedish language
pg_catalog | turkish | configuration for turkish language
(16 rows)
psql SHOW default_text_search_config ;
default_text_search_config
----------------------------
pg_catalog.simple
(1 row)
psql \dF+ simple
Text search configuration "pg_catalog.simple"
Parser: "pg_catalog.default"
Token | Dictionaries
-----------------+--------------
asciihword | simple
asciiword | simple
email | simple
file | simple
float | simple
host | simple
hword | simple
hword_asciipart | simple
hword_numpart | simple
hword_part | simple
int | simple
numhword | simple
numword | simple
sfloat | simple
uint | simple
url | simple
url_path | simple
version | simple
word | simple
The simple dictionary is described at https://www.postgresql.org/docs/9.6/static/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY. The tests assume that the default configuration uses the english_stem dictionary. However, simple is the default configuration for an unconfigured PostgreSQL: https://www.postgresql.org/docs/9.6/static/runtime-config-client.html#GUC-DEFAULT-TEXT-SEARCH-CONFIG.
comment:3 by , 8 years ago
| Summary: | tests.postgres_tests.test_search.SimpleSearchTest: to_tsvector/plainto_tsquery need 'english' as first parameter → Skip postgres_tests's SimpleSearchTest's that require 'english_stem' configuration if the machine's configuration is different |
|---|---|
| Triage Stage: | Unreviewed → Accepted |
| Type: | Uncategorized → Cleanup/optimization |
Skipping the tests is an option. Feel free to propose something else.
comment:4 by , 8 years ago
I propose explicitly passing 'english' as the first parameter to both to_tsvector and plainto_tsquery:
diff --git a/tests/postgres_tests/test_search.py b/tests/postgres_tests/test_search.py
index b93077f..b721f1f 100644
--- a/tests/postgres_tests/test_search.py
+++ b/tests/postgres_tests/test_search.py
@@ -88,11 +88,11 @@ class SimpleSearchTest(GrailTestData, PostgreSQLTestCase):
         self.assertSequenceEqual(searched, [self.verse1])
     def test_non_exact_match(self):
-        searched = Line.objects.filter(dialogue__search='hearts')
+        searched = Line.objects.annotate(search=SearchVector('dialogue', config='english')).filter(search=SearchQuery('hearts', config='english'))
         self.assertSequenceEqual(searched, [self.verse2])
     def test_search_two_terms(self):
-        searched = Line.objects.filter(dialogue__search='heart bowel')
+        searched = Line.objects.annotate(search=SearchVector('dialogue', config='english')).filter(search=SearchQuery('heart bowel', config='english'))
         self.assertSequenceEqual(searched, [self.verse2])
     def test_search_two_terms_with_partial_match(self):
comment:6 by , 8 years ago
I don't know. Take the words as they are: "heart" and "bowels":
diff --git a/tests/postgres_tests/test_search.py b/tests/postgres_tests/test_search.py
--- a/tests/postgres_tests/test_search.py
+++ b/tests/postgres_tests/test_search.py
@@ -88,11 +88,11 @@ class SimpleSearchTest(GrailTestData, PostgreSQLTestCase):
         self.assertSequenceEqual(searched, [self.verse1])
     def test_non_exact_match(self):
-        searched = Line.objects.filter(dialogue__search='hearts')
+        searched = Line.objects.filter(dialogue__search='heart')
         self.assertSequenceEqual(searched, [self.verse2])
     def test_search_two_terms(self):
-        searched = Line.objects.filter(dialogue__search='heart bowel')
+        searched = Line.objects.filter(dialogue__search='heart bowels')
         self.assertSequenceEqual(searched, [self.verse2])
comment:7 by , 3 years ago
| Owner: | set to |
|---|---|
| Status: | new → assigned |
Hi, I will try to solve the problem.
comment:8 by , 3 years ago
| Resolution: | → fixed |
|---|---|
| Status: | assigned → closed |
The tests require the following setting in postgresql.conf:
default_text_search_config = 'pg_catalog.english'
In case the configuration uses another language, I wrote a pull request to skip the tests:
https://github.com/django/django/pull/16357
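A skip condition of this kind could look roughly like the sketch below. It is not taken from the pull request: the helper `get_default_text_search_config`, the mixin `SkipUnlessEnglishConfigMixin`, the `get_cursor()` hook and the `FakeCursor` stand-in are all hypothetical names, and accepting both 'english' and 'pg_catalog.english' is an assumption. Only the SHOW default_text_search_config statement and unittest's skipTest() are real. The fake cursor lets the sketch run without a PostgreSQL server; in a real Django test the cursor would come from the database connection.

```python
import unittest

# Assumption: either spelling of the setting counts as the English configuration.
ENGLISH_CONFIGS = ("english", "pg_catalog.english")


def get_default_text_search_config(cursor):
    """Ask the server for its default text search configuration name."""
    cursor.execute("SHOW default_text_search_config")
    return cursor.fetchone()[0]


class SkipUnlessEnglishConfigMixin:
    """Hypothetical mixin; the test class must provide get_cursor()."""

    def require_english_config(self):
        config = get_default_text_search_config(self.get_cursor())
        if config not in ENGLISH_CONFIGS:
            self.skipTest(
                "default_text_search_config is %r, not 'english'." % config
            )


class FakeCursor:
    """Stand-in for a DB-API cursor so the sketch runs without PostgreSQL."""

    def __init__(self, value):
        self._value = value

    def execute(self, sql):
        self.last_sql = sql

    def fetchone(self):
        return (self._value,)


class DemoSearchTest(SkipUnlessEnglishConfigMixin, unittest.TestCase):
    config = "pg_catalog.simple"  # the reporter's default configuration

    def get_cursor(self):
        return FakeCursor(self.config)

    def test_non_exact_match(self):
        self.require_english_config()
        self.fail("only reached when the default configuration is English")
```

Checking inside the test body rather than in a class-level decorator keeps the query from running at import time, before a database connection is available.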
comment:9 by , 3 years ago
| Has patch: | set |
|---|---|
| Resolution: | fixed |
| Status: | closed → new |
The ticket isn't closed until the fix is committed.
comment:10 by , 3 years ago
| Patch needs improvement: | set |
|---|---|
| Status: | new → assigned |
comment:11 by , 3 years ago
| Patch needs improvement: | unset |
|---|---|
| Triage Stage: | Accepted → Ready for checkin |
Are you saying that the tests don't pass on your system? Is the difference based on the system's language or something? Maybe a skip condition can be added for those tests.