Opened 7 years ago
Closed 2 years ago
#29084 closed Cleanup/optimization (fixed)
Skip postgres_tests's SimpleSearchTest's that require 'english_stem' configuration if the machine's configuration is different
Reported by: | Дилян Палаузов | Owned by: | Pablo Nicolas Estevez |
---|---|---|---|
Component: | contrib.postgres | Version: | 1.11 |
Severity: | Normal | Keywords: | tests postgresql search |
Cc: | | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
tests/postgres_tests/test_search.py:class SimpleSearchTest contains:
    def test_non_exact_match(self):
        searched = Line.objects.filter(dialogue__search='hearts')
        self.assertSequenceEqual(searched, [self.verse2])

    def test_search_two_terms(self):
        searched = Line.objects.filter(dialogue__search='heart bowel')
        self.assertSequenceEqual(searched, [self.verse2])
The first test calls:
SELECT to_tsvector('His head smashed in and his heart cut out, ') @@ plainto_tsquery('hearts')
which is false. In particular:
SELECT to_tsvector('His head smashed in and his heart cut out, ');
returns 'and':5 'cut':8 'head':2 'heart':7 'his':1,6 'in':4 'out':9 'smashed':3
and SELECT plainto_tsquery('hearts');
returns 'hearts'.
However this works:
SELECT to_tsvector('english', 'His head smashed in and his heart cut out, ') @@ plainto_tsquery('english', 'hearts');
as SELECT to_tsvector('english', 'His head smashed in and his heart cut out, ')
returns 'cut':8 'head':2 'heart':7 'smash':3
and SELECT plainto_tsquery('english', 'hearts');
returns 'heart'.
The second test calls:
SELECT to_tsvector(COALESCE('His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--')) @@ plainto_tsquery('heart bowel');
which is again false. In particular SELECT COALESCE('His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--');
returns His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--
SELECT to_tsvector(COALESCE('His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--'));
returns 'and':5,10,14,18,22,27 'bottom':24 'bowels':16 'burned':25 'cut':8 'head':2 'heart':7 'his':1,6,11,15,19,23,28 'in':4 'liver':12 'nostrils':20 'off':26 'out':9 'removed':13 'ripped':21 'smashed':3 'unplugged':17
and SELECT plainto_tsquery('heart bowel');
returns 'heart' & 'bowel'.
Here again 'english' helps:
SELECT to_tsvector('english', COALESCE('His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--'));
returns 'bottom':24 'bowel':16 'burn':25 'cut':8 'head':2 'heart':7 'liver':12 'nostril':20 'remov':13 'rip':21 'smash':3 'unplug':17
and
SELECT to_tsvector('english', COALESCE('His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--')) @@ (plainto_tsquery('heart bowel'));
is true.
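The difference between the two configurations can be imitated in a few lines of Python. This is only a toy sketch: `simple_lexemes`, `toy_english_lexemes`, `matches` and the `STOP_WORDS` set are made-up names for illustration, and the "stemmer" merely strips a trailing "s". It is not PostgreSQL's Snowball stemmer or its real stop-word list. It does show why 'hearts' and 'bowel' fail to match under 'simple' and match once both sides are normalized.

```python
import re

# Toy stop-word list (assumption): only the words that the 'english'
# output quoted above drops from this text.
STOP_WORDS = {"his", "in", "and", "out", "off"}


def simple_lexemes(text):
    """Imitate the 'simple' configuration: lowercase the tokens, nothing else."""
    return set(re.findall(r"[a-z]+", text.lower()))


def toy_english_lexemes(text):
    """Rough stand-in for 'english': drop stop words, strip a plural 's'."""
    lexemes = set()
    for word in simple_lexemes(text):
        if word in STOP_WORDS:
            continue
        if word.endswith("s") and len(word) > 3:
            word = word[:-1]
        lexemes.add(word)
    return lexemes


def matches(document, query, normalize):
    """True when every query lexeme occurs in the document (terms are ANDed)."""
    return normalize(query) <= normalize(document)


line = ("His head smashed in and his heart cut out, "
        "And his liver removed and his bowels unplugged")

for query in ("hearts", "heart bowel"):
    print(query, matches(line, query, simple_lexemes),
          matches(line, query, toy_english_lexemes))
```

With the toy 'simple' normalizer neither query matches, because the stored lexemes are 'heart' and 'bowels' while the queries ask for 'hearts' and 'bowel'. With the toy stemming normalizer both sides reduce to 'heart' and 'bowel', so both queries match.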
Change History (12)
comment:1 by , 7 years ago
comment:2 by , 7 years ago
That is exactly what I am saying. These are my search configurations:
psql \dF
               List of text search configurations
   Schema   |    Name    |              Description
------------+------------+---------------------------------------
 pg_catalog | danish     | configuration for danish language
 pg_catalog | dutch      | configuration for dutch language
 pg_catalog | english    | configuration for english language
 pg_catalog | finnish    | configuration for finnish language
 pg_catalog | french     | configuration for french language
 pg_catalog | german     | configuration for german language
 pg_catalog | hungarian  | configuration for hungarian language
 pg_catalog | italian    | configuration for italian language
 pg_catalog | norwegian  | configuration for norwegian language
 pg_catalog | portuguese | configuration for portuguese language
 pg_catalog | romanian   | configuration for romanian language
 pg_catalog | russian    | configuration for russian language
 pg_catalog | simple     | simple configuration
 pg_catalog | spanish    | configuration for spanish language
 pg_catalog | swedish    | configuration for swedish language
 pg_catalog | turkish    | configuration for turkish language
(16 rows)

psql SHOW default_text_search_config ;
 default_text_search_config
----------------------------
 pg_catalog.simple
(1 row)

psql \dF+ simple
Text search configuration "pg_catalog.simple"
Parser: "pg_catalog.default"
      Token      | Dictionaries
-----------------+--------------
 asciihword      | simple
 asciiword       | simple
 email           | simple
 file            | simple
 float           | simple
 host            | simple
 hword           | simple
 hword_asciipart | simple
 hword_numpart   | simple
 hword_part      | simple
 int             | simple
 numhword        | simple
 numword         | simple
 sfloat          | simple
 uint            | simple
 url             | simple
 url_path        | simple
 version         | simple
 word            | simple
The simple dictionary is described at https://www.postgresql.org/docs/9.6/static/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY . The tests assume that the english_stem dictionary is used in the default configuration. However, simple is the default configuration for an unconfigured PG: https://www.postgresql.org/docs/9.6/static/runtime-config-client.html#GUC-DEFAULT-TEXT-SEARCH-CONFIG .
comment:3 by , 7 years ago
Summary: | tests.postgres_tests.test_search.SimpleSearchTest: to_tsvector/plainto_tsquery need 'english' as first parameter → Skip postgres_tests's SimpleSearchTest's that require 'english_stem' configuration if the machine's configuration is different |
---|---|
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Cleanup/optimization |
Skipping the tests is an option. Feel free to propose something else.
comment:4 by , 7 years ago
I propose explicitly passing 'english' as the first parameter to both to_tsvector and plainto_tsquery:
diff --git a/tests/postgres_tests/test_search.py b/tests/postgres_tests/test_search.py
index b93077f..b721f1f 100644
--- a/tests/postgres_tests/test_search.py
+++ b/tests/postgres_tests/test_search.py
@@ -88,11 +88,11 @@ class SimpleSearchTest(GrailTestData, PostgreSQLTestCase):
         self.assertSequenceEqual(searched, [self.verse1])
 
     def test_non_exact_match(self):
-        searched = Line.objects.filter(dialogue__search='hearts')
+        searched = Line.objects.annotate(search=SearchVector('dialogue', config='english')).filter(search=SearchQuery('hearts', config='english'))
         self.assertSequenceEqual(searched, [self.verse2])
 
     def test_search_two_terms(self):
-        searched = Line.objects.filter(dialogue__search='heart bowel')
+        searched = Line.objects.annotate(search=SearchVector('dialogue', config='english')).filter(search=SearchQuery('heart bowel', config='english'))
         self.assertSequenceEqual(searched, [self.verse2])
 
     def test_search_two_terms_with_partial_match(self):
comment:6 by , 7 years ago
I don't know. Take the words as they are: "heart" and "bowels":
diff --git a/tests/postgres_tests/test_search.py b/tests/postgres_tests/test_search.py
--- a/tests/postgres_tests/test_search.py
+++ b/tests/postgres_tests/test_search.py
@@ -88,11 +88,11 @@ class SimpleSearchTest(GrailTestData, PostgreSQLTestCase):
         self.assertSequenceEqual(searched, [self.verse1])
 
     def test_non_exact_match(self):
-        searched = Line.objects.filter(dialogue__search='hearts')
+        searched = Line.objects.filter(dialogue__search='heart')
         self.assertSequenceEqual(searched, [self.verse2])
 
     def test_search_two_terms(self):
-        searched = Line.objects.filter(dialogue__search='heart bowel')
+        searched = Line.objects.filter(dialogue__search='heart bowels')
         self.assertSequenceEqual(searched, [self.verse2])
comment:7 by , 2 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
Hi, I will try to solve the problem.
comment:8 by , 2 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
The tests require the following setting in postgresql.conf:
default_text_search_config = 'pg_catalog.english'
In case the configuration uses another language, I wrote a pull request to skip the tests:
https://github.com/django/django/pull/16357
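For readers who want a feel for what such a skip condition involves, here is a minimal sketch using only the standard library. It is not the code from the pull request. `current_text_search_config` and `requires_english_config` are hypothetical names, and the lookup is stubbed with a fixed value where a real test suite would presumably run `SHOW default_text_search_config` against the test database.

```python
import functools
import unittest

REQUIRED_CONFIG = "pg_catalog.english"


def current_text_search_config():
    # Hypothetical stand-in (assumption): a real suite would query the
    # database with "SHOW default_text_search_config" instead.
    return "pg_catalog.simple"


def requires_english_config(test_func):
    """Skip the decorated test unless the default config is English."""
    @functools.wraps(test_func)
    def wrapper(self, *args, **kwargs):
        # Checked when the test runs, not at import time, so a database
        # connection would only be needed once tests actually execute.
        found = current_text_search_config()
        if found != REQUIRED_CONFIG:
            raise unittest.SkipTest(
                f"needs {REQUIRED_CONFIG}, but the default is {found}"
            )
        return test_func(self, *args, **kwargs)
    return wrapper


class StemmingDependentTests(unittest.TestCase):
    @requires_english_config
    def test_non_exact_match(self):
        self.fail("only reached when the English configuration is active")


result = unittest.TestResult()
StemmingDependentTests("test_non_exact_match").run(result)
print(result.skipped[0][1])
```

Because the stub reports `pg_catalog.simple`, the test is recorded as skipped rather than failed, which is the behaviour the ticket summary asks for on machines with a different configuration.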
comment:9 by , 2 years ago
Has patch: | set |
---|---|
Resolution: | fixed |
Status: | closed → new |
The ticket isn't closed until the fix is committed.
comment:10 by , 2 years ago
Patch needs improvement: | set |
---|---|
Status: | new → assigned |
comment:11 by , 2 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
Are you saying that the tests don't pass on your system? Is the difference based on the system's language or something? Maybe a skip condition can be added for those tests.