Opened 7 months ago

Last modified 7 months ago

#29084 new Cleanup/optimization

Skip postgres_tests's SimpleSearchTest's that require 'english_stem' configuration if the machine's configuration is different

Reported by: Дилян Палаузов Owned by:
Component: contrib.postgres Version: 1.11
Severity: Normal Keywords: tests postgresql search
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

SimpleSearchTest in tests/postgres_tests/test_search.py contains:

def test_non_exact_match(self):
    searched = Line.objects.filter(dialogue__search='hearts')
    self.assertSequenceEqual(searched, [self.verse2])

def test_search_two_terms(self):
    searched = Line.objects.filter(dialogue__search='heart bowel')
    self.assertSequenceEqual(searched, [self.verse2])
 

The first test calls:

SELECT to_tsvector('His head smashed in and his heart cut out, ') @@ plainto_tsquery('hearts')

which is false. In particular,

SELECT to_tsvector('His head smashed in and his heart cut out, ');

returns 'and':5 'cut':8 'head':2 'heart':7 'his':1,6 'in':4 'out':9 'smashed':3, and

SELECT plainto_tsquery('hearts');

returns 'hearts'. Neither side is stemmed, so the query lexeme 'hearts' does not match the document lexeme 'heart'.
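The mismatch can be reproduced without a database. Below is a minimal Python sketch of what the `simple` configuration does, under the simplifying assumption that it only lowercases alphabetic words and records their 1-based positions (the real PostgreSQL parser recognizes many more token types). `simple_tsvector`, `simple_tsquery` and `matches` are hypothetical helper names, not Django or PostgreSQL APIs.

```python
import re

# Toy model only: assumes the `simple` configuration just lowercases
# alphabetic words, with no stemming and no stop words.
def simple_tsvector(text):
    """Map each lowercased word to its 1-based positions, like to_tsvector."""
    vector = {}
    words = re.findall(r"[a-z]+", text.lower())
    for position, word in enumerate(words, start=1):
        vector.setdefault(word, []).append(position)
    return vector


def simple_tsquery(text):
    """Every query word becomes one required lexeme, like plainto_tsquery."""
    return re.findall(r"[a-z]+", text.lower())


def matches(vector, query):
    """plainto_tsquery ANDs its lexemes, so every one must be in the vector."""
    return all(lexeme in vector for lexeme in query)


line = "His head smashed in and his heart cut out, "
vector = simple_tsvector(line)
print(matches(vector, simple_tsquery("hearts")))  # False: 'hearts' != 'heart'
print(matches(vector, simple_tsquery("heart")))   # True
```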

However, this works:

SELECT to_tsvector('english', 'His head smashed in and his heart cut out, ') @@ plainto_tsquery('english', 'hearts');

because

SELECT to_tsvector('english', 'His head smashed in and his heart cut out, ');

returns 'cut':8 'head':2 'heart':7 'smash':3 (the english configuration drops stop words and stems the remaining words), and

SELECT plainto_tsquery('english', 'hearts');

returns 'heart'.

The second test calls:

SELECT to_tsvector(COALESCE('His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--')) @@ plainto_tsquery('heart bowel');

which is again false. In particular, COALESCE() with a single argument simply returns that argument, so

SELECT COALESCE('His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--');

returns the same string unchanged;

SELECT to_tsvector(COALESCE('His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--'));

returns 'and':5,10,14,18,22,27 'bottom':24 'bowels':16 'burned':25 'cut':8 'head':2 'heart':7 'his':1,6,11,15,19,23,28 'in':4 'liver':12 'nostrils':20 'off':26 'out':9 'removed':13 'ripped':21 'smashed':3 'unplugged':17; and

SELECT plainto_tsquery('heart bowel');

returns 'heart' & 'bowel'. The query requires both lexemes, but the vector contains 'bowels' rather than 'bowel', so the match fails.
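The same kind of toy model (again assuming `simple` only lowercases alphabetic words, with no stemming) shows why the two-term query fails: both lexemes are required, and only 'heart' occurs verbatim. `DIALOGUE` and `simple_tsvector` are hypothetical names used for illustration.

```python
import re

# The dialogue text from the SQL query above.
DIALOGUE = (
    "His head smashed in and his heart cut out, "
    "And his liver removed and his bowels unplugged, "
    "And his nostrils ripped and his bottom burned off, And his--"
)


def simple_tsvector(text):
    """Toy to_tsvector under `simple`: lowercased words -> 1-based positions."""
    vector = {}
    for position, word in enumerate(re.findall(r"[a-z]+", text.lower()), start=1):
        vector.setdefault(word, []).append(position)
    return vector


vector = simple_tsvector(DIALOGUE)
query = re.findall(r"[a-z]+", "heart bowel")  # toy plainto_tsquery: 'heart' & 'bowel'
missing = [lexeme for lexeme in query if lexeme not in vector]
print(vector["and"], vector["bowels"])  # [5, 10, 14, 18, 22, 27] [16]
print(missing)  # ['bowel']: the document only contains 'bowels'
```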

Here again 'english' helps:

SELECT to_tsvector('english', COALESCE('His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--'));

returns 'bottom':24 'bowel':16 'burn':25 'cut':8 'head':2 'heart':7 'liver':12 'nostril':20 'remov':13 'rip':21 'smash':3 'unplug':17, and

SELECT to_tsvector('english', COALESCE('His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--')) @@ (plainto_tsquery('heart bowel'));

is true. This holds even though plainto_tsquery() still uses the default configuration here, because both 'heart' and 'bowel' now appear in the stemmed vector.
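For contrast, here is a toy model of the `english` configuration. It is a sketch under two loud assumptions: `STOP_WORDS` holds only the five stop words that occur in this dialogue (PostgreSQL's english stop list is much longer), and `toy_stem` is a crude suffix stripper, not the Snowball algorithm behind english_stem, that merely happens to agree with the outputs quoted above for these particular words. All helper names are hypothetical.

```python
import re

DIALOGUE = (
    "His head smashed in and his heart cut out, "
    "And his liver removed and his bowels unplugged, "
    "And his nostrils ripped and his bottom burned off, And his--"
)

# Assumption: only the english stop words that occur in DIALOGUE.
STOP_WORDS = {"and", "his", "in", "off", "out"}


def toy_stem(word):
    """Crude stand-in for english_stem (NOT Snowball): strip -ed or plural -s."""
    if word.endswith("ed") and len(word) > 4:
        word = word[:-2]
        # Undouble a final consonant pair: "ripp" -> "rip", "unplugg" -> "unplug".
        if word[-1] == word[-2] and word[-1] not in "aeioulsz":
            word = word[:-1]
    elif word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        word = word[:-1]
    return word


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def english_tsvector(text):
    """Toy to_tsvector('english', ...): stop words are dropped but keep their positions."""
    vector = {}
    for position, word in enumerate(words(text), start=1):
        if word not in STOP_WORDS:
            vector.setdefault(toy_stem(word), []).append(position)
    return vector


def english_tsquery(text):
    """Toy plainto_tsquery('english', ...): stemmed, non-stop-word lexemes."""
    return [toy_stem(w) for w in words(text) if w not in STOP_WORDS]


def matches(vector, query):
    return all(lexeme in vector for lexeme in query)


vector = english_tsvector(DIALOGUE)
# First test: 'english' on both sides, so 'hearts' is stemmed to 'heart'.
print(matches(vector, english_tsquery("hearts")))  # True
# Second test as in the SQL above: english vector, unstemmed 'heart' & 'bowel'.
print(matches(vector, words("heart bowel")))  # True
```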

Change History (6)

comment:1 Changed 7 months ago by Tim Graham

Are you saying that the tests don't pass on your system? Is the difference based on the system's language or something? Maybe a skip condition can be added for those tests.

comment:2 Changed 7 months ago by Дилян Палаузов

That is exactly what I am saying. These are my search configurations:

psql \dF
               List of text search configurations
   Schema   |    Name    |              Description              
------------+------------+---------------------------------------
 pg_catalog | danish     | configuration for danish language
 pg_catalog | dutch      | configuration for dutch language
 pg_catalog | english    | configuration for english language
 pg_catalog | finnish    | configuration for finnish language
 pg_catalog | french     | configuration for french language
 pg_catalog | german     | configuration for german language
 pg_catalog | hungarian  | configuration for hungarian language
 pg_catalog | italian    | configuration for italian language
 pg_catalog | norwegian  | configuration for norwegian language
 pg_catalog | portuguese | configuration for portuguese language
 pg_catalog | romanian   | configuration for romanian language
 pg_catalog | russian    | configuration for russian language
 pg_catalog | simple     | simple configuration
 pg_catalog | spanish    | configuration for spanish language
 pg_catalog | swedish    | configuration for swedish language
 pg_catalog | turkish    | configuration for turkish language
(16 rows)

psql SHOW default_text_search_config ;
 default_text_search_config 
----------------------------
 pg_catalog.simple
(1 row)

psql \dF+ simple
Text search configuration "pg_catalog.simple"
Parser: "pg_catalog.default"
      Token      | Dictionaries 
-----------------+--------------
 asciihword      | simple
 asciiword       | simple
 email           | simple
 file            | simple
 float           | simple
 host            | simple
 hword           | simple
 hword_asciipart | simple
 hword_numpart   | simple
 hword_part      | simple
 int             | simple
 numhword        | simple
 numword         | simple
 sfloat          | simple
 uint            | simple
 url             | simple
 url_path        | simple
 version         | simple
 word            | simple

The simple dictionary is described at https://www.postgresql.org/docs/9.6/static/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY . The tests assume that the default configuration uses the english_stem dictionary. However, simple is the default configuration for an unconfigured PostgreSQL server: https://www.postgresql.org/docs/9.6/static/runtime-config-client.html#GUC-DEFAULT-TEXT-SEARCH-CONFIG . (initdb writes a configuration matching the cluster's lc_ctype locale into postgresql.conf when it can identify one, so the default can differ from machine to machine.)

comment:3 Changed 7 months ago by Tim Graham

Summary: changed from "tests.postgres_tests.test_search.SimpleSearchTest: to_tsvector/plainto_tsquery need 'english' as first parameter" to "Skip postgres_tests's SimpleSearchTest's that require 'english_stem' configuration if the machine's configuration is different"
Triage Stage: changed from Unreviewed to Accepted
Type: changed from Uncategorized to Cleanup/optimization

Skipping the tests is an option. Feel free to propose something else.
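A skip condition could query the server setting shown in comment:2. The sketch below is hypothetical: `default_search_config`, `FakeCursor` and `SimpleSearchSkipSketch` are illustrative names, the fake cursor stands in for Django's `connection.cursor()` so the sketch runs without PostgreSQL, and it checks only the configuration name rather than which dictionaries that configuration maps to.

```python
import unittest


def default_search_config(cursor):
    """Return the unqualified name of the session's default text search config.

    `cursor` is any DB-API cursor (for example Django's connection.cursor()).
    """
    cursor.execute("SHOW default_text_search_config")
    (value,) = cursor.fetchone()
    # The value is schema-qualified, e.g. "pg_catalog.simple".
    return value.rsplit(".", 1)[-1]


class FakeCursor:
    """Stand-in for a real database cursor so the sketch runs without PostgreSQL."""

    def __init__(self, setting):
        self.setting = setting
        self.executed = []

    def execute(self, sql):
        self.executed.append(sql)

    def fetchone(self):
        return (self.setting,)


class SimpleSearchSkipSketch(unittest.TestCase):
    # Hypothetical: a real test would use django.db.connection.cursor() instead.
    cursor = FakeCursor("pg_catalog.simple")

    def test_non_exact_match(self):
        if default_search_config(self.cursor) != "english":
            self.skipTest("requires default_text_search_config = 'english'")
        # ... the original assertions on dialogue__search='hearts' would follow


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SimpleSearchSkipSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(len(result.skipped))  # 1 with the fake 'pg_catalog.simple' setting
```

Skipping inside the two affected test methods, rather than in setUp(), keeps the other SimpleSearchTest tests running on machines with a non-English default.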

comment:4 Changed 7 months ago by Дилян Палаузов

I propose explicitly passing 'english' as the first parameter to both to_tsvector and plainto_tsquery (via the config argument of SearchVector and SearchQuery):

diff --git a/tests/postgres_tests/test_search.py b/tests/postgres_tests/test_search.py
index b93077f..b721f1f 100644
--- a/tests/postgres_tests/test_search.py
+++ b/tests/postgres_tests/test_search.py
@@ -88,11 +88,11 @@ class SimpleSearchTest(GrailTestData, PostgreSQLTestCase):
         self.assertSequenceEqual(searched, [self.verse1])
 
     def test_non_exact_match(self):
-        searched = Line.objects.filter(dialogue__search='hearts')
+        searched = Line.objects.annotate(search=SearchVector('dialogue', config='english')).filter(search=SearchQuery('hearts', config='english'))
         self.assertSequenceEqual(searched, [self.verse2])
 
     def test_search_two_terms(self):
-        searched = Line.objects.filter(dialogue__search='heart bowel')
+        searched = Line.objects.annotate(search=SearchVector('dialogue', config='english')).filter(search=SearchQuery('heart bowel', config='english'))
         self.assertSequenceEqual(searched, [self.verse2])
 
     def test_search_two_terms_with_partial_match(self):

comment:5 Changed 7 months ago by Tim Graham

These tests need to exercise the __search lookup.

comment:6 Changed 7 months ago by Дилян Палаузов

I'm not sure, then. Alternatively, use the words exactly as they appear in the dialogue, "heart" and "bowels":

diff --git a/tests/postgres_tests/test_search.py b/tests/postgres_tests/test_search.py
--- a/tests/postgres_tests/test_search.py
+++ b/tests/postgres_tests/test_search.py
@@ -88,11 +88,11 @@ class SimpleSearchTest(GrailTestData, PostgreSQLTestCase):
         self.assertSequenceEqual(searched, [self.verse1])
 
     def test_non_exact_match(self):
-        searched = Line.objects.filter(dialogue__search='hearts')
+        searched = Line.objects.filter(dialogue__search='heart')
         self.assertSequenceEqual(searched, [self.verse2])
 
     def test_search_two_terms(self):
-        searched = Line.objects.filter(dialogue__search='heart bowel')
+        searched = Line.objects.filter(dialogue__search='heart bowels')
         self.assertSequenceEqual(searched, [self.verse2])
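Under a toy model of the `simple` configuration (assumed to lowercase alphabetic words without stemming; helper names hypothetical), the proposed queries match because every query word now occurs verbatim in the dialogue text from the description's second query. With the `english` configuration, 'bowels' in the query would also be stemmed to 'bowel', so these queries should match under either of the two configurations.

```python
import re

DIALOGUE = (
    "His head smashed in and his heart cut out, "
    "And his liver removed and his bowels unplugged, "
    "And his nostrils ripped and his bottom burned off, And his--"
)


def lexemes(text):
    """Toy `simple` configuration: lowercased alphabetic words, no stemming."""
    return re.findall(r"[a-z]+", text.lower())


def simple_search(text, query):
    """Toy dialogue__search: every query lexeme must occur in the text (AND)."""
    document = set(lexemes(text))
    return all(word in document for word in lexemes(query))


for query in ["hearts", "heart bowel", "heart", "heart bowels"]:
    print(query, simple_search(DIALOGUE, query))
# Old queries print False, the proposed 'heart' and 'heart bowels' print True.
```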