Opened 6 years ago

Closed 17 months ago

#29084 closed Cleanup/optimization (fixed)

Skip postgres_tests's SimpleSearchTest's that require 'english_stem' configuration if the machine's configuration is different

Reported by: Дилян Палаузов Owned by: Pablo Nicolas Estevez
Component: contrib.postgres Version: 1.11
Severity: Normal Keywords: tests postgresql search
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

tests/postgres_tests/test_search.py:class SimpleSearchTest contains:

def test_non_exact_match(self):
    searched = Line.objects.filter(dialogue__search='hearts')
    self.assertSequenceEqual(searched, [self.verse2])

def test_search_two_terms(self):
    searched = Line.objects.filter(dialogue__search='heart bowel')
    self.assertSequenceEqual(searched, [self.verse2])
 

The first test calls:

SELECT to_tsvector('His head smashed in and his heart cut out, ') @@ plainto_tsquery('hearts')

which is false. In particular:
SELECT to_tsvector('His head smashed in and his heart cut out, '); returns 'and':5 'cut':8 'head':2 'heart':7 'his':1,6 'in':4 'out':9 'smashed':3 and SELECT plainto_tsquery('hearts'); returns 'hearts'.

However this works:

SELECT to_tsvector('english', 'His head smashed in and his heart cut out, ') @@ plainto_tsquery('english', 'hearts');

as SELECT to_tsvector('english', 'His head smashed in and his heart cut out, ') returns 'cut':8 'head':2 'heart':7 'smash':3 and SELECT plainto_tsquery('english', 'hearts'); returns 'heart'.
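The comparison above can be modelled in a few lines of Python. This is only a toy sketch, not PostgreSQL: the lexeme sets are copied from the output quoted above, and matches() is a hypothetical stand-in for the @@ operator, assuming (as plainto_tsquery does) that every query term must be present.

```python
# Lexemes copied from the to_tsvector / plainto_tsquery output quoted above.
SIMPLE_VECTOR = {"and", "cut", "head", "heart", "his", "in", "out", "smashed"}
ENGLISH_VECTOR = {"cut", "head", "heart", "smash"}

SIMPLE_QUERY = {"hearts"}   # plainto_tsquery('hearts') under the simple config
ENGLISH_QUERY = {"heart"}   # plainto_tsquery('english', 'hearts')


def matches(vector, query):
    """Toy stand-in for `@@`: every query lexeme must appear in the vector."""
    return query <= vector


print(matches(SIMPLE_VECTOR, SIMPLE_QUERY))    # False: 'hearts' is not stemmed
print(matches(ENGLISH_VECTOR, ENGLISH_QUERY))  # True: both sides reduce to 'heart'
```

Without stemming, 'hearts' and 'heart' are simply different lexemes, which is why the test only passes when the default configuration stems English words.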

The second test calls:

SELECT to_tsvector(COALESCE('His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--')) @@ plainto_tsquery('heart bowel');

which is again false. In particular SELECT COALESCE('His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--'); returns His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--, SELECT to_tsvector(COALESCE('His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--')); returns 'and':5,10,14,18,22,27 'bottom':24 'bowels':16 'burned':25 'cut':8 'head':2 'heart':7 'his':1,6,11,15,19,23,28 'in':4 'liver':12 'nostrils':20 'off':26 'out':9 'removed':13 'ripped':21 'smashed':3 'unplugged':17 and SELECT plainto_tsquery('heart bowel'); returns 'heart' & 'bowel'.

Here again 'english' helps:
SELECT to_tsvector('english', COALESCE('His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--')); returns 'bottom':24 'bowel':16 'burn':25 'cut':8 'head':2 'heart':7 'liver':12 'nostril':20 'remov':13 'rip':21 'smash':3 'unplug':17 and

SELECT to_tsvector('english', COALESCE('His head smashed in and his heart cut out, And his liver removed and his bowels unplugged, And his nostrils ripped and his bottom burned off, And his--')) @@ (plainto_tsquery('heart bowel'));

is true.

Change History (12)

comment:1 by Tim Graham, 6 years ago

Are you saying that the tests don't pass on your system? Is the difference based on the system's language or something? Maybe a skip condition can be added for those tests.

comment:2 by Дилян Палаузов, 6 years ago

That is exactly what I am saying. These are my search configurations:

psql \dF
               List of text search configurations
   Schema   |    Name    |              Description              
------------+------------+---------------------------------------
 pg_catalog | danish     | configuration for danish language
 pg_catalog | dutch      | configuration for dutch language
 pg_catalog | english    | configuration for english language
 pg_catalog | finnish    | configuration for finnish language
 pg_catalog | french     | configuration for french language
 pg_catalog | german     | configuration for german language
 pg_catalog | hungarian  | configuration for hungarian language
 pg_catalog | italian    | configuration for italian language
 pg_catalog | norwegian  | configuration for norwegian language
 pg_catalog | portuguese | configuration for portuguese language
 pg_catalog | romanian   | configuration for romanian language
 pg_catalog | russian    | configuration for russian language
 pg_catalog | simple     | simple configuration
 pg_catalog | spanish    | configuration for spanish language
 pg_catalog | swedish    | configuration for swedish language
 pg_catalog | turkish    | configuration for turkish language
(16 rows)

psql SHOW default_text_search_config ;
 default_text_search_config 
----------------------------
 pg_catalog.simple
(1 row)

psql \dF+ simple
Text search configuration "pg_catalog.simple"
Parser: "pg_catalog.default"
      Token      | Dictionaries 
-----------------+--------------
 asciihword      | simple
 asciiword       | simple
 email           | simple
 file            | simple
 float           | simple
 host            | simple
 hword           | simple
 hword_asciipart | simple
 hword_numpart   | simple
 hword_part      | simple
 int             | simple
 numhword        | simple
 numword         | simple
 sfloat          | simple
 uint            | simple
 url             | simple
 url_path        | simple
 version         | simple
 word            | simple

The simple dictionary is described at https://www.postgresql.org/docs/9.6/static/textsearch-dictionaries.html#TEXTSEARCH-SIMPLE-DICTIONARY . The tests assume that in the default configuration the english_stem dictionary is used. However simple is the default configuration for unconfigured PG: https://www.postgresql.org/docs/9.6/static/runtime-config-client.html#GUC-DEFAULT-TEXT-SEARCH-CONFIG .

comment:3 by Tim Graham, 6 years ago

Summary: tests.postgres_tests.test_search.SimpleSearchTest: to_tsvector/plainto_tsquery need 'english' as first parameter → Skip postgres_tests's SimpleSearchTest's that require 'english_stem' configuration if the machine's configuration is different
Triage Stage: Unreviewed → Accepted
Type: Uncategorized → Cleanup/optimization

Skipping the tests is an option. Feel free to propose something else.
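A minimal sketch of what such a skip condition could look like, assuming a DB-API style cursor. The helper names (default_search_config, is_english_config, skip_unless_english) and the FakeCursor stand-in are hypothetical and are not taken from Django or from any patch on this ticket; only the SHOW default_text_search_config statement comes from the discussion above.

```python
import unittest


def default_search_config(cursor):
    """Return the server's default text search configuration name."""
    cursor.execute("SHOW default_text_search_config")
    return cursor.fetchone()[0]


def is_english_config(name):
    """True for 'english' or a schema-qualified name like 'pg_catalog.english'."""
    return name.rsplit(".", 1)[-1] == "english"


def skip_unless_english(cursor):
    """Build a unittest skip decorator from the reported setting (hypothetical)."""
    return unittest.skipUnless(
        is_english_config(default_search_config(cursor)),
        "default_text_search_config is not 'english'",
    )


class FakeCursor:
    """Stand-in for a real cursor so the sketch runs without PostgreSQL."""

    def __init__(self, value):
        self.value = value

    def execute(self, sql):
        self.sql = sql  # remembered only for inspection

    def fetchone(self):
        return (self.value,)
```

With a real connection, the decorator would be applied to the tests that depend on stemming, so that they are reported as skipped rather than failed on servers whose default is pg_catalog.simple.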

comment:4 by Дилян Палаузов, 6 years ago

I propose passing explicitly 'english' as first parameter to both of to_tsvector/plainto_tsquery:

diff --git a/tests/postgres_tests/test_search.py b/tests/postgres_tests/test_search.py
index b93077f..b721f1f 100644
--- a/tests/postgres_tests/test_search.py
+++ b/tests/postgres_tests/test_search.py
@@ -88,11 +88,11 @@ class SimpleSearchTest(GrailTestData, PostgreSQLTestCase):
         self.assertSequenceEqual(searched, [self.verse1])
 
     def test_non_exact_match(self):
-        searched = Line.objects.filter(dialogue__search='hearts')
+        searched = Line.objects.annotate(search=SearchVector('dialogue', config='english')).filter(search=SearchQuery('hearts', config='english'))
         self.assertSequenceEqual(searched, [self.verse2])
 
     def test_search_two_terms(self):
-        searched = Line.objects.filter(dialogue__search='heart bowel')
+        searched = Line.objects.annotate(search=SearchVector('dialogue', config='english')).filter(search=SearchQuery('heart bowel', config='english'))
         self.assertSequenceEqual(searched, [self.verse2])
 
     def test_search_two_terms_with_partial_match(self):

comment:5 by Tim Graham, 6 years ago

The test needs to test __search.

comment:6 by Дилян Палаузов, 6 years ago

I don't know. Take the words as they are: "heart" and "bowels":

diff --git a/tests/postgres_tests/test_search.py b/tests/postgres_tests/test_search.py
--- a/tests/postgres_tests/test_search.py
+++ b/tests/postgres_tests/test_search.py
@@ -88,11 +88,11 @@ class SimpleSearchTest(GrailTestData, PostgreSQLTestCase):
         self.assertSequenceEqual(searched, [self.verse1])
 
     def test_non_exact_match(self):
-        searched = Line.objects.filter(dialogue__search='hearts')
+        searched = Line.objects.filter(dialogue__search='heart')
         self.assertSequenceEqual(searched, [self.verse2])
 
     def test_search_two_terms(self):
-        searched = Line.objects.filter(dialogue__search='heart bowel')
+        searched = Line.objects.filter(dialogue__search='heart bowels')
         self.assertSequenceEqual(searched, [self.verse2])

comment:7 by Pablo Nicolas Estevez, 17 months ago

Owner: set to Pablo Nicolas Estevez
Status: new → assigned

Hi, I will try to solve the problem.

comment:8 by Pablo Nicolas Estevez, 17 months ago

Resolution: fixed
Status: assigned → closed

The tests require the following setting in postgresql.conf:
default_text_search_config = 'pg_catalog.english'
In case the configuration uses another language, I wrote a pull request to skip the tests:
https://github.com/django/django/pull/16357

comment:9 by Tim Graham, 17 months ago

Has patch: set
Resolution: fixed
Status: closed → new

The ticket isn't closed until the fix is committed.

comment:10 by Mariusz Felisiak, 17 months ago

Patch needs improvement: set
Status: new → assigned

comment:11 by Mariusz Felisiak, 17 months ago

Patch needs improvement: unset
Triage Stage: Accepted → Ready for checkin

comment:12 by Mariusz Felisiak <felisiak.mariusz@…>, 17 months ago

Resolution: fixed
Status: assigned → closed

In e673c87:

Fixed #29084 -- Skipped some postgres_tests.test_search tests when pg_catalog isn't English.
