Opened 11 months ago

Last modified 7 months ago

#27732 new Bug

django.contrib.postgres.search SearchRank doesn't handle SearchVectorField references

Reported by: Matt Hoskins Owned by:
Component: contrib.postgres Version: master
Severity: Normal Keywords:
Cc: Marc Tamlyn Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

For search performance I'm using SearchVectorField on a model. However if I reference that field in the vector argument to SearchRank I get an error. This is because SearchRank always wraps a vector argument that lacks a "resolve_expression" attribute as a SearchVector even if it would end up resolving to a SearchVectorField.

If I modify tests/postgres_tests/test_search.py and add the following to the SearchVectorFieldTest class:

    def test_existing_vector_ranking(self):
        Line.objects.update(dialogue_search_vector=SearchVector('dialogue'))
        searched = Line.objects.filter(character=self.minstrel).annotate(
            rank=SearchRank('dialogue_search_vector', SearchQuery('brave sir robin')),
        ).order_by('rank')
        self.assertSequenceEqual(searched, [self.verse2, self.verse1, self.verse0])

Then this results in the following error:

ProgrammingError: function to_tsvector(tsvector) does not exist
LINE 1: ... "postgres_tests_line"."dialogue_config", ts_rank(to_tsvecto...
                                                             ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

As the enforced wrapping of the vector argument in a SearchVector is then causing the sql to wrap the tsvector field in a coalesce (which doesn't cause an issue) and then a to_tsvector call.

I've not delved into the Expressions code before (my last rummaging around in the bowels of the query code was a few years ago) but from my quick look at existing code and the APIs then perhaps a solution is for SearchRank to not wrap the vector argument as it comes into init but instead have a resolve_expression call which checks the first resolved expression on the result of calling super's resolve_expression is or isn't of type SearchVectorField and then if it isn't then resolve a wrapped version.

I've attached a patch which both adds the test I mention above (it may be that it should live in the rank test rather than the search vector field test, or have its own test class which covers the combination of both) and also modifies SearchRank in the way I've suggested above. The postgres tests pass on my test system with this patch.

Attachments (1)

searchrankvectorfieldfix.diff (2.2 KB) - added by Matt Hoskins 11 months ago.

Download all attachments as: .zip

Change History (7)

Changed 11 months ago by Matt Hoskins

comment:1 Changed 11 months ago by Simon Charette

Cc: Marc Tamlyn added
Version: 1.10master

Marc, any thoughts on that?

comment:2 Changed 11 months ago by Marc Tamlyn

I had to make a choice as to the "default" behaviour of SearchRank (and for that matter SearchVector). I made the latter match the former - so that the query SearchRank('body_text', SearchQuery('django')) worked. I think supporting the simple string version in both cases (when it is or is not a SearchVector) already is somewhat confusing personally, especially as I believe the "work around" is:

SearchRank(F('dialogue_search_vector'), SearchQuery('brave sir robin'))

comment:3 Changed 11 months ago by Matt Hoskins

The example for SearchRank in the documentation does only shows passing in a SearchVector() as the first argument to SearchRank and a SearchQuery() for the second so I was a little surprised when looking at the code to find it was behaving such that if you passed in a string for the first argument that it assumed you'd just missed off wrapping it in a SearchVector() rather than it potentially being a reference to a SearchVectorField.

If SearchRank were left as is (i.e. not supporting passing in a string which references a SearchVectorField) then I think it would be useful for the documentation for SearchVectorField to set out the F() "work around" as the way to use a SearchVectorField in SearchRank. E.g. just have a bit saying "To use a SearchVectorField in SearchRank then do..." and given an example too. It takes a little bit of figuring out otherwise (particularly as the documentation makes no mention that you can pass in a string as the first argument and it'll wrap it in a SearchVector for you) and I have to admit wrapping it in F() to prevent that behaviour is not a leap I'd made yet as most database functions in django just take field references as strings if the type of the field is already suitable.

comment:4 Changed 11 months ago by Matt Hoskins

After further musing and taking into account:

  • The documentation describes passing in a vector and a query into SearchRank
  • The example included with the above shows passing in SearchVector and a SearchQuery objects only (and not strings)
  • Marc's point "I think supporting the simple string version in both cases (when it is or is not a SearchVector) already is somewhat confusing personally"
  • My feeling that the intuitive way to reference a SearchVectorField as the first argument to SearchRank would be to pass in a string (given other functions, like the DateTime ones, accept a string if you want to reference a field of the correct type, although I note that they do list their first argument as "expression").
  • The DateTime functions use resolve_expression to validate the type that their first argument resolves into
  • It's possible in postgresql to have a column of type tsquery - I can't think why anyone would ever want to (no use case seems likely), but there would be some logical consistency in allowing the second argument of SearchRank to be a reference to a related query column via the potential to implement SearchQueryField

... then one course of action that springs to mind is to deprecate passing in strings to SearchRank except where a string for vector resolves into a SearchVectorField:

  • Perhaps apply my patch in a point release (although it's possible, as Marc suggested, to work around using "F()" for now)
  • In the next version (minor/major) use my patch but also emit a deprecation warning if the resolving of the vector argument doesn't result in a SearchVectorField type result and also emit a deprecation warning if the query argument isn't a SearchQuery
  • For the documentation of SearchVectorField perhaps mention that a string can be passed in to reference one in SearchRank to make it clear that's how to do it
  • Once deprecation is complete perhaps check the types of what the vector and query arguments resolve into and, like with the DateTime functions, raise ValueError if they don't resolve into the correct type.

There may be good reasons for adopting a different approach of course (such as the simplest case of just documenting Marc's "work around" as the official way to reference a SearchVectorField as the vector argument to SearchRank), these are just my musings!

comment:5 Changed 11 months ago by Tim Graham

Triage Stage: UnreviewedAccepted

Tentatively accepting, even though it seems some design decisions remain. If no code changes are made, perhaps the documentation could be clarified.

comment:6 Changed 7 months ago by Andrii Soldatenko

What's the next steps, because I've found also this issue, and I reinvent the same workaround as Marc showed with wrapping using F.
I think we should at least fix documentation to not confuse users or fix behavior. Any thoughts?

Note: See TracTickets for help on using tickets.
Back to Top