Opened 21 months ago

Closed 20 months ago

Last modified 20 months ago

#34459 closed Bug (fixed)

SearchVector() can return query strings that are unsafe to combine.

Reported by: Patryk Zawadzki Owned by: Mariusz Felisiak
Component: contrib.postgres Version: 4.2
Severity: Release blocker Keywords:
Cc: Florian Apolloner, Daniele Varrazzo, Simon Charette, Natalia Bidart Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

As the function specifically calls connection.ops.compose_sql, the returned sql will have all parameters inlined.

An unintended consequence is that if you pass it a value that contains a percent sign, like Value("10% OFF"), the resulting sql will have the % character inlined. Such values will result in a ProgrammingError as soon as you attempt to combine the SearchVector with any expression that relies on params.

Depending on whether you use psycopg2 or psycopg 3, the resulting error will tell you that there are not enough params to format the query template or that there is an unescaped % in the query.

Change History (18)

comment:1 by David Sanders, 21 months ago

Triage Stage: UnreviewedAccepted

Tentatively accepting, I'm seeing errors but only when the args to SearchVector are outside that which is documented to accept.

Felix & Simon will likely be able to clear things up.

So, to clarify, is this the behaviour you're seeing?

There's no issue if you filter by the search vector like so:

Foo.objects.annotate(search=SearchVector("text")).filter(search=Value("10% off"))

It's when one of the arguments to SearchVector contain a percent that it becomes an issue (the params weight & config aren't meant to contain %):

Foo.objects.annotate(search=SearchVector("text", config="%")).filter(search=Value("10% off"))

...

django/db/models/query.py:373: in __repr__
    data = list(self[: REPR_OUTPUT_SIZE + 1])
django/db/models/query.py:397: in __iter__
    self._fetch_all()
django/db/models/query.py:1883: in _fetch_all
    self._result_cache = list(self._iterable_class(self))
django/db/models/query.py:90: in __iter__
    results = compiler.execute_sql(
django/db/models/sql/compiler.py:1560: in execute_sql
    cursor.execute(sql, params)
django/db/backends/utils.py:67: in execute
    return self._execute_with_wrappers(
django/db/backends/utils.py:80: in _execute_with_wrappers
    return executor(sql, params, many, context)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <django.db.backends.utils.CursorWrapper object at 0x106e1bb20>
sql = 'SELECT "ticket_34459_foo"."id", "ticket_34459_foo"."text", to_tsvector(\'%\'::regconfig, COALESCE("ticket_34459_foo"....tsvector(\'%\'::regconfig, COALESCE("ticket_34459_foo"."text", \'\')) @@ (plainto_tsquery(%s::regconfig, %s)) LIMIT 21'
params = ('%', '10% off')
ignored_wrapper_args = (False, {'connection': <DatabaseWrapper vendor='postgresql' alias='default'>, 'cursor': <django.db.backends.utils.CursorWrapper object at 0x106e1bb20>})

    def _execute(self, sql, params, *ignored_wrapper_args):
        self.db.validate_no_broken_transaction()
        with self.db.wrap_database_errors:
            if params is None:
                # params default might be backend specific.
                print(sql)
                return self.cursor.execute(sql)
            else:
>               return self.cursor.execute(sql, params)
E               IndexError: tuple index out of range

django/db/backends/utils.py:90: IndexError

Version 1, edited 21 months ago by David Sanders (previous) (next) (diff)

comment:2 by Mariusz Felisiak, 21 months ago

Cc: Florian Apolloner Daniele Varrazzo added
Severity: NormalRelease blocker
Summary: contrib.postgres.search.SearchVector.as_sql can return query strings that are unsafe to combineSearchVector() can return query strings that are unsafe to combine.

comment:3 by Patryk Zawadzki, 21 months ago

David, I’m not sure if it’s “supported” or not, but my project uses SearchVector with expressions such as Value instead of column names so I see a lot of percent signs passed as the corpus of text, not the config or weight. We calculate some of the values we index in Python code and combine strings with different weights.

comment:4 by Simon Charette, 21 months ago

Cc: Simon Charette added

comment:5 by Mariusz Felisiak, 21 months ago

We use compose_sql() in different places. Unfortunately, it's sometimes used together with uncomposed SQL statements causing issues when binding parameters. IMO, we should escape % when we know that the parameters are bound later, e.g.

  • django/contrib/postgres/search.py

    diff --git a/django/contrib/postgres/search.py b/django/contrib/postgres/search.py
    index 4e370aa167..279a39e80e 100644
    a b class SearchVector(SearchVectorCombinable, Func):  
    146146
    147147        # These parameters must be bound on the client side because we may
    148148        # want to create an index on this expression.
    149         sql = connection.ops.compose_sql(sql, config_params + params + extra_params)
     149        sql = connection.ops.compose_sql(
     150            sql, config_params + params + extra_params, quote_params=True
     151        )
    150152        return sql, []
    151153
    152154
    class SearchHeadline(Func):  
    321323        if self.options:
    322324            options_params.append(
    323325                ", ".join(
    324                     connection.ops.compose_sql(f"{option}=%s", [value])
     326                    connection.ops.compose_sql(
     327                        f"{option}=%s", [value], quote_params=True
     328                    )
    325329                    for option, value in self.options.items()
    326330                )
    327331            )
  • django/db/backends/postgresql/operations.py

    diff --git a/django/db/backends/postgresql/operations.py b/django/db/backends/postgresql/operations.py
    index 18cfcb29cb..8aa8bb5173 100644
    a b class DatabaseOperations(BaseDatabaseOperations):  
    201201            return name  # Quoting once is enough.
    202202        return '"%s"' % name
    203203
    204     def compose_sql(self, sql, params):
     204    def quote_value(self, value):
     205        if isinstance(value, str):
     206            value = value.replace("%", "%%")
     207        return value
     208
     209    def compose_sql(self, sql, params, quote_params=False):
     210        if quote_params and params:
     211            params = [self.quote_value(param) for param in params]
    205212        return mogrify(sql, params, self.connection)
    206213
    207214    def set_time_zone_sql(self):

Patryk, does it work for you?

comment:6 by David Sanders, 21 months ago

I've confirmed Simon's patch works with this test:

  • tests/postgres_tests/test_search.py

    diff --git a/tests/postgres_tests/test_search.py b/tests/postgres_tests/test_search.py
    index 6ec20c0654..9b5b841539 100644
    a b class SearchVectorFieldTest(GrailTestData, PostgreSQLTestCase):  
    161161        )
    162162        self.assertNotIn("COALESCE(COALESCE", str(searched.query))
    163163
     164    def test_values_with_percent(self):
     165        searched = Line.objects.annotate(
     166            search=SearchVector(Value("This week everything is 10% off"))
     167        ).filter(search="10 % off")
     168        self.assertEqual(len(searched), 9)
     169
    164170
    165171class SearchConfigTests(PostgreSQLSimpleTestCase):
    166172    def test_from_parameter(self):

in reply to:  6 comment:7 by Mariusz Felisiak, 21 months ago

Replying to David Sanders:

I've confirmed Simon's patch works with this test

Thanks :+1: I'm not Simon ;)

comment:8 by Phil Gyford, 21 months ago

This is probably the same issue, but wanted to check it wasn't only tangentially related and so missed.

If I have a model with a field like:

search_document = SearchVectorField(null=True)

Then I try to update that for object 123 by doing:

MyModel.objects.filter(pk=123).update(search_document=SearchVector(Value('10%')))

then with psycopg2 I get "IndexError: tuple index out of range".

and with psycopg3 I get "ProgrammingError: only '%s', '%b', '%t' are allowed as placeholders, got '%"

If I omit the % then there's no error.

comment:9 by Florian Apolloner, 21 months ago

I am a bit confused, shouldn't compose_sql (and then mogrify) do that automatically? ie is this a psycopg bug?

comment:10 by Florian Apolloner, 21 months ago

Unfortunately, it's sometimes used together with uncomposed SQL statements causing issues when binding parameters. IMO, we should escape % when we know that the parameters are bound later, e.g.

Ugh, we should probably not use compose_sql then

comment:11 by Natalia Bidart, 21 months ago

Cc: Natalia Bidart added

in reply to:  10 comment:12 by Mariusz Felisiak, 21 months ago

Replying to Florian Apolloner:

Unfortunately, it's sometimes used together with uncomposed SQL statements causing issues when binding parameters. IMO, we should escape % when we know that the parameters are bound later, e.g.

Ugh, we should probably not use compose_sql then

Escaping % sounds like a reasonable compromise to me, we already do this in schema editors, e.g. in MySQL or PostgreSQL. Also, we avoid side-effects by providing the quote_params argument.

Last edited 21 months ago by Mariusz Felisiak (previous) (diff)

comment:13 by Florian Apolloner, 21 months ago

Yes, but the schema editor has generally been a thing where we know we do suboptimal things and we have the generated SQL under control. Here we have (in the worst case) user input and it simply feels like opening a can of worms if we are not able to distinguish between parameters and the sql iteself clearly.

in reply to:  13 comment:14 by Mariusz Felisiak, 21 months ago

Replying to Florian Apolloner:

Yes, but the schema editor has generally been a thing where we know we do suboptimal things and we have the generated SQL under control. Here we have (in the worst case) user input and it simply feels like opening a can of worms if we are not able to distinguish between parameters and the sql iteself clearly.

Alternatively, we can remove compose_sql() from SearchVector (it was added in f83ee2a23ee28a271a50a005b62259d735fbea3f) but looks unnecessary now, see draft PR. We can add test for an index with SearchVector to confirm this.

comment:15 by Florian Apolloner, 21 months ago

Lovely, if that passes the testsuite that would be certainly preferred. Might be that follow up changes made this unnecessary

comment:16 by Mariusz Felisiak, 21 months ago

Has patch: set
Owner: set to Mariusz Felisiak
Status: newassigned

comment:17 by Mariusz Felisiak <felisiak.mariusz@…>, 20 months ago

Resolution: fixed
Status: assignedclosed

In 4bf4222:

Fixed #34459 -- Fixed SearchVector() crash for parameters with % symbol.

Thanks Patryk Zawadzki for the report.

Regression in 09ffc5c1212d4ced58b708cbbf3dfbfb77b782ca.

comment:18 by Mariusz Felisiak <felisiak.mariusz@…>, 20 months ago

In db49def5:

[4.2.x] Fixed #34459 -- Fixed SearchVector() crash for parameters with % symbol.

Thanks Patryk Zawadzki for the report.

Regression in 09ffc5c1212d4ced58b708cbbf3dfbfb77b782ca.

Backport of 4bf4222010fd8e413963c6c873e4088614332ef9 from main

Note: See TracTickets for help on using tickets.
Back to Top