Opened 8 years ago

Closed 8 years ago

#27227 closed Bug (needsinfo)

Full text search by UUIDField returns DataError

Reported by: Dan Claudiu Pop Owned by: nobody
Component: Database layer (models, ORM) Version: 1.10
Severity: Normal Keywords: UUIDField fts postgres
Cc: danclaudiupop@… Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Consider the following model:

class Show(models.Model):
    cid = models.UUIDField(
        default=uuid.uuid4,
        editable=False,
        verbose_name="Content ID",
        help_text="Unique Identifier"
    )
    title_short = models.CharField(
        max_length=60,
        blank=True,
        verbose_name="Short Title",
        help_text="Short title (60 chars)"
    )

The following snippet returns DataError, see below traceback:

>>> from django.contrib.postgres.search import SearchVector
>>> Show.objects.annotate(search=SearchVector('cid')).filter(search='foo')

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/vagrant/ve/lib/python3.4/site-packages/django/db/models/query.py", line 232, in __repr__
    data = list(self[:REPR_OUTPUT_SIZE + 1])
  File "/home/vagrant/ve/lib/python3.4/site-packages/django/db/models/query.py", line 256, in __iter__
    self._fetch_all()
  File "/home/vagrant/ve/lib/python3.4/site-packages/django/db/models/query.py", line 1087, in _fetch_all
    self._result_cache = list(self.iterator())
  File "/home/vagrant/ve/lib/python3.4/site-packages/django/db/models/query.py", line 54, in __iter__
    results = compiler.execute_sql()
  File "/home/vagrant/ve/lib/python3.4/site-packages/django/db/models/sql/compiler.py", line 835, in execute_sql
    cursor.execute(sql, params)
  File "/home/vagrant/ve/lib/python3.4/site-packages/django/db/backends/utils.py", line 79, in execute
    return super(CursorDebugWrapper, self).execute(sql, params)
  File "/home/vagrant/ve/lib/python3.4/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/home/vagrant/ve/lib/python3.4/site-packages/django/db/utils.py", line 94, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/home/vagrant/ve/lib/python3.4/site-packages/django/utils/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/home/vagrant/ve/lib/python3.4/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
django.db.utils.DataError: invalid input syntax for uuid: ""
LINE 1: ...tag", to_tsvector(COALESCE("entities_show"."cid", '')) AS "s...

Tried with PostgreSQL 9.3.14 and PostgreSQL 9.5.3

Change History (10)

comment:1 by Dan Claudiu Pop, 8 years ago

Cc: danclaudiupop@… added

comment:2 by Dan Claudiu Pop, 8 years ago

Type: UncategorizedBug

comment:3 by Melvyn Sopacua, 8 years ago

Analysis:

In django/contrib/postgres/search.py we find in init() around line 55 (stable/1.10.x):

        self.source_expressions = [
            Coalesce(expression, Value('')) for expression in self.source_expressions
        ]

These are propagated to Coalesce() and generate the expression:

COALESCE("entities_show"."cid", '')

For most cases, the empty string is a valid input value, yet for the native PostgreSQL UUID type, this is not the case. The API doesn't provide a way to specify the fallback value.

comment:4 by Tim Graham, 8 years ago

What's the use case for running full text search on a UUID?

comment:5 by Dan Claudiu Pop, 8 years ago

We have a system in which we identify a resource by uuid field. We also expose a search endpoint where we give back a list of different resource types. For commodity we also search in the uuid field to give back the proper resource on the search endpoint.

Instead of checking "/api/book/<uuid>/" or "/api/author/<uuid>/" for an <uuid> we can go directly to "api/search/?q=<uuid>" to fetch the resource.

comment:6 by Simon Charette, 8 years ago

We could be extra safe and always cast the expression to text in a CAST(expression, output_field=models.TextField()) but I don't understand why you're using FTS here, using LIKE through the __contains lookup would be more appropriate IMHO. In the meant time you can take care of casting the expression yourself before annotating your queryset to work around your issue.

comment:7 by Melvyn Sopacua, 8 years ago

So unfortunately, even when supporting the fallback value, this fails because to_tsvector(uuid) fails for reasons pointed out by charettes: uuid isn't searchable. However, having a way to provide the fallback is useful, if one wants to have NULL returned. From the Coalesce docs:

The COALESCE function returns the first of its arguments that is not null. Null is returned only if all arguments are null.

But, I don't have a real-world use case for this. If there's no interest in supporting the fallback value, I'll throw away my local branch.

comment:8 by Dan Claudiu Pop, 8 years ago

The use case that we encounter is this: having an administration interface which lists all the shows, i want to search shows by cid, title_short and other fields (such as description, etc). So I wanted to leverage FTS, because __contains is really slow over many records and fields.

By charettes suggestion, I used CAST and the queryset looks like this:

Show.objects.annotate(cidtxt=Cast('cid', TextField())).annotate(search=SearchVector('cidtxt', 'title_short')).filter(search='foo')

comment:9 by Tim Graham, 8 years ago

I'm not sure. Maybe you can send a pull request to give an idea of what the patch allows?

comment:10 by Tim Graham, 8 years ago

Resolution: needsinfo
Status: newclosed
Note: See TracTickets for help on using tickets.
Back to Top