Opened 9 years ago
Closed 9 years ago
#27227 closed Bug (needsinfo)
Full text search by UUIDField returns DataError
| Reported by: | Dan Claudiu Pop | Owned by: | nobody |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 1.10 |
| Severity: | Normal | Keywords: | UUIDField fts postgres |
| Cc: | danclaudiupop@… | Triage Stage: | Unreviewed |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
Consider the following model:
class Show(models.Model):
cid = models.UUIDField(
default=uuid.uuid4,
editable=False,
verbose_name="Content ID",
help_text="Unique Identifier"
)
title_short = models.CharField(
max_length=60,
blank=True,
verbose_name="Short Title",
help_text="Short title (60 chars)"
)
The following snippet returns DataError, see below traceback:
>>> from django.contrib.postgres.search import SearchVector
>>> Show.objects.annotate(search=SearchVector('cid')).filter(search='foo')
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/home/vagrant/ve/lib/python3.4/site-packages/django/db/models/query.py", line 232, in __repr__
data = list(self[:REPR_OUTPUT_SIZE + 1])
File "/home/vagrant/ve/lib/python3.4/site-packages/django/db/models/query.py", line 256, in __iter__
self._fetch_all()
File "/home/vagrant/ve/lib/python3.4/site-packages/django/db/models/query.py", line 1087, in _fetch_all
self._result_cache = list(self.iterator())
File "/home/vagrant/ve/lib/python3.4/site-packages/django/db/models/query.py", line 54, in __iter__
results = compiler.execute_sql()
File "/home/vagrant/ve/lib/python3.4/site-packages/django/db/models/sql/compiler.py", line 835, in execute_sql
cursor.execute(sql, params)
File "/home/vagrant/ve/lib/python3.4/site-packages/django/db/backends/utils.py", line 79, in execute
return super(CursorDebugWrapper, self).execute(sql, params)
File "/home/vagrant/ve/lib/python3.4/site-packages/django/db/backends/utils.py", line 64, in execute
return self.cursor.execute(sql, params)
File "/home/vagrant/ve/lib/python3.4/site-packages/django/db/utils.py", line 94, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/home/vagrant/ve/lib/python3.4/site-packages/django/utils/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/home/vagrant/ve/lib/python3.4/site-packages/django/db/backends/utils.py", line 64, in execute
return self.cursor.execute(sql, params)
django.db.utils.DataError: invalid input syntax for uuid: ""
LINE 1: ...tag", to_tsvector(COALESCE("entities_show"."cid", '')) AS "s...
Tried with PostgreSQL 9.3.14 and PostgreSQL 9.5.3
Change History (10)
comment:1 by , 9 years ago
| Cc: | added |
|---|
comment:2 by , 9 years ago
| Type: | Uncategorized → Bug |
|---|
comment:3 by , 9 years ago
comment:5 by , 9 years ago
We have a system in which we identify a resource by uuid field. We also expose a search endpoint where we give back a list of different resource types. For commodity we also search in the uuid field to give back the proper resource on the search endpoint.
Instead of checking "/api/book/<uuid>/" or "/api/author/<uuid>/" for an <uuid> we can go directly to "api/search/?q=<uuid>" to fetch the resource.
comment:6 by , 9 years ago
We could be extra safe and always cast the expression to text in a CAST(expression, output_field=models.TextField()) but I don't understand why you're using FTS here, using LIKE through the __contains lookup would be more appropriate IMHO. In the meant time you can take care of casting the expression yourself before annotating your queryset to work around your issue.
comment:7 by , 9 years ago
So unfortunately, even when supporting the fallback value, this fails because to_tsvector(uuid) fails for reasons pointed out by charettes: uuid isn't searchable. However, having a way to provide the fallback is useful, if one wants to have NULL returned. From the Coalesce docs:
The COALESCE function returns the first of its arguments that is not null. Null is returned only if all arguments are null.
But, I don't have a real-world use case for this. If there's no interest in supporting the fallback value, I'll throw away my local branch.
comment:8 by , 9 years ago
The use case that we encounter is this: having an administration interface which lists all the shows, i want to search shows by cid, title_short and other fields (such as description, etc). So I wanted to leverage FTS, because __contains is really slow over many records and fields.
By charettes suggestion, I used CAST and the queryset looks like this:
Show.objects.annotate(cidtxt=Cast('cid', TextField())).annotate(search=SearchVector('cidtxt', 'title_short')).filter(search='foo')
comment:9 by , 9 years ago
I'm not sure. Maybe you can send a pull request to give an idea of what the patch allows?
comment:10 by , 9 years ago
| Resolution: | → needsinfo |
|---|---|
| Status: | new → closed |
Analysis:
In
django/contrib/postgres/search.pywe find in init() around line 55 (stable/1.10.x):self.source_expressions = [ Coalesce(expression, Value('')) for expression in self.source_expressions ]These are propagated to Coalesce() and generate the expression:
COALESCE("entities_show"."cid", '')For most cases, the empty string is a valid input value, yet for the native PostgreSQL UUID type, this is not the case. The API doesn't provide a way to specify the fallback value.