#23423 closed New feature (fixed)
Integrate unaccent lookups in django.contrib.postgres
Reported by: | Thomas C | Owned by: | Thomas C |
---|---|---|---|
Component: | contrib.postgres | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | yes |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
After the introduction of the Lookup
API in Django 1.7, I believe that django.contrib.postgres
could benefit from some custom lookups based on the unaccent extension of PostgreSQL (see http://www.postgresql.org/docs/9.3/static/unaccent.html). That kind of feature would be very helpful for a lot of people in latin countries, and the existing solutions seems not very clean (and do not make use of the new API).
Change History (12)
comment:1 by , 10 years ago
Description: | modified (diff) |
---|---|
Has patch: | set |
comment:2 by , 10 years ago
Marc, do you think this belong to django.contrib.postgres, or would you prefer to push it to a third-party app?
Either option would work for me.
comment:3 by , 10 years ago
I should mention that I'm not quite satisfied with the implementation in the commit above. The current lookup API in Django does not allow for real accent insensitive lookup. More generally, it does not allow for any *SQL transformation*-insensitive search as the transformation is only applied to the left-hand side of the lookup, never to the right side.
I discussed that with akaariai and made a proposition for bilateral transformations - tracked in #23493 - which I believe is a prerequisite for implementing robust accent-insensitive lookups in Django.
comment:4 by , 10 years ago
Sounds like a reasonable addition to d.c.postgres to me. I agree it needs a bilateral transform, which also sounds like a good feature.
comment:5 by , 10 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:6 by , 10 years ago
Needs documentation: | set |
---|---|
Patch needs improvement: | set |
Marking as "patch needs improvement" (pending bilateral transform) and "needs documentation".
comment:7 by , 10 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:8 by , 10 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
Documentation proofreading from a native speaker wouldn't hurt but the patch looks good to me.
comment:9 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:10 by , 6 years ago
bilaterality of this function maybe should be explained in the documentation.
also there is a caveat, and i want to comment it here, as this implementation caused me to debug for a while:
if i do (please note how the sequence of enclosing is for unaccent and lower, also what follows are just abstract examples, i know there are possible workarounds!)
Model.objects.annotate(a_value=Unaccent(Lower('fieldname'))).filter( a_value=Lower( 'another_field_or_Value' ) )
you would expect (abstractly) something like this SQL:
SELECT * WHERE unaccent(lower(fieldname)) = lower(another_field_or_Value);
however what you get is (because of bilateral "magic"):
SELECT * WHERE unaccent(lower(fieldname)) = lower(unaccent(another_field_or_Value))
less of a problem, as most likely you would want anyway something like
Model.objects.annotate(a_value=Unaccent(Lower('fieldname'))).filter( a_value=Unaccent(Lower( 'another_field_or_Value' )) )
however this expression results in django as
SELECT * WHERE unaccent(lower(fieldname)) = unaccent(lower(unaccent(another_field_or_Value)))
this would not make much issues, you can just leave out unaccent on RHS as you will get it from annotate, especially if lower and unaccent were interchangable in their position
but they are not, there are exceptions
see in postgres:
select lower(unaccent('ÇÁÈÎËßÖÜ')), unaccent(lower('ÇÁÈÎËßÖÜ'));
results in "caeiesou", "caeieSou"
and these are just the european exceptions i checked, but be warned.
comment:11 by , 6 years ago
Gabor, thanks for your input. It would be great to open a separate ticket with your findings, ideally with a failing test case for the Django test suite.
comment:12 by , 6 years ago
well since this writing further research revealed, that postgres 9.6 has fixed the issue (at least for ß). so it's only valid for 9.5 series and lower. i am therefore not sure if its worth a new ticket.
I tried something using
Tranform
, as it looks like it was made for that kind of things: https://github.com/tchaumeny/django/commit/cb3d7808c20094636bacb0576bc738ee466d256a