Opened 9 years ago

Closed 9 years ago

Last modified 5 years ago

#23423 closed New feature (fixed)

Integrate unaccent lookups in django.contrib.postgres

Reported by: Thomas C Owned by: Thomas C
Component: contrib.postgres Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Thomas C)

After the introduction of the Lookup API in Django 1.7, I believe that django.contrib.postgres could benefit from some custom lookups based on the unaccent extension of PostgreSQL (see http://www.postgresql.org/docs/9.3/static/unaccent.html). That kind of feature would be very helpful for a lot of people in latin countries, and the existing solutions seems not very clean (and do not make use of the new API).

Change History (12)

comment:1 Changed 9 years ago by Thomas C

Description: modified (diff)
Has patch: set

I tried something using Tranform, as it looks like it was made for that kind of things: https://github.com/tchaumeny/django/commit/cb3d7808c20094636bacb0576bc738ee466d256a

comment:2 Changed 9 years ago by Aymeric Augustin

Marc, do you think this belong to django.contrib.postgres, or would you prefer to push it to a third-party app?

Either option would work for me.

comment:3 Changed 9 years ago by Thomas C

I should mention that I'm not quite satisfied with the implementation in the commit above. The current lookup API in Django does not allow for real accent insensitive lookup. More generally, it does not allow for any *SQL transformation*-insensitive search as the transformation is only applied to the left-hand side of the lookup, never to the right side.

I discussed that with akaariai and made a proposition for bilateral transformations - tracked in #23493 - which I believe is a prerequisite for implementing robust accent-insensitive lookups in Django.

Last edited 9 years ago by Thomas C (previous) (diff)

comment:4 Changed 9 years ago by Marc Tamlyn

Sounds like a reasonable addition to d.c.postgres to me. I agree it needs a bilateral transform, which also sounds like a good feature.

comment:5 Changed 9 years ago by Aymeric Augustin

Triage Stage: UnreviewedAccepted

comment:6 Changed 9 years ago by Tim Graham

Needs documentation: set
Patch needs improvement: set

Marking as "patch needs improvement" (pending bilateral transform) and "needs documentation".

comment:7 Changed 9 years ago by Thomas C

Owner: set to Thomas C
Status: newassigned

comment:8 Changed 9 years ago by Simon Charette

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

Documentation proofreading from a native speaker wouldn't hurt but the patch looks good to me.

comment:9 Changed 9 years ago by Tim Graham <timograham@…>

Resolution: fixed
Status: assignedclosed

In 17fe0bd808a47f37dd1351adb01a8ad2cc852f24:

Fixed #23423 -- Added unaccent lookup in django.contrib.postgres

comment:10 Changed 5 years ago by Gabor Körber

bilaterality of this function maybe should be explained in the documentation.

also there is a caveat, and i want to comment it here, as this implementation caused me to debug for a while:

if i do (please note how the sequence of enclosing is for unaccent and lower, also what follows are just abstract examples, i know there are possible workarounds!)

    Model.objects.annotate(a_value=Unaccent(Lower('fieldname'))).filter( a_value=Lower( 'another_field_or_Value' ) )

you would expect (abstractly) something like this SQL:

SELECT * WHERE unaccent(lower(fieldname)) = lower(another_field_or_Value);

however what you get is (because of bilateral "magic"):

SELECT * WHERE unaccent(lower(fieldname)) = lower(unaccent(another_field_or_Value))

less of a problem, as most likely you would want anyway something like

    Model.objects.annotate(a_value=Unaccent(Lower('fieldname'))).filter( a_value=Unaccent(Lower( 'another_field_or_Value' )) )

however this expression results in django as

SELECT * WHERE unaccent(lower(fieldname)) = unaccent(lower(unaccent(another_field_or_Value)))

this would not make much issues, you can just leave out unaccent on RHS as you will get it from annotate, especially if lower and unaccent were interchangable in their position
but they are not, there are exceptions

see in postgres:

select lower(unaccent('ÇÁÈÎËßÖÜ')), unaccent(lower('ÇÁÈÎËßÖÜ'));

results in "caeiesou", "caeieSou"
and these are just the european exceptions i checked, but be warned.

comment:11 Changed 5 years ago by Claude Paroz

Gabor, thanks for your input. It would be great to open a separate ticket with your findings, ideally with a failing test case for the Django test suite.

comment:12 Changed 5 years ago by Gabor Körber

well since this writing further research revealed, that postgres 9.6 has fixed the issue (at least for ß). so it's only valid for 9.5 series and lower. i am therefore not sure if its worth a new ticket.

Last edited 5 years ago by Gabor Körber (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top