Opened 7 years ago

Last modified 9 hours ago

#21181 assigned New feature

collation specific query results ordering

Reported by: alan.kesselmann@… Owned by: Tom Carrick
Component: Database layer (models, ORM) Version:
Severity: Normal Keywords: ORM
Cc: Tom Carrick, Alexandr Artemyev, Simon Charette Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

There has been feature request for it in 2012 in django-developers group : https://groups.google.com/forum/#!topic/django-developers/0iESVnawNAY. Before finding that request i asked about same thing in stackoverflow: http://stackoverflow.com/questions/18935712/queryset-sorting-specifying-column-collation-for-django-orm-query. Both Anssi (https://groups.google.com/d/msg/django-developers/0iESVnawNAY/JefMfAm7nQMJ) and me outline how it could be used im same way:

.order_by([fieldnames], collation = 'et_EE')

another version of it could be automatical, based on django language settings. Model often has ordering set in its Meta. So why not include automatical ordering there too.

Is it possible to include this into some of next versions of Django?

Some background

If django is used to power some international site like a newspaper then there is rarely need for sorting data based on sorting rules of language other than english. If you use it to power a site that hosts data in several languages then you have need to also display data in that language properly. Sorting based on English alphabet does not work correctly in that case cause letters missing from english alphabet (and there are MANY of those letters that exist in swedish, danish, german, not to mention estonian, russian etc. alphabets) are displayed in completely wrong places.

Again - most django sites will be probably using single language. But there are many cases that will benefit from this feature. Like e-commerce modules, that translate product names - they can benefit from it. Any site that allows language based naming or translating of their objects will benefit from this feature.

Change History (16)

comment:1 Changed 7 years ago by Anssi Kääriäinen

I agree that some solution to this is a good idea. There are some tickets that are close to this request, #14310 and #9682 come to mind. I didn't find a perfect duplicate so accepting this.

Custom lookups could offer a neat solution to this. You should be able to do: qs.alias(name_fi=Collate('name', 'fi')).order_by('name_fi'). Or, to use collated filtering, use qs.alias(name_fi=Collate('name', 'fi')).filter(name_fi__icontains='someval'). See #16187 for details. The patch in that ticket already offer all the machinery needed to do this, but there is still a lot of work before merge is possible.

While waiting for custom lookups maybe there should be some other way to achieve this.

comment:2 Changed 7 years ago by Anssi Kääriäinen

Triage Stage: UnreviewedAccepted

comment:3 Changed 7 years ago by alan.kesselmann@…

I think there is no duplicate to this, as this is relatively new thing. #14310 is similar cause it requests per query collation overrides but the #9682, which is marked as #14310's duplicate is another thing completely.

In my test (or investigation) case i created two postgresql databases. One with default en_US.utf8 collation and one with POSIX collation. Since one table can hold data which is in several languages you (in this case i) have to be able to add collation as a part of query. Like this:

SELECT * FROM <tablename> ORDER BY <textcolum> COLLATE "<your_collation>", <integercolumn>
or
SELECT <textcolumn> COLLATE "<your_collation>" FROM <tablename> ORDER BY <textcolumn>

What this allows to achieve is that table of strings like:
id, name
1, älan
2, alan
3, õlan

will be sorted with query "SELECT * FROM table ORDEr BY name" to
2, alan
1, älan
3, õlan
with en_US.utf8 collation and to
2, alan
3, õlan
1, älan
with estonian alphabet based collation.

Your example, qs.alias(name_fi=Collate('name', 'fi')).order_by('name_fi') is the best i think (for my use-case anyway), as it pictures the core thing that i need - pair of column name ("name") and collation ("fi"). The Collate object seems like a good idea that might allow some more complex use cases too (http://www.postgresql.org/docs/9.1/static/collation.html lists several).

comment:4 Changed 7 years ago by Anssi Kääriäinen

I tried how hard this would be to implement using custom lookups. Turns out it was somewhat straightforward, see: https://github.com/akaariai/django/commit/76063881d79732cbb1942e9905fe4d51c0df09c1

The example works only on PostgreSQL 9.1 plus, and the test works only if collations fi_FI, en_GB and POSIX are available (I believe the last one is always there). Of course, adding in full multidb support would require more work so that the SQL would actually work on all databases.

comment:5 Changed 7 years ago by alan.kesselmann@…

Hmm i can see you have done loads of work in your custom lookups branch. Any idea which version those changes might end up in?

comment:6 Changed 5 years ago by Anssi Kääriäinen

Now that expressions are in, a Collate expression would be a very welcome addition to Django. It should work on all databases that support collations, not just on PostgreSQL.

comment:7 Changed 4 years ago by Ian Foote

Owner: changed from nobody to Ian Foote
Status: newassigned

comment:8 Changed 4 years ago by Ian Foote

Owner: Ian Foote deleted
Status: assignednew

comment:9 Changed 4 years ago by Alan

Owner: set to Alan
Status: newassigned

comment:10 Changed 4 months ago by Simon Charette

Now that order_by has proper expression support this is pretty straightforward to implement using an Expression of Func subclass.

I guess this ticket could be repurposed to allow a collation argument to be passed to OrderBy or introduce a backend agnostic Collate expression.

order_by(F('title').desc(collation='fr_FR')) or order_by(Collate('title', 'fr_FR').desc())

I think a Collate expression would make more sense since it could be used in lookups as well filter(title=Collate(Value(title), 'fr_FR')).

comment:11 Changed 6 days ago by Tom Carrick

I made a quick attempt at implementing Collate() as an expression. I've hit a problem though, as it seems like SQLIte doesn't allow you to use parameters with COLLATE:

In [34]: with connection.cursor() as cursor: 
    ...:     cursor.execute("select name from testapp_foo order by %s collate %s", ["name", "nocase"]) 
    ...:     row = cursor.fetchall()

...

OperationalError: near "?": syntax error

I also have problems on postgres (haven't tried with other backends):

In [16]: with connection.cursor() as cursor: 
    ...:     cursor.execute("select name from testapp_foo order by %s collate %s", ["name", "en_US"]) 
    ...:     row = cursor.fetchone()

...

ProgrammingError: syntax error at or near "'en_US'"
LINE 1: select name from testapp_foo order by 'name' collate 'en_US'

Not exactly sure what's going on here, but it looks like single quotes don't do it here. It works if I remove the placeholders and set it with "en_US" - with double quotes.

Maybe I'm misunderstanding something.

comment:12 Changed 5 days ago by Tom Carrick

Cc: Tom Carrick added

comment:13 Changed 3 days ago by Alexandr Artemyev

Cc: Alexandr Artemyev added

comment:14 Changed 2 days ago by Simon Charette

Cc: Simon Charette added

Looks like collation names are identifiers on these backends and need to be escaped using connection.ops.quote_name. Was MySQL also behaving this way?

comment:15 Changed 12 hours ago by Tom Carrick

Owner: changed from Alan to Tom Carrick

comment:16 Changed 9 hours ago by Tom Carrick

Has patch: set
Note: See TracTickets for help on using tickets.
Back to Top