Opened 23 months ago

Last modified 3 months ago

#21181 new New feature

collation specific query results ordering

Reported by: alan.kesselmann@… Owned by: nobody
Component: Database layer (models, ORM) Version:
Severity: Normal Keywords: ORM
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

There has been feature request for it in 2012 in django-developers group : https://groups.google.com/forum/#!topic/django-developers/0iESVnawNAY. Before finding that request i asked about same thing in stackoverflow: http://stackoverflow.com/questions/18935712/queryset-sorting-specifying-column-collation-for-django-orm-query. Both Anssi (https://groups.google.com/d/msg/django-developers/0iESVnawNAY/JefMfAm7nQMJ) and me outline how it could be used im same way:

.order_by([fieldnames], collation = 'et_EE')

another version of it could be automatical, based on django language settings. Model often has ordering set in its Meta. So why not include automatical ordering there too.

Is it possible to include this into some of next versions of Django?

Some background

If django is used to power some international site like a newspaper then there is rarely need for sorting data based on sorting rules of language other than english. If you use it to power a site that hosts data in several languages then you have need to also display data in that language properly. Sorting based on English alphabet does not work correctly in that case cause letters missing from english alphabet (and there are MANY of those letters that exist in swedish, danish, german, not to mention estonian, russian etc. alphabets) are displayed in completely wrong places.

Again - most django sites will be probably using single language. But there are many cases that will benefit from this feature. Like e-commerce modules, that translate product names - they can benefit from it. Any site that allows language based naming or translating of their objects will benefit from this feature.

Change History (6)

comment:1 Changed 23 months ago by akaariai

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

I agree that some solution to this is a good idea. There are some tickets that are close to this request, #14310 and #9682 come to mind. I didn't find a perfect duplicate so accepting this.

Custom lookups could offer a neat solution to this. You should be able to do: qs.alias(name_fi=Collate('name', 'fi')).order_by('name_fi'). Or, to use collated filtering, use qs.alias(name_fi=Collate('name', 'fi')).filter(name_fi__icontains='someval'). See #16187 for details. The patch in that ticket already offer all the machinery needed to do this, but there is still a lot of work before merge is possible.

While waiting for custom lookups maybe there should be some other way to achieve this.

comment:2 Changed 23 months ago by akaariai

  • Triage Stage changed from Unreviewed to Accepted

comment:3 Changed 23 months ago by alan.kesselmann@…

I think there is no duplicate to this, as this is relatively new thing. #14310 is similar cause it requests per query collation overrides but the #9682, which is marked as #14310's duplicate is another thing completely.

In my test (or investigation) case i created two postgresql databases. One with default en_US.utf8 collation and one with POSIX collation. Since one table can hold data which is in several languages you (in this case i) have to be able to add collation as a part of query. Like this:

SELECT * FROM <tablename> ORDER BY <textcolum> COLLATE "<your_collation>", <integercolumn>
or
SELECT <textcolumn> COLLATE "<your_collation>" FROM <tablename> ORDER BY <textcolumn>

What this allows to achieve is that table of strings like:
id, name
1, älan
2, alan
3, õlan

will be sorted with query "SELECT * FROM table ORDEr BY name" to
2, alan
1, älan
3, õlan
with en_US.utf8 collation and to
2, alan
3, õlan
1, älan
with estonian alphabet based collation.

Your example, qs.alias(name_fi=Collate('name', 'fi')).order_by('name_fi') is the best i think (for my use-case anyway), as it pictures the core thing that i need - pair of column name ("name") and collation ("fi"). The Collate object seems like a good idea that might allow some more complex use cases too (http://www.postgresql.org/docs/9.1/static/collation.html lists several).

comment:4 Changed 23 months ago by akaariai

I tried how hard this would be to implement using custom lookups. Turns out it was somewhat straightforward, see: https://github.com/akaariai/django/commit/76063881d79732cbb1942e9905fe4d51c0df09c1

The example works only on PostgreSQL 9.1 plus, and the test works only if collations fi_FI, en_GB and POSIX are available (I believe the last one is always there). Of course, adding in full multidb support would require more work so that the SQL would actually work on all databases.

comment:5 Changed 22 months ago by alan.kesselmann@…

Hmm i can see you have done loads of work in your custom lookups branch. Any idea which version those changes might end up in?

comment:6 Changed 3 months ago by akaariai

Now that expressions are in, a Collate expression would be a very welcome addition to Django. It should work on all databases that support collations, not just on PostgreSQL.

Note: See TracTickets for help on using tickets.
Back to Top