#21181 closed New feature (fixed)
collation specific query results ordering
Reported by: | Owned by: | Tom Carrick | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | |
Severity: | Normal | Keywords: | ORM |
Cc: | Tom Carrick, Alexandr Artemyev, Simon Charette | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
There has been feature request for it in 2012 in django-developers group : https://groups.google.com/forum/#!topic/django-developers/0iESVnawNAY. Before finding that request i asked about same thing in stackoverflow: http://stackoverflow.com/questions/18935712/queryset-sorting-specifying-column-collation-for-django-orm-query. Both Anssi (https://groups.google.com/d/msg/django-developers/0iESVnawNAY/JefMfAm7nQMJ) and me outline how it could be used im same way:
.order_by([fieldnames], collation = 'et_EE')
another version of it could be automatical, based on django language settings. Model often has ordering set in its Meta. So why not include automatical ordering there too.
Is it possible to include this into some of next versions of Django?
Some background
If django is used to power some international site like a newspaper then there is rarely need for sorting data based on sorting rules of language other than english. If you use it to power a site that hosts data in several languages then you have need to also display data in that language properly. Sorting based on English alphabet does not work correctly in that case cause letters missing from english alphabet (and there are MANY of those letters that exist in swedish, danish, german, not to mention estonian, russian etc. alphabets) are displayed in completely wrong places.
Again - most django sites will be probably using single language. But there are many cases that will benefit from this feature. Like e-commerce modules, that translate product names - they can benefit from it. Any site that allows language based naming or translating of their objects will benefit from this feature.
Change History (21)
comment:1 by , 11 years ago
comment:2 by , 11 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:3 by , 11 years ago
I think there is no duplicate to this, as this is relatively new thing. #14310 is similar cause it requests per query collation overrides but the #9682, which is marked as #14310's duplicate is another thing completely.
In my test (or investigation) case i created two postgresql databases. One with default en_US.utf8 collation and one with POSIX collation. Since one table can hold data which is in several languages you (in this case i) have to be able to add collation as a part of query. Like this:
SELECT * FROM <tablename> ORDER BY <textcolum> COLLATE "<your_collation>", <integercolumn>
or
SELECT <textcolumn> COLLATE "<your_collation>" FROM <tablename> ORDER BY <textcolumn>
What this allows to achieve is that table of strings like:
id, name
1, älan
2, alan
3, õlan
will be sorted with query "SELECT * FROM table ORDEr BY name" to
2, alan
1, älan
3, õlan
with en_US.utf8 collation and to
2, alan
3, õlan
1, älan
with estonian alphabet based collation.
Your example, qs.alias(name_fi=Collate('name', 'fi')).order_by('name_fi') is the best i think (for my use-case anyway), as it pictures the core thing that i need - pair of column name ("name") and collation ("fi"). The Collate object seems like a good idea that might allow some more complex use cases too (http://www.postgresql.org/docs/9.1/static/collation.html lists several).
comment:4 by , 11 years ago
I tried how hard this would be to implement using custom lookups. Turns out it was somewhat straightforward, see: https://github.com/akaariai/django/commit/76063881d79732cbb1942e9905fe4d51c0df09c1
The example works only on PostgreSQL 9.1 plus, and the test works only if collations fi_FI, en_GB and POSIX are available (I believe the last one is always there). Of course, adding in full multidb support would require more work so that the SQL would actually work on all databases.
comment:5 by , 11 years ago
Hmm i can see you have done loads of work in your custom lookups branch. Any idea which version those changes might end up in?
comment:6 by , 10 years ago
Now that expressions are in, a Collate expression would be a very welcome addition to Django. It should work on all databases that support collations, not just on PostgreSQL.
comment:9 by , 8 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:10 by , 5 years ago
Now that order_by
has proper expression support this is pretty straightforward to implement using an Expression
of Func
subclass.
I guess this ticket could be repurposed to allow a collation
argument to be passed to OrderBy
or introduce a backend agnostic Collate
expression.
order_by(F('title').desc(collation='fr_FR'))
or order_by(Collate('title', 'fr_FR').desc())
I think a Collate
expression would make more sense since it could be used in lookups as well filter(title=Collate(Value(title), 'fr_FR'))
.
comment:11 by , 4 years ago
I made a quick attempt at implementing Collate()
as an expression. I've hit a problem though, as it seems like SQLIte doesn't allow you to use parameters with COLLATE
:
In [34]: with connection.cursor() as cursor: ...: cursor.execute("select name from testapp_foo order by %s collate %s", ["name", "nocase"]) ...: row = cursor.fetchall() ... OperationalError: near "?": syntax error
I also have problems on postgres (haven't tried with other backends):
In [16]: with connection.cursor() as cursor: ...: cursor.execute("select name from testapp_foo order by %s collate %s", ["name", "en_US"]) ...: row = cursor.fetchone() ... ProgrammingError: syntax error at or near "'en_US'" LINE 1: select name from testapp_foo order by 'name' collate 'en_US'
Not exactly sure what's going on here, but it looks like single quotes don't do it here. It works if I remove the placeholders and set it with "en_US"
- with double quotes.
Maybe I'm misunderstanding something.
comment:12 by , 4 years ago
Cc: | added |
---|
comment:13 by , 4 years ago
Cc: | added |
---|
comment:14 by , 4 years ago
Cc: | added |
---|
Looks like collation names are identifiers on these backends and need to be escaped using connection.ops.quote_name
. Was MySQL also behaving this way?
comment:15 by , 4 years ago
Owner: | changed from | to
---|
comment:17 by , 4 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
Went through a few rounds of reviews and patch now LGTM
comment:18 by , 4 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
It is subject to SQL injection.
comment:19 by , 4 years ago
Yes, quote_name
cannot protect against SQL injections. It shouldn't be an issue just like Func(function)
also allows injections? As long as collation
is not under user control it should not be an issue.
Since collation names are identifiers and cannot be provided as string literals (see comment:11) I don't see bullet proof way around that. Isn't this something we could explicitly document?
comment:20 by , 4 years ago
So maybe, we can add a warning in docs, e.g.
To protect against `SQL injection attacks <https://en.wikipedia.org/wiki/SQL_injection>`_, ``collation`` shouldn't be a user-controlled parameter because collation names are database identifiers and cannot be properly escape.
comment:21 by , 4 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
I added warning to docs.
I agree that some solution to this is a good idea. There are some tickets that are close to this request, #14310 and #9682 come to mind. I didn't find a perfect duplicate so accepting this.
Custom lookups could offer a neat solution to this. You should be able to do:
qs.alias(name_fi=Collate('name', 'fi')).order_by('name_fi')
. Or, to use collated filtering, useqs.alias(name_fi=Collate('name', 'fi')).filter(name_fi__icontains='someval')
. See #16187 for details. The patch in that ticket already offer all the machinery needed to do this, but there is still a lot of work before merge is possible.While waiting for custom lookups maybe there should be some other way to achieve this.