#22458 closed Cleanup/optimization (fixed)
MySQL notes recommend legacy utf8_general_ci unicode collation
Reported by: | | Owned by: | mardini |
---|---|---|---|
Component: | Documentation | Version: | 1.7-beta-1 |
Severity: | Normal | Keywords: | unicode |
Cc: | | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | yes | UI/UX: | no |
Description
The documentation section "MySQL notes" recommends the obsolete utf8_general_ci collation settings:
"By default, with a UTF-8 database, MySQL will use the utf8_general_ci collation." [0]
and
"... you should still use utf8_general_ci (the default) collation for the django.contrib.sessions.models.Session table"
While it may still be the default depending on your MySQL version, MySQL itself recommends utf8_unicode_ci instead of utf8_general_ci, as the latter can be incorrect for some characters and languages, and its performance benefits are no longer relevant. From the MySQL docs themselves:
"utf8_general_ci is a legacy collation that does not support expansions, contractions, or ignorable characters." [1]
Using utf8_general_ci can cause text issues that are difficult to debug.
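To make the "no expansions" point concrete, here is a toy Python sketch. It is not MySQL's actual comparison algorithm: the two lookup tables and the function names are made up for illustration. The only fact it leans on is the example given on the MySQL page cited as [1], where ß compares equal to s under utf8_general_ci and equal to ss under utf8_unicode_ci.

```python
# Toy model only: these tables are illustrative assumptions,
# not MySQL's real collation weight tables.

# One character in, one weight out (no expansions possible).
SIMPLE_WEIGHTS = {"ß": "s"}

# One character may expand to several weights.
EXPANSIONS = {"ß": "ss"}


def simple_key(text: str) -> str:
    """Build a case-insensitive key using strictly one weight per character."""
    return "".join(SIMPLE_WEIGHTS.get(ch, ch) for ch in text.lower())


def expanding_key(text: str) -> str:
    """Build a case-insensitive key where a character may expand."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in text.lower())


# Without expansions, the two spellings do not match.
print(simple_key("Straße") == simple_key("Strasse"))        # False
# With expansions, they compare equal.
print(expanding_key("Straße") == expanding_key("Strasse"))  # True
```

A lookup that silently misses "Strasse" when the stored value is "Straße" is the sort of problem that is hard to trace back to the collation.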
IMO Django should update its MySQL collation recommendation to utf8_unicode_ci.
[0] https://docs.djangoproject.com/en/dev/ref/databases/#collation-settings
[1] http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-sets.html
Change History (6)
comment:1 by , 11 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:3 by , 11 years ago
comment:4 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
PR: https://github.com/django/django/pull/2587
The MySQL documentation doesn't recommend utf8_unicode_ci in all cases. It states that "comparisons for the utf8_general_ci collation are faster, but slightly less correct, than comparisons for utf8_unicode_ci", and "If this is acceptable for your application, you should use utf8_general_ci because it is faster. If this is not acceptable (for example, if you require German dictionary order), use utf8_unicode_ci because it is more accurate." I added a note and a link that explain both cases and the recommended usage for each collation. Thanks.
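For anyone who wants to act on either recommendation for an existing table, a minimal sketch follows. It only builds the SQL text and does not execute anything. The helper name `convert_collation_sql` is hypothetical, and `django_session` is assumed to be the table name of the Session model. The statement uses MySQL's `ALTER TABLE ... CONVERT TO CHARACTER SET ... COLLATE ...` form.

```python
import re

# Identifiers cannot be passed as query parameters, so only plain names are accepted.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")


def convert_collation_sql(table: str, collation: str, charset: str = "utf8") -> str:
    """Return a MySQL statement converting a table to the given charset and collation."""
    for name in (table, collation, charset):
        if not _SAFE_NAME.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"ALTER TABLE `{table}` "
        f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
    )


# Keep the faster collation on the sessions table.
print(convert_collation_sql("django_session", "utf8_general_ci"))
```

Converting a large table rewrites it, so it is probably worth trying on a copy first.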