Opened 15 months ago

Closed 15 months ago

Last modified 15 months ago

#22458 closed Cleanup/optimization (fixed)

MySQL notes recommend legacy utf8_general_ci unicode collation

Reported by: tobami@…
Owned by: mardini
Component: Documentation
Version: 1.7-beta-1
Severity: Normal
Keywords: unicode
Cc:
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: yes
UI/UX: no

Description

The documentation section "MySQL notes" recommends the obsolete utf8_general_ci collation settings:
"By default, with a UTF-8 database, MySQL will use the utf8_general_ci collation." [0]
and
"... you should still use utf8_general_ci (the default) collation for the django.contrib.sessions.models.Session table"

While it may still be the default depending on your MySQL version, MySQL itself recommends utf8_unicode_ci over utf8_general_ci, as the latter can be incorrect for some characters and languages, and its performance benefits are no longer relevant. From the MySQL docs themselves:
"utf8_general_ci is a legacy collation that does not support expansions, contractions, or ignorable characters." [1]

Using utf8_general_ci can cause text issues that are difficult to debug.
IMO Django should update its MySQL collation recommendation to utf8_unicode_ci.
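
To make the "expansions" point concrete, here is a toy sketch in Python. It does not reproduce MySQL's collation algorithm; the two folding tables and the helper names (fold, equal_under) are assumptions made up for illustration. It only models the one difference quoted above: a collation without expansions can map "ß" to a single character at most, while one with expansions can treat it as "ss".

```python
# Toy folding tables: illustrative assumptions, NOT MySQL's real collation weights.
# "general-like": every character maps to exactly one character (no expansions).
GENERAL_LIKE = {"ä": "a", "ö": "o", "ü": "u", "ß": "s"}
# "unicode-like": a character may expand to several characters (ß -> ss).
UNICODE_LIKE = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}


def fold(text, table):
    """Lowercase text, then replace each character using the given table."""
    return "".join(table.get(ch, ch) for ch in text.lower())


def equal_under(a, b, table):
    """Return True if a and b compare equal after folding with the table."""
    return fold(a, table) == fold(b, table)


if __name__ == "__main__":
    tables = (("general-like", GENERAL_LIKE), ("unicode-like", UNICODE_LIKE))
    for name, table in tables:
        # Compare the two common spellings of the same German word.
        print(name, equal_under("Straße", "Strasse", table))
```

In this model the two spellings only match under the table that allows expansions, which is the kind of mismatch that can surface as a confusing lookup or uniqueness issue.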

[0] https://docs.djangoproject.com/en/dev/ref/databases/#collation-settings
[1] http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-sets.html

Change History (6)

comment:1 Changed 15 months ago by aaugustin

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

comment:2 Changed 15 months ago by mardini

  • Owner changed from nobody to mardini
  • Status changed from new to assigned

comment:3 Changed 15 months ago by mardini

PR: https://github.com/django/django/pull/2587

The MySQL documentation doesn't recommend utf8_unicode_ci in all cases. It states that "comparisons for the utf8_general_ci collation are faster, but slightly less correct, than comparisons for utf8_unicode_ci", and "If this is acceptable for your application, you should use utf8_general_ci because it is faster. If this is not acceptable (for example, if you require German dictionary order), use utf8_unicode_ci because it is more accurate." I added a note and a link that explain both cases and the recommended usage for each collation. Thanks.

comment:4 Changed 15 months ago by Tim Graham <timograham@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In 11ac50b18e578498c1d95e0a75921b5864387d46:

Fixed #22458 -- Added a note about MySQL utf8_unicode_ci collation

Thanks tobami at gmail.com for the report.

comment:5 Changed 15 months ago by Tim Graham <timograham@…>

In b6863879e1cf20acdecb3606da8fe66b486836cf:

[1.6.x] Fixed #22458 -- Added a note about MySQL utf8_unicode_ci collation

Thanks tobami at gmail.com for the report.

Backport of 11ac50b18e from master

comment:6 Changed 15 months ago by Tim Graham <timograham@…>

In b1e7dd445bb64c27df8e2b6902a76a67c79332ab:

[1.7.x] Fixed #22458 -- Added a note about MySQL utf8_unicode_ci collation

Thanks tobami at gmail.com for the report.

Backport of 11ac50b18e from master
