Opened 18 years ago
Closed 18 years ago
#2810 closed defect (duplicate)
[patch] mysql encoding broken after upgrade from <4.1 to 5.0
Reported by: | | Owned by: | Adrian Holovaty |
---|---|---|---|
Component: | contrib.admin | Version: | |
Severity: | normal | Keywords: | |
Cc: | farcepest@… | Triage Stage: | Unreviewed |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Hi,
I had a Django database on MySQL 4.0.21 with tables encoded in latin-1. After upgrading to MySQL 5.0.x, Django thinks that the MySQL tables use utf-8, but the encoding hasn't changed.
I think that a configurable database encoding would fix this. The DEFAULT should be utf-8, since this won't break the current behavior of Django.
Regards,
Dirk
Attachments (5)
Change History (12)
by , 18 years ago
Attachment: | mysql-encoding.diff added |
---|
comment:1 by , 18 years ago
Cc: | added |
---|
by , 18 years ago
Attachment: | names-utf8-browser-utf8.png added |
---|
SET NAMES utf8, browser encoding utf8
by , 18 years ago
Attachment: | names-utf8-browser-iso88591.png added |
---|
SET NAMES utf8, browser encoding iso-8859-1
by , 18 years ago
Attachment: | names-latin1-browser-utf8.png added |
---|
SET NAMES latin1, browser encoding utf8
by , 18 years ago
Attachment: | names-latin1-browser-iso88591.png added |
---|
SET NAMES latin1, browser encoding iso-8859-1
comment:2 by , 18 years ago
I made some screenshots to show the different behavior of 'SET NAMES utf8/latin1' combined with browser encoding 'utf-8/iso-8859-1'.
The normal encoding for Django pages in the browser is 'utf-8'.
The MySQL tables were created with encoding latin-1/iso-8859-1.
Since all output is correct with the combination 'SET NAMES latin1' and browser encoding 'utf-8', I based my suggested patch on that.
There are no errors, only wrongly encoded characters.
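One possible explanation for the four screenshots, sketched in Python 3 without a database. It rests on an assumption that is not confirmed anywhere in this ticket: that the columns declared as latin-1 actually contain UTF-8 bytes written by Django while the server was still 4.0. The helpers `fetch` and `render` are hypothetical and only model the byte conversions.

```python
def fetch(stored: bytes, names: str) -> bytes:
    """Model the bytes the client receives for a latin1-declared column."""
    if names == "latin1":
        # Connection charset equals column charset: bytes pass through unchanged.
        return stored
    if names == "utf8":
        # Server treats the stored bytes as latin-1 and transcodes them to UTF-8.
        return stored.decode("latin-1").encode("utf-8")
    raise ValueError("unsupported charset in this sketch: %r" % names)


def render(received: bytes, browser_encoding: str) -> str:
    """Model how the browser interprets the page bytes."""
    return received.decode(browser_encoding, errors="replace")


original = "Müller"
stored = original.encode("utf-8")  # assumption: UTF-8 bytes in a latin1 column

for names in ("utf8", "latin1"):
    for browser in ("utf-8", "iso-8859-1"):
        shown = render(fetch(stored, names), browser)
        print(names, browser, repr(shown), shown == original)
```

Under that assumption, only 'SET NAMES latin1' with browser encoding 'utf-8' reproduces the original text; the other three combinations show double-encoded or misread characters.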
comment:3 by , 18 years ago
Can you try my patch on #2635? I have previously been suspicious of using SET NAMES to change the character set (it really doesn't work right with the MySQLdb internals) and this may be a case that demonstrates it. The patched version uses an API call to set the character set in both directions, and I think from re-reading the docs today that SET NAMES probably only sets the character set from client to server and not the reverse direction, whereas db.set_character_set() should do both. Note that you will need MySQLdb-1.2.1 or newer (1.2.2b1) for this to work.
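A minimal sketch of the approach described in this comment, not the actual patch from #2635. It prefers the driver's `set_character_set()` call (available on MySQLdb connections from 1.2.1, as noted above) and falls back to `SET NAMES` only when the method is missing. The helper name `apply_charset` and the name check are assumptions made for illustration.

```python
import re

# Conservative pattern for character set names such as "utf8" or "latin1".
_CHARSET_RE = re.compile(r"^[A-Za-z0-9_]+$")


def apply_charset(conn, charset):
    """Set the connection character set, preferring the driver API.

    Returns "api" if conn.set_character_set() was used, or "set_names"
    if the helper had to fall back to issuing SET NAMES.
    """
    if not _CHARSET_RE.match(charset):
        raise ValueError("suspicious character set name: %r" % charset)

    setter = getattr(conn, "set_character_set", None)
    if callable(setter):
        setter(charset)
        return "api"

    # Fallback for drivers without set_character_set().
    cursor = conn.cursor()
    try:
        cursor.execute("SET NAMES " + charset)
    finally:
        cursor.close()
    return "set_names"
```

With a real MySQLdb connection this would be called as `apply_charset(conn, "latin1")` right after connecting.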
comment:4 by , 18 years ago
I tried your patch for Django with MySQL 5 today. It has the same problem as 'SET NAMES utf8'.
If I change two lines of your code, my problem is solved in the same way as with the patch above: 'use_unicode': False, 'charset': 'latin1',
I would suggest making DATABASE_ENCODING configurable for the mysql backend.
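A rough sketch of what such a setting could look like. `DATABASE_ENCODING` is the name proposed in this ticket, not an existing Django setting, and `build_connect_kwargs` is a hypothetical helper; `charset` and `use_unicode` are the MySQLdb connection arguments quoted above.

```python
from types import SimpleNamespace


def build_connect_kwargs(settings, default_encoding="utf8"):
    """Build MySQLdb.connect() keyword arguments from a settings object.

    DATABASE_ENCODING is the hypothetical setting proposed in this ticket;
    when it is absent, the default keeps the current utf8 behavior.
    """
    encoding = getattr(settings, "DATABASE_ENCODING", default_encoding)
    return {
        "db": settings.DATABASE_NAME,
        "user": settings.DATABASE_USER,
        "charset": encoding,    # connection character set
        "use_unicode": False,   # return byte strings, as in the workaround above
    }


# Stand-in for a Django settings module with the legacy latin1 database.
legacy = SimpleNamespace(
    DATABASE_NAME="legacy", DATABASE_USER="django", DATABASE_ENCODING="latin1"
)
print(build_connect_kwargs(legacy))
```

Projects that do not define the setting would keep connecting with utf8, so existing installations are unaffected.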
comment:5 by , 18 years ago
I'm using a legacy database (not my choice) that is MySQL 4.1. It has the encoding set to latin1 by default for the databases, tables, and server. Currently the svn code will not work with it, because it uses SET NAMES 'utf8', which sets character_set_client, character_set_results and character_set_connection to 'utf8' [1]. The problem is that the server is using latin1, which causes collation errors, because character_set_connection = 'utf8' also sets collation_connection to the default collation for 'utf8':
OperationalError at / (1267, "Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation 'like'")
If I change mysql/base.py to use SET CHARACTER SET 'utf8', it works, because it sets the collation_connection to the collation_database value which is correct [1]. And it still sets the character_set_client and character_set_results to 'utf8'.
[1] - http://dev.mysql.com/doc/refman/4.1/en/charset-connection.html
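The difference between the two statements, as described in this comment and in [1], can be modeled as a small Python sketch. It only tracks the four session variables mentioned here; the function names and the `session` dictionary are illustrative and do not talk to a server.

```python
# Default collations for the two character sets discussed in this ticket.
DEFAULT_COLLATION = {
    "utf8": "utf8_general_ci",
    "latin1": "latin1_swedish_ci",
}


def set_names(session, charset):
    """Model SET NAMES: all three charset variables follow the argument."""
    session["character_set_client"] = charset
    session["character_set_results"] = charset
    session["character_set_connection"] = charset
    session["collation_connection"] = DEFAULT_COLLATION[charset]
    return session


def set_character_set(session, charset, database_charset, database_collation):
    """Model SET CHARACTER SET: the connection follows the database."""
    session["character_set_client"] = charset
    session["character_set_results"] = charset
    session["character_set_connection"] = database_charset
    session["collation_connection"] = database_collation
    return session


print(set_names({}, "utf8"))
print(set_character_set({}, "utf8", "latin1", "latin1_swedish_ci"))
```

In the first case collation_connection becomes utf8_general_ci, which clashes with latin1_swedish_ci columns as in the error above; in the second it stays latin1_swedish_ci while client and results still use utf8.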
comment:6 by , 18 years ago
As an update: I've looked a bit further at this problem, and I'm no longer certain that my suggested change is appropriate. See #2896.
SET NAMES only changes the character set the client uses to talk to the server; it doesn't affect the character set of existing databases, tables, or columns, and the server transcodes into the correct character set. Since you are upgrading from 4.0 to 5.0, you may have to check your existing schema and make sure the tables and columns are really using latin-1.
Are you getting an error?