Opened 8 years ago

Closed 7 years ago

#2810 closed defect (duplicate)

[patch] mysql encoding broken after upgrade from <4.1 to 5.0

Reported by: dummy@… Owned by: adrian
Component: contrib.admin Version:
Severity: normal Keywords:
Cc: farcepest@… Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

Hi,

I had a Django database on MySQL 4.0.21 with tables encoded in latin-1. After upgrading to MySQL 5.0.x, Django thinks that MySQL uses utf-8 tables, but the table encoding hasn't changed.

I think that a configurable database encoding would fix this. The DEFAULT should be utf-8, since this won't break the current behavior of Django.

Regards,
Dirk

Attachments (5)

mysql-encoding.diff (1.4 KB) - added by dummy@… 8 years ago.
names-utf8-browser-utf8.png (37.4 KB) - added by dummy@… 8 years ago.
hardcopy NAMES utf8, browser encoding utf8
names-utf8-browser-iso88591.png (39.4 KB) - added by dummy@… 8 years ago.
SET NAMES utf8, browser encoding iso-8859-1
names-latin1-browser-utf8.png (35.1 KB) - added by dummy@… 8 years ago.
SET NAMES latin1, browser encoding utf8
names-latin1-browser-iso88591.png (37.5 KB) - added by dummy@… 8 years ago.
SET NAMES latin1, browser encoding iso-8859-1

Change History (12)

Changed 8 years ago by dummy@…

comment:1 Changed 8 years ago by Andy Dustman <farcepest@…>

  • Cc farcepest@… added

SET NAMES only changes the character set the client uses to talk to the server; it doesn't affect the character set of existing databases, tables, or columns, and the server transcodes into the correct character set. Since you are upgrading from 4.0 to 5.0, you may have to check your existing schema and make sure they are really using latin-1.

Are you getting an error?

Changed 8 years ago by dummy@…

hardcopy NAMES utf8, browser encoding utf8

Changed 8 years ago by dummy@…

SET NAMES utf8, browser encoding iso-8859-1

Changed 8 years ago by dummy@…

SET NAMES latin1, browser encoding utf8

Changed 8 years ago by dummy@…

SET NAMES latin1, browser encoding iso-8859-1

comment:2 Changed 8 years ago by dummy@…

I made some screenshots to show the different behavior of 'SET NAMES utf8/latin1' combined with browser encoding 'utf-8/iso-8859-1'.

The normal encoding for Django pages in the browser is 'utf-8'.
The MySQL tables were created with encoding latin-1/iso-8859-1.

Since all output is fine with the combination 'SET NAMES latin1' and browser encoding 'utf-8', I based my suggested patch on it.

There are no errors, only wrongly encoded characters.
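
The four screenshot combinations can be modelled in plain Python without a database. This is only a sketch under a loud assumption that the ticket does not confirm: that the tables are declared latin-1 but actually hold UTF-8 bytes written under MySQL 4.0, which had no per-column character sets. The sample text and the helper names `fetch` and `browser_view` are made up for illustration.

```python
# Assumption (not confirmed in the ticket): the column is declared latin-1
# but the stored bytes are really UTF-8 written before the upgrade.
original = "Gr\u00fc\u00dfe"          # hypothetical sample text
stored = original.encode("utf-8")     # bytes sitting in the table
column_charset = "latin-1"            # what the upgraded server believes


def fetch(stored_bytes, column_charset, names_charset):
    """Model the server transcoding column data into the connection charset."""
    return stored_bytes.decode(column_charset).encode(names_charset)


def browser_view(payload, browser_charset):
    """Model the browser decoding the page bytes with its chosen encoding."""
    return payload.decode(browser_charset, errors="replace")


results = {}
for names in ("utf-8", "latin-1"):
    for browser in ("utf-8", "iso-8859-1"):
        shown = browser_view(fetch(stored, column_charset, names), browser)
        results[(names, browser)] = (shown == original)
        print("SET NAMES", names, "| browser", browser, "| correct:", shown == original)
```

Under that assumption, only 'SET NAMES latin1' with browser encoding 'utf-8' round-trips the text, which matches the observation above.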

comment:3 Changed 8 years ago by Andy Dustman <farcepest@…>

Can you try my patch on #2635? I have previously been suspicious of using SET NAMES to change the character set (it really doesn't work right with the MySQLdb internals) and this may be a case that demonstrates it. The patched version uses an API call to set the character set in both directions, and I think from re-reading the docs today that SET NAMES probably only sets the character set from client to server and not the reverse direction, whereas db.set_character_set() should do both. Note that you will need MySQLdb-1.2.1 or newer (1.2.2b1) for this to work.

comment:4 Changed 8 years ago by dummy@…

I tried your patch for django and mysql5 today. It has the same problem as it has with 'SET NAMES utf8'.

If I change two lines of your code, my problem is solved in the same way as I did it with the patch above: 'use_unicode': False, 'charset': 'latin1',

I would suggest adding a configurable DATABASE_ENCODING setting for the mysql backend.
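
A minimal sketch of what such a setting could look like, not the actual patch: the helper name `mysql_connect_kwargs` is hypothetical, DATABASE_ENCODING is the setting proposed above (it is not an existing Django setting), and the result is meant to be passed to MySQLdb.connect(), whose `charset` and `use_unicode` keyword arguments are the two values changed in this comment.

```python
def mysql_connect_kwargs(settings):
    """Build keyword arguments for MySQLdb.connect() from a settings dict.

    DATABASE_ENCODING is the proposed, hypothetical setting. It defaults
    to utf8 so that existing installations keep their current behavior.
    """
    encoding = settings.get("DATABASE_ENCODING", "utf8")
    return {
        "db": settings["DATABASE_NAME"],
        "user": settings.get("DATABASE_USER", ""),
        "charset": encoding,     # connection character set, e.g. latin1
        "use_unicode": False,    # return byte strings, as in the change above
    }


# Example: a legacy latin-1 installation.
legacy = {"DATABASE_NAME": "mydb", "DATABASE_ENCODING": "latin1"}
print(mysql_connect_kwargs(legacy))
```

Keeping utf8 as the default means only sites with legacy tables need to touch the setting.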

comment:5 Changed 8 years ago by lakin@…

I'm using a legacy database (not my choice) running on MySQL 4.1. Its encoding is set to latin1 by default for the databases, tables, and server. Currently the svn code does not work with it, because it uses SET NAMES 'utf8', which sets character_set_client, character_set_results and character_set_connection to 'utf8' [1]. The problem is that the server is using latin1, which causes collation errors, because character_set_connection = 'utf8' also sets collation_connection to the default collation for 'utf8':

OperationalError at /
(1267, "Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation 'like'")

If I change mysql/base.py to use SET CHARACTER SET 'utf8', it works, because it sets the collation_connection to the collation_database value which is correct [1]. And it still sets the character_set_client and character_set_results to 'utf8'.


[1] - http://dev.mysql.com/doc/refman/4.1/en/charset-connection.html
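
The difference between the two statements can be sketched as a small Python model of the session variables, following the description in [1]. This is a simulation for illustration only; it does not talk to a server, and the function names and the starting session values are assumptions chosen to mirror the latin1 setup described above.

```python
# Default collations for the two character sets involved in this ticket.
DEFAULT_COLLATION = {"utf8": "utf8_general_ci", "latin1": "latin1_swedish_ci"}


def set_names(session, charset):
    """Model SET NAMES: client, results and connection all become charset."""
    updated = dict(session)
    updated.update(
        character_set_client=charset,
        character_set_results=charset,
        character_set_connection=charset,
        collation_connection=DEFAULT_COLLATION[charset],
    )
    return updated


def set_character_set(session, charset):
    """Model SET CHARACTER SET: connection follows the default database."""
    updated = dict(session)
    updated.update(
        character_set_client=charset,
        character_set_results=charset,
        character_set_connection=session["character_set_database"],
        collation_connection=session["collation_database"],
    )
    return updated


# Assumed starting point: a database whose defaults are latin1.
session = {
    "character_set_database": "latin1",
    "collation_database": "latin1_swedish_ci",
}
print(set_names(session, "utf8")["collation_connection"])
print(set_character_set(session, "utf8")["collation_connection"])
```

In this model SET NAMES leaves the connection collation at utf8_general_ci, the one named in the error above, while SET CHARACTER SET keeps it at the database's latin1_swedish_ci.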

comment:6 Changed 8 years ago by lakin@…

As an update on this: I've looked a bit further at this problem, and I'm no longer certain that my suggested change is appropriate. See: #2896

comment:7 Changed 7 years ago by adrian

  • Resolution set to duplicate
  • Status changed from new to closed

Duplicate of #952.
