
Opened 7 years ago

Closed 7 years ago

#3754 closed (fixed)

mysql charset no longer defaults to utf-8

Reported by: Florian Apolloner <florian@…>
Owned by: mtredinnick
Component: Database layer (models, ORM)
Version: master
Severity:
Keywords:
Cc: farcepest@…
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings:
UI/UX:

Description

After updating my Django installation to the newest revision, I ran into the following problem:
everything is displayed fine except data from the database.
The data is stored as utf-8 in the database, but the output is iso.

I tracked the change down to revision 4724: revision 4723 works fine, but 4724 contains some changes in mysql/*, and since that revision it no longer works.
Any ideas?

Attachments (2)

mysql-bug-3754.diff (529 bytes) - added by Andy Dustman <farcepest@…> 7 years ago.
Set default character set of utf8
mysql-bug-3754.2.diff (547 bytes) - added by Andy Dustman <farcepest@…> 7 years ago.
updated to use DEFAULT_CHARSET (untested)

Change History (17)

comment:1 Changed 7 years ago by Florian Apolloner <florian@…>

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Mysql-Versions:
5.0.24a
Python-mysqldb:
1.2.1-p2-4ubuntu2

Also, assigning keywords in Trac raises an internal error...

comment:2 follow-up: Changed 7 years ago by derelm

this is bug #2635

comment:3 in reply to: ↑ 2 Changed 7 years ago by Michael Radziej <mir@…>

Replying to derelm:

this is bug #2635

No, it's caused by the patch in #2635. Please let's keep this bug separate. mtredinnick asked not to reopen #3625.

comment:4 Changed 7 years ago by anonymous

Setting DATABASE_OPTIONS = dict(charset="utf8") helped.
But I agree with http://code.djangoproject.com/ticket/2635#comment:20:
utf-8 should be made the default.
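
For reference, the workaround above would sit in a project's settings module roughly as follows. This is a minimal sketch: only the DATABASE_OPTIONS line comes from this comment, and the DATABASE_ENGINE value is an assumption about the surrounding settings.

```python
# settings.py (fragment)
DATABASE_ENGINE = "mysql"  # assumed; not stated in the comment above

# Extra options for the database backend.
# Note the MySQL spelling "utf8", not "utf-8".
DATABASE_OPTIONS = dict(charset="utf8")
```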

comment:5 Changed 7 years ago by Andy Dustman <farcepest@…>

  • Cc farcepest@… added

comment:6 Changed 7 years ago by mtredinnick

  • Summary changed from mysql charset broken to mysql charset no longer defaults to utf-8

Changing title to differentiate between "broken" and "different".

comment:7 Changed 7 years ago by mtredinnick

  • Triage Stage changed from Unreviewed to Accepted

Changed 7 years ago by Andy Dustman <farcepest@…>

Set default character set of utf8

comment:8 Changed 7 years ago by Andy Dustman <farcepest@…>

  • Has patch set

Note that attachment:mysql-bug-3754.diff also sets use_unicode=False, which is closer to the original default behavior. When you set a character set via the charset parameter, all text-like columns are returned as unicode rather than string. The last time I checked, use_unicode=True broke a couple of Django unit tests in a superficial way.
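
The idea of restoring defaults while still letting users override them can be illustrated with a small sketch of how backend defaults and user-supplied options might be merged before connecting. This is not the attached patch: the function name build_connect_kwargs is hypothetical, and only the charset and use_unicode values are taken from this comment.

```python
def build_connect_kwargs(user_options=None):
    """Merge hypothetical backend defaults with user-supplied options."""
    kwargs = {
        "charset": "utf8",     # MySQL's name for UTF-8
        "use_unicode": False,  # keep returning byte strings, as before
    }
    # Anything the user sets explicitly wins over the defaults.
    kwargs.update(user_options or {})
    return kwargs


if __name__ == "__main__":
    print(build_connect_kwargs())
    print(build_connect_kwargs({"charset": "latin1"}))
```

With this shape, a user who sets DATABASE_OPTIONS explicitly keeps full control, while everyone else gets the old utf8 behaviour back.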

comment:9 Changed 7 years ago by Michael Radziej <mir@…>

  • Patch needs improvement set

Instead of charset: 'utf8', it should use settings.DEFAULT_CHARSET.

Changed 7 years ago by Andy Dustman <farcepest@…>

updated to use DEFAULT_CHARSET (untested)

comment:10 Changed 7 years ago by Michael Radziej <mir@…>

Just for the record: using DEFAULT_CHARSET instead of 'utf8' fixes #952; either 'utf8' or DEFAULT_CHARSET should fix #1356 and #3370 (but this needs testing).

The old mysql backend did not allow this solution. The problems have been discussed, ad nauseam, in this thread on django-developers.

I'm going to post a request to test this patch to django-developers.

comment:11 Changed 7 years ago by Michael Radziej <mir@…>

  • Patch needs improvement unset

comment:12 Changed 7 years ago by mtredinnick

  • Owner changed from adrian to mtredinnick

Either of these patches looks correct (thanks for the rapid response, Andy), and I have a slight preference for the latter. I'm not completely sold on the idea of equating the database character set with DEFAULT_CHARSET in the long term. I think we do need a DATABASE_CHARSET parameter, but that is definitely post-0.96 work. For now, restoring the previous behaviour as the default is a nice solution for existing users, so we'll go with that. I'll apply this in a few hours, once I've had a chance to test things.

comment:13 Changed 7 years ago by Michael Radziej <mir@…>

There's an issue left when you use settings.DEFAULT_CHARSET: MySQL needs to understand the character set name, right? But some encodings have a name in MySQL that differs from their name in Python (e.g., 'koi8r' vs. 'koi8-r'). Still, it at least improves the situation.
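
The naming mismatch could be handled with a lookup table, sketched below. This is only an illustration: the helper mysql_charset_name and the table PYTHON_TO_MYSQL are hypothetical, and the table is deliberately incomplete, listing only name pairs that come up in this ticket.

```python
import codecs

# Canonical Python codec name -> MySQL character set name.
# Illustrative and incomplete: only pairs mentioned in this ticket.
PYTHON_TO_MYSQL = {
    "utf-8": "utf8",
    "koi8-r": "koi8r",
}


def mysql_charset_name(python_name):
    """Translate a Python encoding name into the name MySQL expects."""
    # codecs.lookup() normalizes aliases such as "UTF8" or "koi8_r".
    canonical = codecs.lookup(python_name).name
    try:
        return PYTHON_TO_MYSQL[canonical]
    except KeyError:
        raise ValueError("no known MySQL name for %r" % python_name)
```

Failing loudly on an unmapped name seems safer than passing it through, since MySQL rejects names it does not know.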

The perfect solution would be:

  • the django mysql backend cursor decodes query strings (execute, executemany) from settings.DEFAULT_CHARSET when DEFAULT_CHARSET != 'utf8'
  • mysqldb sets the charset for connections to utf8 and uses utf8 for everything.
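
One reading of the two points above is that the cursor would re-encode every query from DEFAULT_CHARSET to UTF-8 before handing it to the driver. A minimal sketch of that step, assuming queries arrive as byte strings (the helper name to_utf8 is hypothetical and not part of any patch here):

```python
import codecs


def to_utf8(query, default_charset):
    """Re-encode a byte-string query from default_charset to UTF-8."""
    if codecs.lookup(default_charset).name == "utf-8":
        return query  # already UTF-8, nothing to do
    # Decode with the site-wide charset, then encode for the connection.
    return query.decode(default_charset).encode("utf-8")
```

A cursor wrapper would apply this to the query string in execute and executemany, so the connection itself could always stay on utf8.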

comment:14 Changed 7 years ago by mir@…

  • Triage Stage changed from Accepted to Ready for checkin

Ehemm. Don't use the DEFAULT_CHARSET (second) patch! (Sorry, that was a bad idea of mine.)

The default value of DEFAULT_CHARSET is 'utf-8', but MySQL only understands 'utf8'. Passing 'utf-8' breaks immediately:

mysql> set names 'utf-8';
ERROR 1115 (42000): Unknown character set: 'utf-8'

So, no need for further discussions, the first patch is ready for checkin ;-)

comment:15 Changed 7 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from new to closed

(In [4760]) Fixed #3754 -- Re-introduced utf-8 as default encoding for interaction with
MySQL backend (a side-effect of [4724]). Thanks Andy Dustman and Michael
Radziej.
