Opened 18 years ago

Closed 18 years ago

#3754 closed (fixed)

mysql charset no longer defaults to utf-8

Reported by: Florian Apolloner <florian@…>
Owned by: Malcolm Tredinnick
Component: Database layer (models, ORM)
Version: dev
Severity:
Keywords:
Cc: farcepest@…
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

After updating my Django installation to the newest revision, I ran into the following problem:
everything is displayed fine except data from the database.
The data is stored as utf-8 in the database, but the output is iso.

I tracked the change down to revision 4724: 4723 works fine, 4724 contains some changes in mysql/*, and since that revision it no longer works.
Any ideas?

Attachments (2)

mysql-bug-3754.diff (529 bytes) - added by Andy Dustman <farcepest@…> 18 years ago.
Set default character set of utf8
mysql-bug-3754.2.diff (547 bytes) - added by Andy Dustman <farcepest@…> 18 years ago.
updated to use DEFAULT_CHARSET (untested)

Change History (17)

comment:1 by Florian Apolloner <florian@…>, 18 years ago

Mysql-Versions:
5.0.24a
Python-mysqldb:
1.2.1-p2-4ubuntu2

And assigning keywords in Trac raises an internal error...

comment:2 by derelm, 18 years ago

this is bug #2635

in reply to:  2 comment:3 by Michael Radziej <mir@…>, 18 years ago

Replying to derelm:

this is bug #2635

No, it's caused by the patch in #2635. Please let's keep this bug separate. mtredinnick asked not to reopen #3625.

comment:4 by anonymous, 18 years ago

Setting DATABASE_OPTIONS = dict(charset="utf8") helped.
But I agree with http://code.djangoproject.com/ticket/2635#comment:20
Utf-8 should be made default
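
For anyone hitting this before a fix lands, the workaround above can be sketched as a settings fragment. This is only an illustration under the assumption of the old-style single-database settings; the DATABASE_ENGINE line is added for context and is not taken from the comment.

```python
# settings.py (fragment): workaround sketch, not the eventual fix.
DATABASE_ENGINE = "mysql"  # assumed here for context

# Extra keyword arguments passed through to the MySQL driver on connect.
# Note the spelling: MySQL calls the character set "utf8", not "utf-8".
DATABASE_OPTIONS = {"charset": "utf8"}
```

dict(charset="utf8") and {"charset": "utf8"} are equivalent; either form works.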

comment:5 by Andy Dustman <farcepest@…>, 18 years ago

Cc: farcepest@… added

comment:6 by Malcolm Tredinnick, 18 years ago

Summary: mysql charset broken → mysql charset no longer defaults to utf-8

Changing title to differentiate between "broken" and "different".

comment:7 by Malcolm Tredinnick, 18 years ago

Triage Stage: Unreviewed → Accepted

by Andy Dustman <farcepest@…>, 18 years ago

Attachment: mysql-bug-3754.diff added

Set default character set of utf8

comment:8 by Andy Dustman <farcepest@…>, 18 years ago

Has patch: set

Note that attachment:mysql-bug-3754.diff also sets use_unicode=False. This is also closer to the original default behavior. When you set a character set via the charset parameter, all text-like columns are returned as unicode rather than str. use_unicode=True broke a couple of Django unit tests in a superficial way the last time I checked.
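
A rough sketch of the defaults described in this comment, for readers who do not want to open the diff. The helper name mysql_connect_kwargs is hypothetical and this is not the code from the attachment; it only assumes that the resulting dictionary would be passed as keyword arguments to MySQLdb.connect(), which accepts charset and use_unicode.

```python
def mysql_connect_kwargs(user_options=None):
    """Build connection keyword arguments with utf8 defaults.

    Hypothetical helper for illustration only.
    """
    kwargs = {
        "charset": "utf8",     # talk to the server in utf8 by default
        "use_unicode": False,  # keep returning byte strings, as before
    }
    # Anything the user sets explicitly (e.g. via DATABASE_OPTIONS) wins.
    kwargs.update(user_options or {})
    return kwargs
```

With no options this yields the utf8 defaults; passing {"charset": "latin1"} overrides only the character set and leaves use_unicode untouched.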

comment:9 by Michael Radziej <mir@…>, 18 years ago

Patch needs improvement: set

Instead of charset: 'utf8', it should use settings.DEFAULT_CHARSET.

by Andy Dustman <farcepest@…>, 18 years ago

Attachment: mysql-bug-3754.2.diff added

updated to use DEFAULT_CHARSET (untested)

comment:10 by Michael Radziej <mir@…>, 18 years ago

Just for the record: using DEFAULT_CHARSET instead of 'utf8' fixes #952; either 'utf8' or DEFAULT_CHARSET should fix #1356 and #3370 (but this needs testing).

The old mysql backend did not allow this solution. The problems have been discussed--ad nauseam--in this thread on django-developers.

I'm going to post a request to test this patch to django-developers.

comment:11 by Michael Radziej <mir@…>, 18 years ago

Patch needs improvement: unset

comment:12 by Malcolm Tredinnick, 18 years ago

Owner: changed from Adrian Holovaty to Malcolm Tredinnick

Either of these patches looks correct (thanks for the rapid response, Andy) and I have a slight preference for the latter. I'm not completely sold on the idea of equating the database character set with DEFAULT_CHARSET in the long term. I think we do need a DATABASE_CHARSET parameter, but that is definitely post-0.96 work. For now, restoring the previous default behaviour as the default is a nice solution for existing users, so we'll go with that. I'll apply this in a few hours, once I've had a chance to test things.

comment:13 by Michael Radziej <mir@…>, 18 years ago

There's an issue left when you use settings.DEFAULT_CHARSET: Mysql needs to understand the character set name, right? But there are encodings that have a name in mysql that is different from their name in python (e.g., 'koi8r' vs. 'koi8-r'). Well, but it at least improves the situation.

The perfect solution would be:

  • the django mysql backend cursor transcodes query strings (execute, executemany) from settings.DEFAULT_CHARSET to utf8 when DEFAULT_CHARSET is not utf8
  • mysqldb sets the charset for connections to utf8 and uses utf8 for everything.
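
To make the naming mismatch concrete, here is a minimal sketch of translating a Python encoding name into the name MySQL expects. The table and the function mysql_charset_name are hypothetical and deliberately incomplete (only the two encodings mentioned in this ticket); a real mapping would need an entry for every supported character set.

```python
import codecs

# Hypothetical, incomplete table: canonical Python codec name -> MySQL name.
PYTHON_TO_MYSQL_CHARSET = {
    "utf-8": "utf8",
    "koi8-r": "koi8r",
}


def mysql_charset_name(python_name):
    """Return the MySQL spelling for a Python encoding name."""
    # codecs.lookup() normalises aliases such as "UTF8" or "koi8_r".
    canonical = codecs.lookup(python_name).name
    try:
        return PYTHON_TO_MYSQL_CHARSET[canonical]
    except KeyError:
        raise ValueError("no known MySQL name for %r" % python_name)
```

Raising on unknown names, rather than passing them through, avoids sending MySQL a character set name it will reject.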

comment:14 by mir@…, 18 years ago

Triage Stage: Accepted → Ready for checkin

Ehemm. Don't use the DEFAULT_CHARSET (second) patch! (Sorry, that was a bad idea from me.)

The default value of DEFAULT_CHARSET is 'utf-8', but MySQL only understands 'utf8'. 'utf-8' breaks immediately:

mysql> set names 'utf-8';
ERROR 1115 (42000): Unknown character set: 'utf-8'

So, no need for further discussions, the first patch is ready for checkin ;-)

comment:15 by Malcolm Tredinnick, 18 years ago

Resolution: fixed
Status: new → closed

(In [4760]) Fixed #3754 -- Re-introduced utf-8 as default encoding for interaction with
MySQL backend (a side-effect of [4724]). Thanks Andy Dustman and Michael
Radziej.
