Django

Ticket #3754 (closed: fixed)

Opened 2 years ago

Last modified 2 years ago

mysql charset no longer defaults to utf-8

Reported by: Florian Apolloner <florian@apolloner.eu>
Assigned to: mtredinnick
Milestone:
Component: Database layer (models, ORM)
Version: SVN
Keywords:
Cc: farcepest@gmail.com
Triage Stage: Ready for checkin
Has patch: 1
Needs documentation: 0
Needs tests: 0
Patch needs improvement: 0

Description

After updating my Django installation to the newest revision I ran into the following problem: everything is displayed fine except data from the database. The data is stored as utf-8 in the database, but the output is iso.

I tracked the change down to revision 4724: 4723 works fine, but 4724 contains some changes in mysql/*, and since that revision it no longer works. Any ideas?

Attachments

mysql-bug-3754.diff (0.5 kB) - added by Andy Dustman <farcepest@gmail.com> on 03/18/07 15:52:36.
Set default character set of utf8
mysql-bug-3754.2.diff (0.5 kB) - added by Andy Dustman <farcepest@gmail.com> on 03/18/07 16:23:07.
updated to use DEFAULT_CHARSET (untested)

Change History

03/18/07 09:26:22 changed by Florian Apolloner <florian@apolloner.eu>

  • needs_better_patch changed.
  • needs_tests changed.
  • needs_docs changed.

Mysql-Versions: 5.0.24a Python-mysqldb: 1.2.1-p2-4ubuntu2

And assigning keywords in trac raises an internal error...

(follow-up: ↓ 3 ) 03/18/07 12:08:53 changed by derelm

this is bug #2635

(in reply to: ↑ 2 ) 03/18/07 12:52:00 changed by Michael Radziej <mir@noris.de>

Replying to derelm:

this is bug #2635

No, it's caused by the patch in #2635. Please let's keep this bug separate. mtredinnick asked not to reopen #3625.

03/18/07 12:57:44 changed by anonymous

Setting DATABASE_OPTIONS = dict(charset="utf8") helped. But I agree with http://code.djangoproject.com/ticket/2635#comment:20 that UTF-8 should be made the default.
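
For readers who want to try the workaround above, here is a minimal sketch of the relevant settings fragment. It assumes a Django release of that era, where the database is configured through flat DATABASE_* settings; everything beyond the DATABASE_OPTIONS line is an assumption added for context.

```python
# settings.py (fragment): a sketch of the workaround, not a complete settings file.

# Assumption: the project uses the MySQL backend.
DATABASE_ENGINE = "mysql"

# Extra keyword arguments for the database connection.
# Note the MySQL spelling "utf8" rather than "utf-8".
DATABASE_OPTIONS = {"charset": "utf8"}
```
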

03/18/07 13:17:55 changed by Andy Dustman <farcepest@gmail.com>

  • cc set to farcepest@gmail.com.

03/18/07 13:53:31 changed by mtredinnick

  • summary changed from mysql charset broken to mysql charset no longer defaults to utf-8.

Changing title to differentiate between "broken" and "different".

03/18/07 14:17:40 changed by mtredinnick

  • stage changed from Unreviewed to Accepted.

03/18/07 15:52:36 changed by Andy Dustman <farcepest@gmail.com>

  • attachment mysql-bug-3754.diff added.

Set default character set of utf8

03/18/07 16:14:16 changed by Andy Dustman <farcepest@gmail.com>

  • has_patch set to 1.

Note that attachment:mysql-bug-3754.diff also sets use_unicode=False. This is also closer to the original default behavior. When you set a character set via the charset parameter, all text-like columns are by default returned as unicode rather than as strings. use_unicode=True broke a couple of Django unit tests in a superficial way the last time I checked.
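
The idea in the comment above can be sketched roughly as follows. This is not the content of the attached patch: build_connect_kwargs is a hypothetical helper, and the only things taken from the ticket are the two defaults (charset 'utf8' and use_unicode=False) and the fact that DATABASE_OPTIONS can override them.

```python
def build_connect_kwargs(database_options=None):
    """Hypothetical helper: merge backend defaults with user options.

    The defaults restore utf8 as the connection character set and keep
    text columns coming back as byte strings. Anything the user puts in
    DATABASE_OPTIONS wins over these defaults.
    """
    kwargs = {
        "charset": "utf8",     # MySQL's name for UTF-8
        "use_unicode": False,  # closer to the behavior before [4724]
    }
    kwargs.update(database_options or {})
    return kwargs


# With no options, the defaults apply.
default_kwargs = build_connect_kwargs()

# A user-supplied charset overrides the default but keeps use_unicode.
latin_kwargs = build_connect_kwargs({"charset": "latin1"})
```

The resulting dictionary would then be passed on as keyword arguments when opening the connection.
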

03/18/07 16:18:19 changed by Michael Radziej <mir@noris.de>

  • needs_better_patch set to 1.

Instead of charset: 'utf8', it should use settings.DEFAULT_CHARSET.

03/18/07 16:23:07 changed by Andy Dustman <farcepest@gmail.com>

  • attachment mysql-bug-3754.2.diff added.

updated to use DEFAULT_CHARSET (untested)

03/18/07 16:38:45 changed by Michael Radziej <mir@noris.de>

Just for the record: using DEFAULT_CHARSET instead of 'utf8' fixes #952; either 'utf8' or DEFAULT_CHARSET should fix #1356 and #3370 (but this needs testing).

The old mysql backend did not allow this solution. The problems have been discussed ad nauseam in this thread on django-developers.

I'm going to post a request to test this patch to django-developers.

03/18/07 16:38:54 changed by Michael Radziej <mir@noris.de>

  • needs_better_patch deleted.

03/18/07 16:43:47 changed by mtredinnick

  • owner changed from adrian to mtredinnick.

Either of these patches looks correct (thanks for the rapid response, Andy) and I have a slight preference for the latter. I'm not completely sold on the idea of equating the database character set with DEFAULT_CHARSET in the long-term. I think we do need a DATABASE_CHARSET parameter, but that is definitely post-0.96 work. For now, restoring the previous default behaviour as the default is a nice solution for existing users, so we'll go with that. I'll apply this in a few hours, once I've had a chance to test things.

03/18/07 17:18:25 changed by Michael Radziej <mir@noris.de>

There's an issue left when you use settings.DEFAULT_CHARSET: Mysql needs to understand the character set name, right? But there are encodings that have a name in mysql that is different from their name in python (e.g., 'koi8r' vs. 'koi8-r'). Well, but it at least improves the situation.
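
To illustrate the naming mismatch, here is a small sketch of a translation step between Python codec names and MySQL character set names. The function name and the table are hypothetical and deliberately tiny; only the two pairs mentioned in this ticket are listed, and a real table would need checking against the MySQL server in use.

```python
import codecs

# Illustrative table only: canonical Python codec name -> MySQL charset name.
PYTHON_TO_MYSQL_CHARSET = {
    "utf-8": "utf8",
    "koi8-r": "koi8r",
}


def mysql_charset_name(python_charset):
    """Translate a Python encoding name into the MySQL spelling.

    codecs.lookup() normalizes aliases such as "UTF8" to the canonical
    codec name, so the table only needs one entry per encoding.
    """
    canonical = codecs.lookup(python_charset).name
    try:
        return PYTHON_TO_MYSQL_CHARSET[canonical]
    except KeyError:
        raise ValueError("no known MySQL name for %r" % python_charset)
```

Failing loudly on an unknown name is a design choice of this sketch: passing an unrecognized name straight through would only move the error to the server.
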

The perfect solution would be:

  • the django mysql backend cursor decodes query strings (execute, executemany) from settings, when DEFAULT_CHARSET != 'utf8'
  • mysqldb sets the charset for connections to utf8 and uses utf8 for everything.
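
The two points above could be sketched like this. It is an illustration of the recoding step only, written against modern Python string semantics (bytes versus str), not the code of any Django backend; recode_query is a hypothetical name.

```python
import codecs


def recode_query(query, source_charset="utf-8"):
    """Hypothetical helper: return the query as UTF-8 encoded bytes.

    Byte strings are assumed to be encoded in source_charset (the role
    DEFAULT_CHARSET plays in the comment above). Text strings are simply
    encoded as UTF-8.
    """
    if isinstance(query, bytes):
        if codecs.lookup(source_charset).name == "utf-8":
            return query  # already in the connection's encoding
        return query.decode(source_charset).encode("utf-8")
    return query.encode("utf-8")
```

A cursor wrapper would apply such a function to the statement passed to execute and executemany, while the connection itself always talks utf8 to the server.
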

03/19/07 04:57:16 changed by mir@noris.de

  • stage changed from Accepted to Ready for checkin.

Ahem. Don't use the DEFAULT_CHARSET (second) patch! (Sorry, that was a bad idea of mine.)

With that patch, the default charset passed to mysql is 'utf-8', the default value of DEFAULT_CHARSET. But mysql only understands 'utf8', and 'utf-8' breaks immediately:

mysql> set names 'utf-8';
ERROR 1115 (42000): Unknown character set: 'utf-8'

So, no need for further discussions, the first patch is ready for checkin ;-)

03/20/07 18:32:39 changed by mtredinnick

  • status changed from new to closed.
  • resolution set to fixed.

(In [4760]) Fixed #3754 -- Re-introduced utf-8 as default encoding for interaction with MySQL backend (a side-effect of [4724]). Thanks Andy Dustman and Michael Radziej.

