Opened 18 years ago

Closed 18 years ago

#4021 closed (fixed)

[unicode][patch] initial sql with non-ascii strings not imported

Reported by: Ivan Sagalaev <Maniac@…>
Owned by: Jacob
Component: Uncategorized
Version: other branch
Severity:
Keywords: unicode
Cc: Malcolm Tredinnick, Maniac@…
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

An SQL file containing utf-8 encoded data looking like this:

set names utf8;
insert into cicero_forum (`slug`, `name`, `group`, `ordering`) values ('test', 'Тестовый форум', 'Тест', 0);

breaks the initial syncdb with the mysql backend:

'ascii' codec can't decode byte 0xd0 in position 81: ordinal not in range(128)

Evidently there's a plain str-to-unicode conversion somewhere...
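
For illustration, the failure mode can be reproduced without a database. This is a minimal sketch, assuming the file contents arrive as UTF-8 bytes and are then implicitly decoded as ASCII; the helper name try_ascii_decode is hypothetical and the statement is shortened from the one in the report.

```python
# -*- coding: utf-8 -*-
# Bytes as they would be read from the custom .sql file (UTF-8 encoded).
sql = u"insert into cicero_forum (`slug`, `name`) values ('test', 'Тестовый форум');"
raw = sql.encode("utf-8")


def try_ascii_decode(data):
    """Return the UnicodeDecodeError raised by an ASCII decode, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc
    return None


err = try_ascii_decode(raw)
# The first Cyrillic letter starts with byte 0xd0 in UTF-8, which ASCII rejects.
print(err)
```

A pure-ASCII statement passes through the same helper without an error, which is why the problem only shows up with non-ASCII initial data.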

Attachments (1)

4021.diff (3.8 KB) - added by Ivan Sagalaev <Maniac@…> 18 years ago.
Patch

Change History (4)

comment:1 by Ivan Sagalaev <Maniac@…>, 18 years ago

Found it...

The problem occurs in syncdb in management.py, which passes the raw file contents as a str to cursor.execute(). Since the mysql backend now connects with {'use_unicode': True}, it apparently expects only unicode data.

The obvious fix would be to decode the contents of custom .sql files in syncdb. Here we have the same problem as with templates: we can't know for sure which encoding the file is in. Another way is to connect to MySQL during syncdb with {'use_unicode': False} and without an explicit charset. I think this is correct, since syncdb is a command-line tool and shouldn't care about unicode internals.

Thoughts?
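
The first option (decoding the file contents before executing them) could look roughly like the sketch below. It is not the attached patch: run_initial_sql and its charset parameter are hypothetical names, sqlite3 stands in for the MySQL connection so the example runs anywhere, and splitting on ";" is a deliberate simplification.

```python
# -*- coding: utf-8 -*-
import sqlite3


def run_initial_sql(cursor, raw_sql, charset="utf-8"):
    """Decode raw bytes from a .sql file, then execute each statement.

    The charset has to be supplied by the caller, because the file itself
    does not say which encoding it uses.
    """
    text = raw_sql.decode(charset)
    # Naive split: fine for this sketch, wrong for ';' inside string literals.
    for statement in text.split(";"):
        if statement.strip():
            cursor.execute(statement)


raw = u"insert into forum (slug, name) values ('test', 'Тестовый форум');".encode("utf-8")

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table forum (slug text, name text)")
run_initial_sql(cur, raw)
name = cur.execute("select name from forum").fetchone()[0]
print(name)
```

The design choice here is that the cursor only ever sees decoded text, so the backend's unicode setting no longer matters for initial SQL.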

by Ivan Sagalaev <Maniac@…>, 18 years ago

Attachment: 4021.diff added

Patch

comment:2 by Ivan Sagalaev <Maniac@…>, 18 years ago

Summary: [unicode] initial sql with non-ascii strings not imported → [unicode][patch] initial sql with non-ascii strings not imported

In the end I've decided not to load templates with codecs.open, because codecs.open apparently always opens files in binary mode, while we currently use a plain open, which opens them in text mode. Maybe this is not an issue, though...
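
The binary-versus-text difference mentioned above is visible in how line endings are handled. The sketch below assumes Python 3, where behavior differs in detail from the Python 2 of this ticket's era; the file name and contents are made up for illustration.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

# Write a small UTF-8 file with Windows-style line endings.
path = os.path.join(tempfile.mkdtemp(), "template.html")
with open(path, "wb") as f:
    f.write(u"Тест\r\nline two\r\n".encode("utf-8"))

# codecs.open reads the underlying file in binary mode: bytes are decoded,
# but "\r\n" is left untouched.
with codecs.open(path, "r", encoding="utf-8") as f:
    via_codecs = f.read()

# The built-in open in text mode decodes and also translates "\r\n" to "\n".
with open(path, "r", encoding="utf-8") as f:
    via_open = f.read()

print(repr(via_codecs))
print(repr(via_open))
```

Whether this matters depends on whether any template relies on newline translation, which is the open question raised in the comment.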

comment:3 by Malcolm Tredinnick, 18 years ago

Resolution: fixed
Status: new → closed

(In [5058]) unicode: Added FILE_CHARSET setting and use it to decode files read from disk.
Based on a patch from Ivan Sagalaev. Fixed #4021.
