Opened 11 years ago

Closed 11 years ago

#4021 closed (fixed)

[unicode][patch] initial sql with non-ascii strings not imported

Reported by: Ivan Sagalaev <Maniac@…> Owned by: Jacob
Component: Uncategorized Version: other branch
Severity: Keywords: unicode
Cc: Malcolm Tredinnick, Maniac@… Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:


An SQL file containing utf-8 encoded data looking like this:

set names utf8;
insert into cicero_forum (`slug`, `name`, `group`, `ordering`) values ('test', 'Тестовый форум', 'Тест', 0);

breaks initial syncdb with mysql backend:

'ascii' codec can't decode byte 0xd0 in position 81: ordinal not in range(128)

Evidently there's a plain str-to-unicode conversion somewhere...
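The failure can be reproduced outside Django with a minimal Python 3 sketch. In the original Python 2 setting, mixing a byte `str` with `unicode` triggered an implicit decode using the default 'ascii' codec; here that decode is made explicit. The sample statement is shortened from the report and is illustrative only.

```python
# Python 3 sketch: in Python 2 this decode happened implicitly with the
# default 'ascii' codec when a byte str met a unicode string.
sql = "insert into cicero_forum (`name`) values ('Тестовый форум');"
raw = sql.encode("utf-8")  # the bytes a file read in binary mode would contain

bad_byte = None
try:
    raw.decode("ascii")  # same codec the implicit conversion used
except UnicodeDecodeError as exc:
    # exc.start is the offset of the first byte outside range(128)
    bad_byte = raw[exc.start]
    print(exc)

# Decoding with the file's real encoding succeeds and round-trips.
text = raw.decode("utf-8")
print(hex(bad_byte), text == sql)
```

The first offending byte is the lead byte of the UTF-8 encoding of the Cyrillic letter "Т". That matches the 0xd0 in the reported traceback.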

Attachments (1)

4021.diff (3.8 KB) - added by Ivan Sagalaev <Maniac@…> 11 years ago.


Change History (4)

comment:1 Changed 11 years ago by Ivan Sagalaev <Maniac@…>

Found it...

The problem occurs in syncdb, which passes the raw file contents as a byte str into cursor.execute(). Since the mysql backend now connects with {'use_unicode': True}, it apparently expects only unicode data.

The obvious fix would be to decode the contents of custom .sql files in syncdb. But this has the same problem as templates: we can't know for sure which encoding a file is in. Another option is to connect to MySQL during syncdb with {'use_unicode': False} and no explicit charset. I think this is correct, since syncdb is a command-line tool and shouldn't care about unicode internals.
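The encoding ambiguity can be illustrated with a small Python 3 sketch. It assumes two plausible encodings for a Russian-language file, UTF-8 and cp1251; both were chosen here purely for illustration. The same text produces different bytes on disk, and decoding with the wrong codec either fails or silently yields garbage.

```python
text = "Тестовый форум"
utf8_bytes = text.encode("utf-8")
cp1251_bytes = text.encode("cp1251")

# The same characters are stored as different bytes.
assert utf8_bytes != cp1251_bytes

# Decoding with the matching codec round-trips.
assert cp1251_bytes.decode("cp1251") == text

# Guessing wrong can fail loudly: cp1251 bytes are not valid UTF-8 here...
try:
    cp1251_bytes.decode("utf-8")
    wrong_guess_failed = False
except UnicodeDecodeError:
    wrong_guess_failed = True

# ...or silently: UTF-8 bytes decode as cp1251 without error, into mojibake.
mojibake = utf8_bytes.decode("cp1251")
print(wrong_guess_failed, mojibake)
```

Because a wrong guess does not always raise, the encoding has to be configured explicitly rather than inferred from the file.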


Changed 11 years ago by Ivan Sagalaev <Maniac@…>

Attachment: 4021.diff added


comment:2 Changed 11 years ago by Ivan Sagalaev <Maniac@…>

Summary: [unicode] initial sql with non-ascii strings not imported → [unicode][patch] initial sql with non-ascii strings not imported

In the end I've decided not to load templates with because it appears that always opens files in binary mode, while currently we use a plain open that reads them in text mode. Maybe this is not an issue, though...

comment:3 Changed 11 years ago by Malcolm Tredinnick

Resolution: fixed
Status: new → closed

(In [5058]) unicode: Added FILE_CHARSET setting and use it to decode files read from disk.
Based on a patch from Ivan Sagalaev. Fixed #4021.
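The commit message describes decoding files read from disk using a configured charset. A minimal Python 3 sketch of that idea follows. `FILE_CHARSET` is modeled here as a plain module constant, and `read_custom_sql` is a hypothetical helper; neither is Django's actual code. The statement splitting is deliberately naive (one statement per line ending in ";") and ignores semicolons inside quoted strings.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical stand-in for the FILE_CHARSET setting named in the commit.
FILE_CHARSET = "utf-8"


def read_custom_sql(path, charset=None):
    """Hypothetical helper: read a .sql file as bytes, decode it with the
    configured charset, and return str statements ready for a cursor."""
    raw = Path(path).read_bytes()
    text = raw.decode(charset or FILE_CHARSET)
    # Naive split: keep non-empty lines that end with ';'.
    return [line.strip() for line in text.splitlines()
            if line.strip().endswith(";")]


# Usage: write the SQL from the report to a temporary UTF-8 file.
sample = (
    "set names utf8;\n"
    "insert into cicero_forum (`slug`, `name`, `group`, `ordering`) "
    "values ('test', 'Тестовый форум', 'Тест', 0);\n"
)
with tempfile.NamedTemporaryFile("wb", suffix=".sql", delete=False) as f:
    f.write(sample.encode("utf-8"))
    sql_path = f.name

statements = read_custom_sql(sql_path)
os.remove(sql_path)
for stmt in statements:
    print(stmt)
```

Decoding once, at the point where the file is read, means everything handed to `cursor.execute()` is already text. This is what a backend connected with `use_unicode` expects.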
