Opened 8 years ago

Closed 8 years ago

#4021 closed (fixed)

[unicode][patch] initial sql with non-ascii strings not imported

Reported by: Ivan Sagalaev <Maniac@…>
Owned by: jacob
Component: Uncategorized
Version: other branch
Severity:
Keywords: unicode
Cc: mtredinnick, Maniac@…
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings:
UI/UX:


An SQL file containing UTF-8-encoded data, like this:

set names utf8;
insert into cicero_forum (`slug`, `name`, `group`, `ordering`) values ('test', 'Тестовый форум', 'Тест', 0);

breaks the initial syncdb with the MySQL backend:

'ascii' codec can't decode byte 0xd0 in position 81: ordinal not in range(128)

Evidently there's an implicit str-to-unicode conversion somewhere...
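The failure can be reproduced without Django. Under Python 2, mixing a UTF-8 str with unicode triggered an implicit decode using the default ASCII codec. The sketch below assumes Python 3, where the same decode has to be spelled out, and applies it to the INSERT statement from the report: the first byte rejected is 0xd0, the UTF-8 lead byte of the Cyrillic 'Т'. Variable names are illustrative only.

```python
# Reproduce the ticket's failure in Python 3, where decoding is explicit.
# In Python 2 the same decode happened implicitly when a UTF-8 str was
# combined with unicode, because the default codec there is ASCII.
sql = ("insert into cicero_forum (`slug`, `name`, `group`, `ordering`) "
       "values ('test', 'Тестовый форум', 'Тест', 0);")

raw = sql.encode("utf-8")  # the bytes as they are stored in the .sql file

error = None
try:
    raw.decode("ascii")  # what the implicit conversion effectively did
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; 'exc' is cleared after the except block
    print(exc)
    # exc.start is the index of the first byte outside range(128)
    print(hex(raw[exc.start]))

# Decoding with the file's real encoding succeeds and round-trips.
text = raw.decode("utf-8")
print(text == sql)
```

Everything before the Cyrillic text is ASCII, so the decode only fails once it reaches the first byte of 'Т'.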

Attachments (1)

4021.diff (3.8 KB) - added by Ivan Sagalaev <Maniac@…> 8 years ago.

Change History (4)

comment:1 Changed 8 years ago by Ivan Sagalaev <Maniac@…>

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Found it...

The problem occurs in syncdb, which passes the raw file contents as a str into cursor.execute(). Since we now have {'use_unicode': True} for the MySQL backend, the backend apparently expects only unicode data.

The obvious fix would be to decode the contents of custom .sql files in syncdb. Here we have the same problem as with templates: we can't know for sure what encoding the file is in. Another option is to connect to MySQL during syncdb with {'use_unicode': False} and no explicit charset. I think this is correct, since syncdb is a command-line tool and shouldn't care about unicode internals.
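A minimal sketch of the first option, decoding a custom .sql file before it reaches cursor.execute(). Because the file does not declare its own encoding, the caller has to supply the charset. read_custom_sql and its charset parameter are hypothetical names, not Django's API (the committed fix took the encoding from a new FILE_CHARSET setting), and the sketch assumes Python 3.

```python
import os
import tempfile


def read_custom_sql(path, charset="utf-8"):
    """Return a custom .sql file's contents as text (hypothetical helper).

    The encoding cannot be detected reliably, so the caller supplies it,
    for example from a project-wide setting.
    """
    with open(path, "rb") as f:
        raw = f.read()
    # Decode explicitly instead of relying on an implicit ASCII conversion.
    return raw.decode(charset)


# Usage: write a UTF-8 file like the one in the ticket, then read it back.
sql = "insert into cicero_forum (`name`) values ('Тестовый форум');\n"
with tempfile.NamedTemporaryFile("wb", suffix=".sql", delete=False) as tmp:
    tmp.write(sql.encode("utf-8"))
    path = tmp.name

try:
    text = read_custom_sql(path, charset="utf-8")
finally:
    os.remove(path)

print(text)
```

Reading in binary mode and decoding in one place keeps the choice of encoding explicit; passing the wrong charset fails loudly with UnicodeDecodeError instead of silently sending mangled bytes to the database.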


Changed 8 years ago by Ivan Sagalaev <Maniac@…>


comment:2 Changed 8 years ago by Ivan Sagalaev <Maniac@…>

  • Summary changed from [unicode] initial sql with non-ascii strings not imported to [unicode][patch] initial sql with non-ascii strings not imported

In the end I've decided not to load templates with … because it appears that … always opens files in binary mode, while currently we use a plain open() that opens them in text mode. Maybe this is not an issue, though...

comment:3 Changed 8 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from new to closed

(In [5058]) unicode: Added FILE_CHARSET setting and use it to decode files read from disk.
Based on a patch from Ivan Sagalaev. Fixed #4021.
