Opened 18 years ago
Closed 18 years ago
#4021 closed (fixed)
[unicode][patch] initial sql with non-ascii strings not imported
Reported by: | | Owned by: | Jacob |
---|---|---|---|
Component: | Uncategorized | Version: | other branch |
Severity: | | Keywords: | unicode |
Cc: | Malcolm Tredinnick, Maniac@… | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
An SQL file containing utf-8 encoded data looking like this:
set names utf8; insert into cicero_forum (`slug`, `name`, `group`, `ordering`) values ('test', 'Тестовый форум', 'Тест', 0);
breaks initial syncdb with mysql backend:
'ascii' codec can't decode byte 0xd0 in position 81: ordinal not in range(128)
Evidently there's a plain str-to-unicode conversion somewhere...
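For illustration, a minimal sketch of the failure in isolation. It is written for Python 3, where the decode has to be spelled out; in the Python 2 code this ticket is about, the same ASCII decode happens implicitly whenever a byte str is mixed with unicode. The helper name `decode_as_ascii` is made up for this example and is not part of Django.

```python
# The statement from the report (without the MySQL-specific "set names").
SQL = (
    "insert into cicero_forum (`slug`, `name`, `group`, `ordering`) "
    "values ('test', 'Тестовый форум', 'Тест', 0);"
)


def decode_as_ascii(raw: bytes) -> str:
    """Return the decoded text, or the error message if decoding fails.

    Hypothetical helper, used only to show the error text.
    """
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)


raw_bytes = SQL.encode("utf-8")         # what a UTF-8 .sql file holds on disk
message = decode_as_ascii(raw_bytes)    # fails on the first Cyrillic byte
utf8_text = raw_bytes.decode("utf-8")   # an explicit UTF-8 decode round-trips
print(message)
```

The byte offset in the message depends on the exact file contents, so it will not necessarily match the position quoted above.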
Attachments (1)
Change History (4)
comment:1 by , 18 years ago
comment:2 by , 18 years ago
Summary: | [unicode] initial sql with non-ascii strings not imported → [unicode][patch] initial sql with non-ascii strings not imported |
---|
In the end I decided not to load templates with codecs.open, because codecs.open appears to always open files in binary mode, whereas we currently use a plain open that opens them in text mode. Maybe this is not an issue, though...
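A small sketch of the difference mentioned here, assuming Python 3 and a UTF-8 file with Windows line endings. The helper `read_both_ways` is hypothetical. The built-in open translates newlines in text mode, while codecs.open reads the underlying file in binary mode and leaves them alone.

```python
import codecs
import os
import tempfile


def read_both_ways(data: bytes):
    """Write data to a temp file, then read it with open() and codecs.open()."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Text mode: "\r\n" is translated to "\n" on read.
        with open(path, encoding="utf-8") as f:
            text_mode = f.read()
        # codecs.open: the file is opened in binary mode, no newline translation.
        with codecs.open(path, encoding="utf-8") as f:
            via_codecs = f.read()
    finally:
        os.remove(path)
    return text_mode, via_codecs


text_mode, via_codecs = read_both_ways("Тест\r\nline two\r\n".encode("utf-8"))
print(repr(text_mode))
print(repr(via_codecs))
```

Whether the untranslated line endings matter depends on what consumes the text afterwards.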
comment:3 by , 18 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Found it...
The problem occurs in management.py, in syncdb, where the raw file contents are passed as str into cursor.execute(). Since we now set {'use_unicode': True} for the MySQL backend, it apparently expects only unicode data.
The obvious fix would be to decode the contents of custom .sql files in syncdb. Here we have the same problem as with templates: we can't know for sure which encoding the file is in. Another way is to connect to MySQL during syncdb with {'use_unicode': False} and without an explicit charset. I think this is correct, since syncdb is a command-line tool and shouldn't care about unicode internals.
Thoughts?
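A rough sketch of the first option (decode before executing), not the patch attached to this ticket. Assumptions: the file is UTF-8, the helper `load_custom_sql` is hypothetical, and sqlite3 stands in for the MySQL backend only so the example runs without a server.

```python
import sqlite3


def load_custom_sql(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode raw .sql file contents.

    The encoding is assumed, not detected; a file in another encoding
    would need a different value here.
    """
    return raw.decode(encoding)


# Bytes as they would be read from a custom .sql file on disk.
raw = (
    "insert into cicero_forum (slug, name) "
    "values ('test', 'Тестовый форум');"
).encode("utf-8")

conn = sqlite3.connect(":memory:")  # stand-in for the real database connection
cursor = conn.cursor()
cursor.execute("create table cicero_forum (slug text, name text)")
cursor.execute(load_custom_sql(raw))  # the cursor receives text, not raw bytes
stored_name = cursor.execute("select name from cicero_forum").fetchone()[0]
conn.close()
print(stored_name)
```

The second option, connecting with {'use_unicode': False}, would avoid guessing the encoding and pass the bytes through untouched instead.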