Opened 17 years ago

Closed 16 years ago

#4741 closed (wontfix)

[unicode] Mysql (only!) returns non-unicode strings on UnicodeBranch

Reported by: hidded <me@…>
Owned by: nobody
Component: Database layer (models, ORM)
Version: dev
Severity:
Keywords: mysql_old, unicode
Cc: django@…
Triage Stage: Unreviewed
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: yes
Easy pickings: no
UI/UX: no

Description

See the summary. This test case works properly with SQLite and PostgreSQL, but crashes with mysql/mysql_old.

$ mysql --version
mysql  Ver 14.12 Distrib 5.0.22, for pc-linux-gnu (i486) using readline 5.1

mysql> select version();
| 5.0.41-Dotdeb_1.dotdeb.1-log |

settings.py:
    DATABASE_ENGINE = 'mysql'
    DATABASE_OPTIONS = {
        'charset': 'utf8',
    }

Create a simple model, then save a Unicode string and load it back from the database.

>>> from catcher.models import *
>>> Feed.objects.all()
Traceback (most recent call last):
  File "<console>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/django/db/models/query.py", line 107, in __repr__
    return repr(self._get_data())
  File "/usr/lib/python2.4/site-packages/django/db/models/base.py", line 87, in __repr__
    return smart_str(u'<%s: %s>' % (self.__class__.__name__, unicode(self)))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 10: ordinal not in range(128)
>>> f = Feed.objects.all()[0]
>>> f.title
'Lenta.ru: \xcd\xce\xc2\xce\xd1\xd2\xc8, 02.07.2007'
>>> f.title = u'Lenta.ru: \u041d\u041e\u0412\u041e\u0421\u0422\u0418, 02.07.2007'
>>> f.title
u'Lenta.ru: \u041d\u041e\u0412\u041e\u0421\u0422\u0418, 02.07.2007'
>>> f.save()
>>> f = Feed.objects.all()[0]
>>> f.title
'Lenta.ru: \xd0\x9d\xd0\x9e\xd0\x92\xd0\x9e\xd0\xa1\xd0\xa2\xd0\x98, 02.07.2007' # why
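The two byte strings in the session can be checked directly. This is a minimal Python 3 sketch (the values are copied from the session, with Python 2 `str` written as `bytes`): decoding the first value as ASCII fails exactly like the traceback, and the value read back after `save()` is the UTF-8 encoding of the string that was saved. In other words, the data appears to round-trip intact and simply comes back undecoded.

```python
# Byte strings copied from the console session (Python 2 `str` -> Python 3 `bytes`).
title_before = b'Lenta.ru: \xcd\xce\xc2\xce\xd1\xd2\xc8, 02.07.2007'
title_after = b'Lenta.ru: \xd0\x9d\xd0\x9e\xd0\x92\xd0\x9e\xd0\xa1\xd0\xa2\xd0\x98, 02.07.2007'
title_saved = 'Lenta.ru: \u041d\u041e\u0412\u041e\u0421\u0422\u0418, 02.07.2007'

# Python 2 implicitly decodes a byte string with its default 'ascii' codec,
# which fails on the first non-ASCII byte (0xcd at offset 10).
try:
    title_before.decode('ascii')
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xcd in position 10: ordinal not in range(128)

# The value loaded after save() is the UTF-8 encoding of the saved text,
# so decoding it with UTF-8 recovers the original string.
print(title_after.decode('utf-8') == title_saved)  # True
```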

Attachments (4)

mysql_old_base.diff (2.6 KB ) - added by django@… 17 years ago.
Decode all byte strings to Unicode using the server encoding.
mysql_old_base2.diff (3.2 KB ) - added by django@… 17 years ago.
diff from the previous version: http://paste.pocoo.org/compare/4579/4100/
mysql_old_base3.diff (3.1 KB ) - added by django@… 17 years ago.
Hardcoded the encoding to "utf-8".
mysql_old_base4.diff (2.2 KB ) - added by jedie 16 years ago.
Updated diff against revision 7947


Change History (14)

comment:1 by Malcolm Tredinnick, 17 years ago

Summary: Mysql (only!) returns non-unicode strings on UnicodeBranch → [unicode] Mysql (only!) returns non-unicode strings on UnicodeBranch

There is something else going on here. MySQL does return Unicode strings. The tests in tests/modeltests/basic/, for example, all pass, and one of them checks that Unicode strings are returned. Other tests check this as well.

Do the normal Django tests all pass for you? Note that you will need to include the setting TEST_DATABASE_CHARSET='utf8' in the settings file you use for the tests to verify this, since the tests require a database that can store a wide range of characters.
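A minimal sketch of the relevant lines in such a test settings file, assuming the MySQL backend the reporter uses. Only `TEST_DATABASE_CHARSET` comes from the comment; connection details (database name, user, password) are omitted here and would still be needed.

```python
# settings.py fragment for running Django's own test suite against MySQL (sketch).
DATABASE_ENGINE = 'mysql'

# Create the test database with a character set that can store the wide
# range of characters the tests use.
TEST_DATABASE_CHARSET = 'utf8'
```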

Please provide an example of a simple model that does not work (the above just shows the errors, not the code involved) so that we can try to reproduce this. Make the example as simple as possible; keep removing fields until it stops failing. Ideally, a model with one field will show the problem.

(Also, you don't need the charset setting in your above example, since, if you look in the database backend, you'll see that we are already always setting the charset to utf8. We are also setting use_unicode to True, which is what causes the results to be in Unicode.)
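What that means for the reporter's settings can be sketched in plain Python. The function below is hypothetical and is not Django's backend code: `charset` and `use_unicode` are real `MySQLdb.connect()` keyword arguments, but the merge order (user-supplied DATABASE_OPTIONS applied on top of the backend defaults) is an assumption.

```python
def build_connect_kwargs(options):
    """Hypothetical sketch: merge DATABASE_OPTIONS over backend defaults."""
    kwargs = {
        'charset': 'utf8',    # connection character set
        'use_unicode': True,  # ask MySQLdb to return text columns as unicode
    }
    # Assumed: user options override the defaults.
    kwargs.update(options)
    return kwargs


# The reporter's DATABASE_OPTIONS only repeats a default, so it changes nothing.
print(build_connect_kwargs({'charset': 'utf8'}) == build_connect_kwargs({}))  # True
```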

comment:2 by hidded <me@…>, 17 years ago

Sorry :). I think this may not be a bug, because my database uses the 'utf8_bin' collation. The problem went away once the database was converted to the 'utf8_unicode_ci' collation.

comment:3 by Malcolm Tredinnick, 17 years ago

The database collation order shouldn't affect things. I'm going to leave this open until I get a chance to investigate a bit more. Thanks for working out what was different about your setup.

comment:4 by Michael Radziej, 17 years ago

Malcolm, can we close this now?

by django@…, 17 years ago

Attachment: mysql_old_base.diff added

Decode all byte strings to Unicode using the server encoding.

comment:5 by django@…, 17 years ago

Cc: django@… added
Has patch: set
Keywords: mysql_old unicode added
Patch needs improvement: set

I've added a patch to fix the problem.

I don't know if this is the best way, but it seems to work fine. I tested it with MySQLdb v1.2.1g2 and MySQL v5.0.32.
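The approach described for the patch (decode every byte string in the results using the server encoding) can be illustrated with a generic DB-API cursor wrapper. This is a hypothetical sketch, not the code from mysql_old_base.diff: the class name `UnicodeDecodingCursor` is made up, UTF-8 is assumed as the server encoding, and SQLite is used only because a BLOB column conveniently returns bytes without a MySQL server.

```python
import sqlite3


class UnicodeDecodingCursor:
    """Hypothetical wrapper that decodes every bytes value in result rows."""

    def __init__(self, cursor, encoding='utf-8'):
        self.cursor = cursor
        self.encoding = encoding  # assumed server encoding

    def _decode_row(self, row):
        if row is None:  # fetchone() returns None when no rows are left
            return None
        return tuple(v.decode(self.encoding) if isinstance(v, bytes) else v
                     for v in row)

    def execute(self, sql, params=()):
        self.cursor.execute(sql, params)
        return self

    def fetchone(self):
        return self._decode_row(self.cursor.fetchone())

    def fetchmany(self, size=None):
        if size is None:
            size = self.cursor.arraysize  # DB-API default batch size
        return [self._decode_row(r) for r in self.cursor.fetchmany(size)]

    def fetchall(self):
        return [self._decode_row(r) for r in self.cursor.fetchall()]

    def __getattr__(self, name):
        # Delegate everything else (description, rowcount, close, ...).
        return getattr(self.cursor, name)


# Demo: an SQLite BLOB column hands back raw UTF-8 bytes.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE feed (title BLOB)')
conn.execute('INSERT INTO feed VALUES (?)',
             ('\u041d\u041e\u0412\u041e\u0421\u0422\u0418'.encode('utf-8'),))
cur = UnicodeDecodingCursor(conn.cursor())
print(cur.execute('SELECT title FROM feed').fetchall())  # [('НОВОСТИ',)]
```

Wrapping the cursor keeps the decoding in one place, so every fetch method returns Unicode regardless of how the driver flagged the column.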

by django@…, 17 years ago

Attachment: mysql_old_base2.diff added

diff from the previous version: http://paste.pocoo.org/compare/4579/4100/

comment:6 by django@…, 17 years ago

Version: other branch → SVN

In mysql_old_base2.diff I made these changes:

  • decode the result in fetchmany(), too.
  • use MysqlUnicodeWrapper() in make_debug_cursor(), too.

by django@…, 17 years ago

Attachment: mysql_old_base3.diff added

Hardcoded the encoding to "utf-8".

comment:8 by Jacob, 17 years ago

Resolution: worksforme
Status: new → closed

I can't reproduce, and neither can Malcolm.

by jedie, 16 years ago

Attachment: mysql_old_base4.diff added

Updated diff against revision 7947

comment:9 by jedie, 16 years ago

Resolution: worksforme
Status: closed → reopened

Sorry, but in my environment I still need this patch.
I have updated the patch to the current Django trunk version.

Tested with:
MySQLdb 1.2.1g2
MySQL Server Version: 5.0.32-Debian_7etch6-log
Python v2.4.4 (#2, Apr 15 2008, 23:43:20)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)]

comment:10 by Simon Greenhill, 16 years ago

Resolution: wontfix
Status: reopened → closed

Well, mysql_old's been removed in [7949], so I think this goes down as a wontfix.
