Opened 7 years ago

Closed 6 years ago

#4741 closed (wontfix)

[unicode] Mysql (only!) returns non-unicode strings on UnicodeBranch

Reported by: hidded <me@…>
Owned by: nobody
Component: Database layer (models, ORM)
Version: master
Severity:
Keywords: mysql_old, unicode
Cc: django@…
Triage Stage: Unreviewed
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: yes
Easy pickings:
UI/UX:

Description

See subject. This test case works properly with sqlite and postgres, but crashes with mysql/mysql_old.

$ mysql --version
mysql  Ver 14.12 Distrib 5.0.22, for pc-linux-gnu (i486) using readline 5.1

mysql> select version();
| 5.0.41-Dotdeb_1.dotdeb.1-log |

settings.py:
    DATABASE_ENGINE = 'mysql'
    DATABASE_OPTIONS = {
        'charset': 'utf8',
    }

Create a simple model, then try to save a unicode string and load it back from the database.

>>> from catcher.models import *
>>> Feed.objects.all()
Traceback (most recent call last):
  File "<console>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/django/db/models/query.py", line 107, in __repr__
    return repr(self._get_data())
  File "/usr/lib/python2.4/site-packages/django/db/models/base.py", line 87, in __repr__
    return smart_str(u'<%s: %s>' % (self.__class__.__name__, unicode(self)))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 10: ordinal not in range(128)
>>> f = Feed.objects.all()[0]
>>> f.title
'Lenta.ru: \xcd\xce\xc2\xce\xd1\xd2\xc8, 02.07.2007'
>>> f.title = u'Lenta.ru: \u041d\u041e\u0412\u041e\u0421\u0422\u0418, 02.07.2007'
>>> f.title
u'Lenta.ru: \u041d\u041e\u0412\u041e\u0421\u0422\u0418, 02.07.2007'
>>> f.save()
>>> f = Feed.objects.all()[0]
>>> f.title
'Lenta.ru: \xd0\x9d\xd0\x9e\xd0\x92\xd0\x9e\xd0\xa1\xd0\xa2\xd0\x98, 02.07.2007' # why
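
The two byte strings in the session above can be inspected offline. The following is a minimal sketch (Python 2.6+ or 3), not part of the original report. It assumes the row that was already in the table is Windows-1251 (cp1251) text; that is an inference from the byte values, not something the reporter stated. Under that assumption, both reads return undecoded bytes for the same title, and the position in the traceback matches the first non-ASCII byte.

```python
# Byte strings copied from the console session above.
first_read = b'Lenta.ru: \xcd\xce\xc2\xce\xd1\xd2\xc8, 02.07.2007'
second_read = (b'Lenta.ru: \xd0\x9d\xd0\x9e\xd0\x92\xd0\x9e'
               b'\xd0\xa1\xd0\xa2\xd0\x98, 02.07.2007')
expected = u'Lenta.ru: \u041d\u041e\u0412\u041e\u0421\u0422\u0418, 02.07.2007'

# Assumption: the pre-existing row holds cp1251-encoded text.
decoded_first = first_read.decode('cp1251')

# The row saved from the unicode string comes back as raw UTF-8 bytes.
decoded_second = second_read.decode('utf-8')

# An implicit ASCII decode fails at the first non-ASCII byte,
# which is what the traceback reports.
try:
    first_read.decode('ascii')
    ascii_error_position = None
except UnicodeDecodeError as exc:
    ascii_error_position = exc.start

print(decoded_first == expected, decoded_second == expected,
      ascii_error_position)
```

If the assumption holds, the data itself is intact in both cases; what is missing is the decoding step between the driver and the model.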

Attachments (4)

mysql_old_base.diff (2.6 KB) - added by django@… 7 years ago.
Decode all byte strings to unicode using the server encoding.
mysql_old_base2.diff (3.2 KB) - added by django@… 7 years ago.
diff from the previous version: http://paste.pocoo.org/compare/4579/4100/
mysql_old_base3.diff (3.1 KB) - added by django@… 7 years ago.
Hardcoded encode to "utf-8"
mysql_old_base4.diff (2.2 KB) - added by jedie 6 years ago.
Updated diff against revision 7947


Change History (14)

comment:1 Changed 7 years ago by mtredinnick

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Summary changed from Mysql (only!) returns non-unicode strings on UnicodeBranch to [unicode] Mysql (only!) returns non-unicode strings on UnicodeBranch

There is something else going on here. MySQL does return unicode strings. The tests in tests/modeltests/basic/, for example, all pass, and one of them checks that unicode strings are returned. Other tests do this as well.

Do the normal Django tests all pass for you? Note that you will need to include the setting TEST_DATABASE_CHARSET='utf8' in the settings file you use for the tests to verify this, since the tests require a database that can store a wide range of characters.

Please provide an example of a simple model that does not work (the above just shows the errors, not the code involved) so that we can try to reproduce this. Make the example as simple as possible; keep removing fields until it stops failing. Ideally, a model with one field will show the problem.

(Also, you don't need the charset setting in your above example, since, if you look in the database backend, you'll see that we are already always setting the charset to utf8. We are also setting use_unicode to True, which is what causes the results to be in Unicode.)
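
To illustrate why the extra charset option is redundant, here is a rough sketch of the behaviour described above. It is not the backend's actual code: `build_connect_kwargs` is a hypothetical helper, and the merge order (defaults first, user options layered on top) is an assumption. The `charset` and `use_unicode` names are keyword arguments accepted by `MySQLdb.connect()`.

```python
def build_connect_kwargs(database_options):
    """Return keyword arguments for a MySQLdb-style connect() call.

    Hypothetical helper, for illustration only.
    """
    # Defaults described in the comment above: utf8 charset, unicode results.
    kwargs = {'charset': 'utf8', 'use_unicode': True}
    # Assumed merge order: user-supplied options override the defaults.
    kwargs.update(database_options)
    return kwargs


# Settings from the report versus an empty DATABASE_OPTIONS.
with_charset = build_connect_kwargs({'charset': 'utf8'})
without_charset = build_connect_kwargs({})
print(with_charset == without_charset)
```

Under these assumptions both calls produce the same connection arguments, so the setting in the report neither helps nor hurts.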

comment:2 Changed 7 years ago by hidded <me@…>

Sorry. I think this may not be a bug, because I am using a database with the 'utf8_bin' collation. The problem was resolved once the database was converted to use the 'utf8_unicode_ci' collation.

comment:3 Changed 7 years ago by mtredinnick

The database collation order shouldn't affect things. I'm going to leave this open until I get a chance to investigate a bit more. Thanks for working out what was different about your setup.

comment:4 Changed 7 years ago by mir

Malcolm, can we close this now?

Changed 7 years ago by django@…

Decode all byte strings to unicode using the server encoding.

comment:5 Changed 7 years ago by django@…

  • Cc django@… added
  • Has patch set
  • Keywords mysql_old, unicode added
  • Patch needs improvement set

I have added a patch to fix the problem.

I don't know if this is the best way, but it seems to work fine. I tested it with MySQLdb v1.2.1g2 and MySQL v5.0.32.

Changed 7 years ago by django@…

diff from the previous version: http://paste.pocoo.org/compare/4579/4100/

comment:6 Changed 7 years ago by django@…

  • Version changed from other branch to SVN

In mysql_old_base2.diff I made these changes:

  • decode the result in fetchmany(), too.
  • use MysqlUnicodeWrapper() in make_debug_cursor(), too.
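
The general idea of decoding fetched rows in a cursor wrapper can be sketched as follows. This is not the attached patch: `DecodingCursorWrapper` and `FakeCursor` are hypothetical names used only for illustration, and the UTF-8 default is an assumption. The sketch runs without a database (Python 2.6+ or 3).

```python
class DecodingCursorWrapper(object):
    """Hypothetical wrapper that decodes byte strings in fetched rows."""

    def __init__(self, cursor, encoding='utf-8'):
        self.cursor = cursor
        self.encoding = encoding

    def _decode_row(self, row):
        if row is None:
            return None
        # Only byte strings are decoded; numbers, dates, etc. pass through.
        return tuple(
            value.decode(self.encoding) if isinstance(value, bytes) else value
            for value in row
        )

    def fetchone(self):
        return self._decode_row(self.cursor.fetchone())

    def fetchmany(self, size=None):
        if size is None:
            size = self.cursor.arraysize
        return [self._decode_row(row) for row in self.cursor.fetchmany(size)]

    def fetchall(self):
        return [self._decode_row(row) for row in self.cursor.fetchall()]

    def __getattr__(self, name):
        # Delegate everything else (execute, close, ...) to the real cursor.
        return getattr(self.cursor, name)


class FakeCursor(object):
    """Stand-in for a DB-API cursor so the sketch runs without MySQL."""

    arraysize = 1

    def __init__(self, rows):
        self._rows = list(rows)

    def fetchone(self):
        return self._rows.pop(0) if self._rows else None

    def fetchmany(self, size=1):
        chunk, self._rows = self._rows[:size], self._rows[size:]
        return chunk

    def fetchall(self):
        chunk, self._rows = self._rows, []
        return chunk


raw_title = (b'Lenta.ru: \xd0\x9d\xd0\x9e\xd0\x92\xd0\x9e'
             b'\xd0\xa1\xd0\xa2\xd0\x98, 02.07.2007')
cursor = DecodingCursorWrapper(FakeCursor([(1, raw_title)]))
rows = cursor.fetchmany(1)
print(rows)
```

Wrapping at the cursor level keeps the decoding in one place, which is why fetchone(), fetchmany(), and fetchall() all need to be covered.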

Changed 7 years ago by django@…

Hardcoded encode to "utf-8"

comment:8 Changed 6 years ago by jacob

  • Resolution set to worksforme
  • Status changed from new to closed

I can't reproduce, and neither can Malcolm.

Changed 6 years ago by jedie

Updated diff against revision 7947

comment:9 Changed 6 years ago by jedie

  • Resolution worksforme deleted
  • Status changed from closed to reopened

Sorry. In my environment I still need this patch.
I have updated the patch to the current Django trunk version.

Tested with:
MySQLdb 1.2.1g2
MySQL Server Version: 5.0.32-Debian_7etch6-log
Python v2.4.4 (#2, Apr 15 2008, 23:43:20)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)]

comment:10 Changed 6 years ago by Simon Greenhill

  • Resolution set to wontfix
  • Status changed from reopened to closed

Well, mysql_old's been removed in [7949], so I think this goes down as a wontfix.
