Django

Ticket #4741 (closed: wontfix)

Opened 2 years ago

Last modified 1 year ago

[unicode] Mysql (only!) returns non-unicode strings on UnicodeBranch

Reported by: hidded <me@hiddedpp.com>
Assigned to: nobody
Milestone:
Component: Database layer (models, ORM)
Version: SVN
Keywords: mysql_old, unicode
Cc: django@jensdiemer.de
Triage Stage: Unreviewed
Has patch: 1
Needs documentation: 0
Needs tests: 0
Patch needs improvement: 1

Description

See the summary. This test case works properly with sqlite and postgres, but crashes with mysql/mysql_old.

$ mysql --version
mysql  Ver 14.12 Distrib 5.0.22, for pc-linux-gnu (i486) using readline 5.1

mysql> select version();
| 5.0.41-Dotdeb_1.dotdeb.1-log |

settings.py:
    DATABASE_ENGINE = 'mysql'
    DATABASE_OPTIONS = {
        'charset': 'utf8',
    }

Create a simple model, then try to save a unicode string and load it back from the database.

>>> from catcher.models import *
>>> Feed.objects.all()
Traceback (most recent call last):
  File "<console>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/django/db/models/query.py", line 107, in __repr__
    return repr(self._get_data())
  File "/usr/lib/python2.4/site-packages/django/db/models/base.py", line 87, in __repr__
    return smart_str(u'<%s: %s>' % (self.__class__.__name__, unicode(self)))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 10: ordinal not in range(128)
>>> f = Feed.objects.all()[0]
>>> f.title
'Lenta.ru: \xcd\xce\xc2\xce\xd1\xd2\xc8, 02.07.2007'
>>> f.title = u'Lenta.ru: \u041d\u041e\u0412\u041e\u0421\u0422\u0418, 02.07.2007'
>>> f.title
u'Lenta.ru: \u041d\u041e\u0412\u041e\u0421\u0422\u0418, 02.07.2007'
>>> f.save()
>>> f = Feed.objects.all()[0]
>>> f.title
'Lenta.ru: \xd0\x9d\xd0\x9e\xd0\x92\xd0\x9e\xd0\xa1\xd0\xa2\xd0\x98, 02.07.2007' # why is this still a byte string?
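
To make the session above easier to read, here is a small sketch (written for Python 3, not the Python 2.4 used in the report) that decodes the two byte strings it prints. It assumes the row that was already in the database was stored as Windows-1251 (cp1251); the ticket does not say so, but the bytes decode to the expected title under that codec. The variable names are illustrative only.

```python
# The two byte strings printed in the session, as Python 3 bytes literals.
before_save = b'Lenta.ru: \xcd\xce\xc2\xce\xd1\xd2\xc8, 02.07.2007'
after_save = (b'Lenta.ru: \xd0\x9d\xd0\x9e\xd0\x92\xd0\x9e'
              b'\xd0\xa1\xd0\xa2\xd0\x98, 02.07.2007')

# The unicode value that was assigned to f.title before f.save().
expected = 'Lenta.ru: \u041d\u041e\u0412\u041e\u0421\u0422\u0418, 02.07.2007'

# Assumption: the pre-existing row holds cp1251 bytes.
decoded_before = before_save.decode('cp1251')

# The row written by f.save() holds the UTF-8 encoding of the same text,
# but it comes back undecoded.
decoded_after = after_save.decode('utf-8')

# Decoding the raw bytes as ASCII fails at the first non-ASCII byte,
# which is what the traceback reports.
try:
    before_save.decode('ascii')
    ascii_error_position = None
except UnicodeDecodeError as exc:
    ascii_error_position = exc.start
```

In both cases the backend handed back raw bytes rather than a unicode object, so the failure in `__repr__` comes from the implicit ASCII decode, not from the stored data being unreadable.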

Attachments

mysql_old_base.diff (2.6 kB) - added by django@jensdiemer.de on 09/21/07 05:34:32.
Decode all byte strings to unicode using the server encoding.
mysql_old_base2.diff (3.2 kB) - added by django@jensdiemer.de on 09/26/07 06:07:10.
diff from the previous version: http://paste.pocoo.org/compare/4579/4100/
mysql_old_base3.diff (3.1 kB) - added by django@jensdiemer.de on 09/26/07 07:51:18.
Hardcoded the encoding to "utf-8".
mysql_old_base4.diff (2.2 kB) - added by jedie on 07/18/08 08:30:12.
Updated diff against revision 7947

Change History

07/02/07 20:48:57 changed by mtredinnick

  • needs_better_patch changed.
  • summary changed from Mysql (only!) returns non-unicode strings on UnicodeBranch to [unicode] Mysql (only!) returns non-unicode strings on UnicodeBranch.
  • needs_tests changed.
  • needs_docs changed.

There is something else going on here. MySQL does return unicode strings. The tests/modeltests/basic/, for example, all pass and one of those checks that unicode strings are returned. Other tests do this also.

Do the normal Django tests all pass for you? Note that you will need to include the setting TEST_DATABASE_CHARSET='utf8' in the settings file you use for the tests to verify this, since the tests require a database that can store a wide range of characters.

Please provide an example of a simple model that does not work (the above just shows the errors, not the code involved) so that we can try to reproduce this. Make the example as simple as possible; keep removing fields until it stops failing. Ideally, a model with one field will show the problem.

(Also, you don't need the charset setting in your above example, since, if you look in the database backend, you'll see that we are already always setting the charset to utf8. We are also setting use_unicode to True, which is what causes the results to be in Unicode.)
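
As a rough illustration of the point about defaults, the sketch below merges user-supplied options over the two settings named in the comment. `build_connect_kwargs` is a hypothetical helper, not the backend's actual code, and the real backend passes further connection arguments that are omitted here; only `charset` and `use_unicode` come from the comment above.

```python
def build_connect_kwargs(database_options):
    """Hypothetical helper: start from the defaults described in the
    comment (charset utf8, use_unicode True) and let DATABASE_OPTIONS
    override them."""
    kwargs = {'charset': 'utf8', 'use_unicode': True}
    kwargs.update(database_options)
    return kwargs


# With and without the explicit 'charset' entry from the report's settings.py.
with_charset = build_connect_kwargs({'charset': 'utf8'})
without_charset = build_connect_kwargs({})
```

Under these assumptions both calls produce the same connection arguments, which is why the explicit `'charset': 'utf8'` entry in the report is redundant.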

07/05/07 10:20:32 changed by hidded <me@hiddedpp.com>

Sorry. I think this may not be a bug, because my database uses the 'utf8_bin' collation. The problem was resolved once the database was converted to the 'utf8_unicode_ci' collation.

07/06/07 00:18:08 changed by mtredinnick

The database collation order shouldn't affect things. I'm going to leave this open until I get a chance to investigate a bit more. Thanks for working out what was different about your setup.

09/20/07 14:04:44 changed by mir

Malcolm, can we close this now?

09/21/07 05:34:32 changed by django@jensdiemer.de

  • attachment mysql_old_base.diff added.

Decode all byte strings to unicode using the server encoding.

09/21/07 05:39:44 changed by django@jensdiemer.de

  • cc set to django@jensdiemer.de.
  • keywords set to mysql_old, unicode.
  • has_patch set to 1.
  • needs_better_patch set to 1.

I have added a patch to fix the problem.

I don't know if this is the best approach, but it seems to work fine. I tested it with MySQLdb v1.2.1g2 and MySQL v5.0.32.

09/26/07 06:07:10 changed by django@jensdiemer.de

  • attachment mysql_old_base2.diff added.

diff from the previous version: http://paste.pocoo.org/compare/4579/4100/

09/26/07 06:10:05 changed by django@jensdiemer.de

  • version changed from other branch to SVN.

In mysql_old_base2.diff I made these changes:

  • decode the result in fetchmany(), too.
  • use MysqlUnicodeWrapper() in make_debug_cursor(), too.
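
The general idea behind the patch (decoding byte strings in every fetch method) can be sketched as follows. This is not the content of the attached diffs: `UnicodeCursorWrapper` and `FakeCursor` are hypothetical names, the code targets Python 3 rather than Python 2.4, and the encoding is simply assumed to be UTF-8.

```python
class UnicodeCursorWrapper:
    """Hypothetical wrapper that decodes byte strings in fetched rows."""

    def __init__(self, cursor, encoding='utf-8'):
        self.cursor = cursor
        self.encoding = encoding

    def _decode_row(self, row):
        # Leave None (no more rows) and non-bytes values untouched.
        if row is None:
            return None
        return tuple(
            value.decode(self.encoding) if isinstance(value, bytes) else value
            for value in row
        )

    def fetchone(self):
        return self._decode_row(self.cursor.fetchone())

    def fetchmany(self, size=None):
        if size is None:
            size = self.cursor.arraysize
        return [self._decode_row(row) for row in self.cursor.fetchmany(size)]

    def fetchall(self):
        return [self._decode_row(row) for row in self.cursor.fetchall()]

    def __getattr__(self, name):
        # Delegate everything else (execute, close, ...) to the real cursor.
        return getattr(self.cursor, name)


class FakeCursor:
    """Stand-in for a DB-API cursor that returns raw bytes."""
    arraysize = 1

    def __init__(self, rows):
        self._rows = list(rows)

    def fetchone(self):
        return self._rows.pop(0) if self._rows else None

    def fetchmany(self, size=1):
        chunk, self._rows = self._rows[:size], self._rows[size:]
        return chunk

    def fetchall(self):
        chunk, self._rows = self._rows, []
        return chunk


raw_rows = [(1, b'\xd0\x9d\xd0\x9e'), (2, b'plain'), (3, None)]
wrapped = UnicodeCursorWrapper(FakeCursor(raw_rows))
first = wrapped.fetchone()
rest = wrapped.fetchmany(2)
```

Wrapping at the cursor level means every caller sees decoded text, regardless of which fetch method it uses.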

09/26/07 07:51:18 changed by django@jensdiemer.de

  • attachment mysql_old_base3.diff added.

Hardcoded the encoding to "utf-8".

09/26/07 08:10:02 changed by django@jensdiemer.de

02/27/08 19:55:47 changed by jacob

  • status changed from new to closed.
  • resolution set to worksforme.

I can't reproduce, and neither can Malcolm.

07/18/08 08:30:12 changed by jedie

  • attachment mysql_old_base4.diff added.

Updated diff against revision 7947

07/18/08 08:33:15 changed by jedie

  • status changed from closed to reopened.
  • resolution deleted.

Sorry, but in my environment I still need this patch. I have updated the patch against the current Django trunk.

Tested with:

  • MySQLdb 1.2.1g2
  • MySQL Server Version: 5.0.32-Debian_7etch6-log
  • Python v2.4.4 (#2, Apr 15 2008, 23:43:20) [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)]

07/18/08 18:36:36 changed by Simon Greenhill

  • status changed from reopened to closed.
  • resolution set to wontfix.

Well, mysql_old's been removed in [7949], so I think this goes down as a wontfix.

