Opened 5 years ago

Closed 5 years ago

Last modified 3 years ago

#13919 closed Uncategorized (invalid)

"Incorrect string value" warning when saving some unicode characters to MySQL

Reported by: denilsonsa
Owned by: nobody
Component: Uncategorized
Version: 1.2
Severity: Normal
Keywords:
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

How to reproduce this bug, step-by-step:

  1. Create a new project.
  2. Create the database for this project in MySQL. Make sure the database uses 'utf8' as its character set.
  3. Create a new app inside your project (here I'm calling it 'testapp'). Remember to add it to INSTALLED_APPS in settings.py.
  4. In the app's models.py, create a simple model (I've named it TestModel) with just a CharField and/or a TextField.
  5. ./manage.py syncdb
  6. ./manage.py shell
  7. from testapp.models import TestModel
  8. Create a new instance of your model and set its field to u"Su\u1296it\U000f2a61\r\n"
  9. Call .save()

Boom! MySQLdb throws:

Warning: Incorrect string value: '\xF3\xB2\xA9\xA1\x0D\x0A' for column 'cf' at row 1

Please note that other unicode strings (like u'Bh\u0101skara\n') are saved correctly, so the MySQL database itself is set up correctly. Only the string above makes MySQLdb raise this warning.

Does the string above make any sense? I don't know, it was a piece of text copied-and-pasted by the user.

Versions:

  • Django 1.1.2
  • MySQL 5.0.90
  • mysql-python (MySQLdb) 1.2.3_rc1
  • Python 2.6.5

Change History (3)

comment:1 Changed 5 years ago by kmtracey

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Resolution set to invalid
  • Status changed from new to closed

This isn't a Django issue, it's a MySQL one. '\U000f2a61' requires 4 bytes to encode in utf-8, and MySQL only supports up to 3-byte utf-8: http://dev.mysql.com/doc/refman/5.1/en/charset-unicode-utf8.html
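The byte counts can be checked directly from Python. The following is a minimal sketch, assuming Python 3 (where strings are sequences of full code points); the helper name utf8_lengths is made up for illustration and is not part of Django or MySQLdb. It lists how many bytes each character of the reported string needs in UTF-8, and decodes the bytes quoted in the warning to show that they start at the one 4-byte character.

```python
# The string from the ticket description.
text = "Su\u1296it\U000f2a61\r\n"


def utf8_lengths(s):
    """Return (character, number of UTF-8 bytes) for each code point in s."""
    return [(ch, len(ch.encode("utf-8"))) for ch in s]


# Characters that do not fit in 3 bytes, the limit named in the comment above.
too_wide = [ch for ch, n in utf8_lengths(text) if n > 3]

# The bytes quoted in the MySQL warning, decoded back to text.
warning_bytes = b"\xF3\xB2\xA9\xA1\x0D\x0A"
warning_text = warning_bytes.decode("utf-8")

for ch, n in utf8_lengths(text):
    print("U+%06X needs %d byte(s)" % (ord(ch), n))
```

Only '\U000f2a61' ends up in too_wide; '\u1296' needs exactly 3 bytes, which is why it is stored without complaint.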

comment:2 Changed 5 years ago by denilsonsa

In case someone else hits this bug and wants a solution or workaround, please check this StackOverflow question:

http://stackoverflow.com/questions/3220031/how-to-filter-or-replace-unicode-characters-that-would-take-more-than-3-bytes-i
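One possible shape for such a filter, as a rough sketch only: it assumes Python 3, and the names strip_non_bmp and _NON_BMP are hypothetical, not taken from the linked question or from any library. It replaces every code point above U+FFFF (the ones that need 4 bytes in UTF-8) with U+FFFD before the value is saved.

```python
import re

# Matches any code point outside the Basic Multilingual Plane (U+10000 and up).
# These are exactly the characters that take 4 bytes in UTF-8.
_NON_BMP = re.compile("[\U00010000-\U0010FFFF]")


def strip_non_bmp(text, replacement="\uFFFD"):
    """Replace every 4-byte (non-BMP) character in text with `replacement`."""
    return _NON_BMP.sub(replacement, text)


cleaned = strip_non_bmp("Su\u1296it\U000f2a61\r\n")
unchanged = strip_non_bmp("Bh\u0101skara\n")
```

Passing replacement="" drops the characters instead of marking them. Either way the original character is lost, so this only suits data where that is acceptable.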

comment:3 Changed 3 years ago by anonymous

  • Easy pickings unset
  • Severity set to Normal
  • Type set to Uncategorized
  • UI/UX unset

Simple workaround:
After running "manage.py syncdb", change the CHARACTER SET of the 'title' and 'content' columns by running the following queries:

ALTER TABLE django_flatpage CHANGE COLUMN `title` `title` VARCHAR(200) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci' NOT NULL;
ALTER TABLE django_flatpage CHANGE COLUMN `content` `content` longtext  CHARACTER SET 'utf8' COLLATE 'utf8_general_ci' NOT NULL;