Opened 14 years ago

Closed 14 years ago

Last modified 11 years ago

#13919 closed Uncategorized (invalid)

"Incorrect string value" warning when saving some unicode characters to MySQL

Reported by: Denilson Figueiredo de Sá
Owned by: nobody
Component: Uncategorized
Version: 1.2
Severity: Normal
Keywords:
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

How to reproduce this bug, step-by-step:

  1. Create a new project.
  2. Create the database for this project in MySQL. Make sure the database uses 'utf8' as its character set.
  3. Create a new app inside your project (here I'm calling it 'testapp'). Remember to add it to settings.py.
  4. In the app's models.py, create a simple model (I've named it TestModel) with just a CharField and/or a TextField.
  5. Run ./manage.py syncdb
  6. Run ./manage.py shell
  7. In the shell, run: from testapp.models import TestModel
  8. Create a new instance of your model and set its attribute to u"Su\u1296it\U000f2a61\r\n"
  9. Call .save()

Boom! MySQLdb throws:

Warning: Incorrect string value: '\xF3\xB2\xA9\xA1\x0D\x0A' for column 'cf' at row 1

Please note that other Unicode strings (like u'Bh\u0101skara\n') are saved correctly, so the MySQL database seems to be set up correctly. However, the string above makes MySQLdb throw an exception.

Does the string above make any sense? I don't know, it was a piece of text copied-and-pasted by the user.

Versions:

  • Django 1.1.2
  • MySQL 5.0.90
  • mysql-python (MySQLdb) 1.2.3_rc1
  • Python 2.6.5

Change History (3)

comment:1 by Karen Tracey, 14 years ago

Resolution: invalid
Status: new → closed

This isn't a Django issue, it's a MySQL one. '\U000f2a61' requires 4 bytes to encode in UTF-8, and MySQL's utf8 character set only supports UTF-8 sequences of up to 3 bytes: http://dev.mysql.com/doc/refman/5.1/en/charset-unicode-utf8.html
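
The byte counts can be checked directly in Python. The following is only a minimal sketch, written in Python 3 syntax (the ticket itself used Python 2.6); the helper name utf8_lengths is made up for illustration. It reports how many UTF-8 bytes each character of the two strings from the description needs:

```python
def utf8_lengths(text):
    """Return (character, number of UTF-8 bytes) pairs for each character."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


# The two strings quoted in the ticket description.
failing = "Su\u1296it\U000f2a61\r\n"
working = "Bh\u0101skara\n"

# Longest encoded character in each string.
print(max(n for _, n in utf8_lengths(failing)))  # 4
print(max(n for _, n in utf8_lengths(working)))  # 2

# The tail of the failing string, encoded as UTF-8.
print("\U000f2a61\r\n".encode("utf-8"))  # b'\xf3\xb2\xa9\xa1\r\n'
```

The last line prints the same six bytes that appear in the MySQLdb warning (\x0D\x0A is \r\n), which suggests the value was cut off at the first 4-byte character.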

comment:2 by Denilson Figueiredo de Sá, 14 years ago

In case someone else hits this bug and wants a solution or workaround, please check this StackOverflow question:

http://stackoverflow.com/questions/3220031/how-to-filter-or-replace-unicode-characters-that-would-take-more-than-3-bytes-i
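
As a rough illustration of the kind of filtering that question asks about (not necessarily the approach recommended in its answers), the sketch below replaces every character outside the Basic Multilingual Plane before the string is saved. It assumes Python 3; the function name replace_non_bmp and the choice of U+FFFD as the replacement are hypothetical:

```python
import re

# Code points from U+10000 upwards are the ones that need 4 bytes in UTF-8.
_NON_BMP = re.compile("[\U00010000-\U0010FFFF]")


def replace_non_bmp(text, replacement="\ufffd"):
    """Replace characters that would not fit in a 3-byte UTF-8 sequence."""
    return _NON_BMP.sub(replacement, text)


# The string from the ticket description, cleaned before saving.
cleaned = replace_non_bmp("Su\u1296it\U000f2a61\r\n")
print(repr(cleaned))
```

This loses the original character, so it only makes sense where dropping such characters is acceptable.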

comment:3 by anonymous, 11 years ago

Easy pickings: unset
Severity: Normal
Type: Uncategorized
UI/UX: unset

Simple workaround:
After running "manage.py syncdb", change the CHARACTER SET of the 'title' and 'content' columns by running the following queries:

ALTER TABLE django_flatpage CHANGE COLUMN `title` `title` VARCHAR(200) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci' NOT NULL;
ALTER TABLE django_flatpage CHANGE COLUMN `content` `content` longtext  CHARACTER SET 'utf8' COLLATE 'utf8_general_ci' NOT NULL;