#3370 closed (fixed)
[patch] newforms: form.save() raises UnicodeEncodeError when form contains any non latin characters
Reported by: | | Owned by: | Adrian Holovaty |
---|---|---|---|
Component: | Forms | Version: | dev |
Severity: | | Keywords: | newforms utf8 unicode-branch |
Cc: | Maniac@…, jm.bugtracking@…, densetsu.no.ero.sennin@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | yes | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
Hello everyone
I've found the following bug in newforms: when a form created with form_for_model or form_for_instance contains any non-Latin (national) characters, calling save() raises UnicodeEncodeError.
Example (console encoding = ru_RU.UTF-8)
>>> from myproject.models import Payment
>>> from django import newforms as forms
Let's try form_for_model:
>>> PaymentForm1 = forms.models.form_for_model(Payment)
>>> form1 = PaymentForm1({'description': 'превед now', 'event_date': '2007-01-26', 'user': '1', 'pay_type': 'cash', 'amount':'50.3'})
>>> form1.is_valid()
True
>>> form1.save()
Traceback (most recent call last):
  File "<console>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/django/newforms/models.py", line 25, in model_save
    obj.save()
  File "/usr/lib/python2.4/site-packages/netangels/models/payment.py", line 26, in save
    super(Payment, self).save()
  File "/usr/lib/python2.4/site-packages/django/db/models/base.py", line 204, in save
    ','.join(placeholders)), db_values)
  File "/usr/lib/python2.4/site-packages/MySQLdb/cursors.py", line 148, in execute
    query = query % db.literal(args)
  File "/usr/lib/python2.4/site-packages/MySQLdb/connections.py", line 232, in literal
    return self.escape(o, self.encoders)
  File "/usr/lib/python2.4/site-packages/MySQLdb/connections.py", line 179, in unicode_literal
    return db.literal(u.encode(unicode_literal.charset))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-5: ordinal not in range(256)
Now let's try form_for_instance:
>>> payment = Payment.objects.get(pk=2)
>>> PaymentForm2 = forms.models.form_for_instance(payment)
>>> form2 = PaymentForm2({'description': 'превед now', 'event_date': '2007-01-26', 'user': '1', 'pay_type': 'cash', 'amount':'50.3'})
>>> form2.is_valid()
True
>>> form2.save()
Traceback (most recent call last):
  File "<console>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/django/newforms/models.py", line 52, in save
    return save_instance(self, instance, commit)
  File "/usr/lib/python2.4/site-packages/django/newforms/models.py", line 46, in save_instance
    instance.save()
  File "/usr/lib/python2.4/site-packages/netangels/models/payment.py", line 26, in save
    super(Payment, self).save()
  File "/usr/lib/python2.4/site-packages/django/db/models/base.py", line 184, in save
    db_values + [pk_val])
  File "/usr/lib/python2.4/site-packages/MySQLdb/cursors.py", line 148, in execute
    query = query % db.literal(args)
  File "/usr/lib/python2.4/site-packages/MySQLdb/connections.py", line 232, in literal
    return self.escape(o, self.encoders)
  File "/usr/lib/python2.4/site-packages/MySQLdb/connections.py", line 179, in unicode_literal
    return db.literal(u.encode(unicode_literal.charset))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-5: ordinal not in range(256)
And one more little example:
>>> form2.clean_data['description'] = 'превед in unicode'
>>> form2.save()
This works because description is now of type 'str' (a byte string in the console encoding) rather than a unicode object, even though it contains non-Latin characters.
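For readers reproducing this today, the encoding failure can be sketched without a database. This is a minimal illustration in modern Python 3 (where `str` is what this ticket calls unicode); `try_encode` is a hypothetical helper and is not part of Django or MySQLdb.

```python
def try_encode(text, charset):
    """Return the encoded bytes, or None if the charset cannot represent text."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None


description = "превед now"

# latin-1 has no Cyrillic letters, which mirrors the failure in the traceback.
latin1_bytes = try_encode(description, "latin-1")

# utf-8 can represent any Unicode text, so the same value encodes cleanly.
utf8_bytes = try_encode(description, "utf-8")

print("latin-1 failed:", latin1_bytes is None)
print("utf-8 succeeded:", utf8_bytes is not None)
```

The point of the sketch is only that the error depends on the target charset chosen by the database driver, not on the form data itself.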
Attachments (7)
Change History (29)
by , 18 years ago
Attachment: | mysql-utf.patch added |
---|
comment:1 by , 18 years ago
I've added a patch that fixes the UnicodeEncodeError, but after save() I get:
Traceback (most recent call last):
  File "<console>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/django/db/models/base.py", line 80, in __repr__
    return '<%s: %s>' % (self.__class__.__name__, self)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)
This is because self contains a unicode string with Russian characters.
comment:2 by , 18 years ago
Summary: | newforms: form.save() raises UnicodeEncodeError when form contains any non latin characters → [patch] newforms: form.save() raises UnicodeEncodeError when form contains any non latin characters |
---|
comment:3 by , 18 years ago
Has patch: | set |
---|---|
Needs tests: | set |
comment:4 by , 18 years ago
comment:5 by , 18 years ago
Michael, yes, #1356 does the same thing as part of my patches.
The only thing missing from both patches is that we should not change the charset when using MySQL 4.0, and I think that needs to be checked.
About #3314: it is a good patch, but it does not fix the following problem:
>>> from myproject.models import *
>>> t = Tariff.objects.get(pk=1)
>>> t
Traceback (most recent call last):
  File "<console>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/django/db/models/base.py", line 80, in __repr__
    return '<%s: %s>' % (self.__class__.__name__, self)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)
where the model's __str__ looks like this:
def __str__(self):
    return self.name
and name contains non-Latin unicode characters.
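One way to keep a repr usable on an ASCII-only console is to escape non-ASCII characters instead of encoding them strictly. The following is a minimal sketch in modern Python 3, not the fix from any of the patches: `Tariff` here is a hypothetical stand-in class (not the Django model) and `ascii_safe` is a hypothetical helper.

```python
def ascii_safe(text):
    """Escape non-ASCII characters instead of raising UnicodeEncodeError."""
    return text.encode("ascii", "backslashreplace").decode("ascii")


class Tariff:
    """Hypothetical stand-in for a model whose name holds non-Latin text."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

    def __repr__(self):
        # Same shape as the '<%s: %s>' format in the traceback, but ASCII-safe.
        return "<%s: %s>" % (self.__class__.__name__, ascii_safe(str(self)))


t = Tariff("тариф")
print(repr(t))
```

The trade-off is readability: the escaped form is safe to print anywhere, but the name is no longer human-readable in the console.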
I'm going to upload a better patch that fixes my problems and that should work ok with mysql 4.0
by , 18 years ago
Attachment: | mysql-utf8-complete.2.patch added |
---|
Better patch that fixes the issue and should work correctly with MySQL 4.0.
comment:6 by , 18 years ago
Sorry guys but
self.connection.charset = 'utf8'
does NOT fix the problem for me; only passing 'charset': 'utf8' in kwargs does.
So I'm sending what I hope is the last patch. It works for me, but it is not tested against MySQL 4.0.
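The distinction drawn here is between setting an attribute on an existing connection and passing the charset when the connection is created. A rough sketch of the second approach follows, assuming a hypothetical helper `build_connect_kwargs` and made-up credential values; `charset` is a keyword argument accepted by `MySQLdb.connect`, but the actual call is left commented out so the snippet runs without a database.

```python
def build_connect_kwargs(user, passwd, db, charset="utf8"):
    """Collect keyword arguments for a connection, including the charset."""
    kwargs = {"user": user, "passwd": passwd, "db": db}
    if charset:
        # Passed at connect time rather than set on the connection afterwards.
        kwargs["charset"] = charset
    return kwargs


# Hypothetical credentials, for illustration only.
kwargs = build_connect_kwargs("django", "secret", "myproject")

# import MySQLdb
# connection = MySQLdb.connect(**kwargs)
print(kwargs)
```

Keeping the charset as a parameter, rather than a literal inside the backend, also leaves room for databases that are not UTF-8.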
comment:7 by , 18 years ago
Keywords: | newforms utf8 added |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:8 by , 18 years ago
Cc: | added |
---|
- This patch won't work for legacy databases configured with non-UTF-8 encodings (and they aren't rare at all). You can't just hard-code 'utf8' all over the place.
- In fact this 'bug' is not a bug at all. Newforms now uses unicode data in anticipation of the ongoing unicodification. Until that happens, people are expected to convert unicode to the needed charset explicitly. I don't think it should be 'fixed' in a hurry, only for one backend and by hard-coding a workaround for some special case.
- Ticket #952 is the actual fix for this very issue: it proposes a separate setting for the database charset so that Django can encode unicode data itself.
To keep things sane I suggest marking this ticket as a dupe of #952.
comment:9 by , 18 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Accepted → Design decision needed |
We have a bit of chaos here ... Tickets #3370, #1356 and probably #952 all are about this problem, all are accepted, and #3370 and #1356 have very similar patches. I ask everybody to continue discussion in django-developers ("unicode issues in multiple tickets"), and I ask the authors of these three tickets to work together to find out how to proceed.
As long as it's not clear which path to take, I am marking all these tickets as "design decision needed." (I assume that the other reviewers were not aware of the competing tickets.)
http://groups.google.com/group/django-developers/browse_thread/thread/4b71be8257d42faf
comment:10 by , 18 years ago
Ivan, I think you are wrong.
Firstly, take a look at django/db/backends/mysql/base.py, there is:
if self.connection.get_server_info() >= '4.1':
    cursor.execute("SET NAMES 'utf8'")
You can see that utf8 is already hardcoded there.
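A side note on the check quoted above: it compares version strings lexically, which can give surprising answers once a version component has two digits (for example, `'10.0' >= '4.1'` is False in Python). Below is a hedged sketch of a numeric comparison, using hypothetical helpers `server_version_tuple` and `supports_set_names` and illustrative server-info strings that do not come from this ticket.

```python
import re


def server_version_tuple(server_info):
    """Parse the leading 'major.minor' of a server info string into integers."""
    match = re.match(r"(\d+)\.(\d+)", server_info)
    if match is None:
        raise ValueError("unrecognised server info: %r" % server_info)
    return int(match.group(1)), int(match.group(2))


def supports_set_names(server_info):
    """True when the server is 4.1 or newer, the threshold in the quoted check."""
    return server_version_tuple(server_info) >= (4, 1)


# Illustrative version strings only.
for info in ("4.0.27", "4.1.22-standard", "5.0.27", "10.0.1"):
    print(info, supports_set_names(info))
```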
#952 could be good for legacy charset support, BUT if you look at the newforms code you will find that all data is converted to unicode before it goes into the db, so I think newforms won't be compatible with non-unicode databases at all.
The reason I started this patch is simple: I have a big project coded in windows-1251 which uses MySQL 4.1 with cp1251 encoding. When I started migrating parts of it to newforms, I found that either newforms uses unicode in clean_data and all apps based on newforms must be coded in utf8 (and I can't change this; the decision was made without asking people like me), or these new apps won't work with any national characters in the db.
#952 will just break newforms. I am saying this because I have run Django with a patch similar to #952 applied, and newforms doesn't work with it.
comment:11 by , 18 years ago
Anton, could you please post this to the new thread? The discussions need to be merged to get anywhere.
comment:13 by , 18 years ago
Currently, newforms uses unicode and Django's database layer doesn't, so we should convert the data at the boundary (presumably converting it using the default charset is the best approach). See attached patch.
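The boundary conversion proposed here might look roughly like the following. This is a sketch in modern Python 3 and not the attached patch: `to_db_value` and the `DEFAULT_CHARSET` constant are hypothetical stand-ins for whatever the patch and the project settings actually use.

```python
DEFAULT_CHARSET = "utf-8"  # assumption: stands in for a project-wide setting


def to_db_value(value, charset=DEFAULT_CHARSET):
    """Encode text to bytes at the form/database boundary; pass other values through."""
    if isinstance(value, str):
        return value.encode(charset)
    return value


# Cleaned form data holds text; the database layer in this sketch expects bytes.
clean_data = {"description": "превед now", "amount": 50.3}
db_values = {name: to_db_value(value) for name, value in clean_data.items()}
print(db_values)
```

Because the charset is a parameter, the same helper would serve a cp1251 database by calling `to_db_value(value, "cp1251")`, which is the legacy case raised earlier in the thread.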
comment:14 by , 18 years ago
Esaj, it would be better to add patches that move the database layer to unicode too than to do what you are doing. Please read our discussion in django-devel.
Btw, I have a complete patch for the common db layer and the mysql backend that moves everything to unicode. Dunno if it is required.
comment:15 by , 18 years ago
Anton, moving the db backends to unicode is certainly better, but it's a far larger amount of work. Along with documenting and testing on different versions of the db libraries, it may take months. This patch just fixes things as they are now, and it doesn't hurt.
comment:16 by , 18 years ago
Ivan, this work needs to be started anyway, doesn't it?
You can consider it half done on my side.
comment:17 by , 18 years ago
Hi,
I came across the same error and found a good solution at the point where newforms makes the save call.
Regards,
Dirk
comment:18 by , 18 years ago
Cc: | added |
---|
comment:19 by , 18 years ago
Cc: | added |
---|
comment:20 by , 18 years ago
Keywords: | unicode-branch added |
---|---|
Triage Stage: | Design decision needed → Accepted |
This has been fixed on the unicode branch. I'll close the ticket when that branch is merged with trunk.
comment:21 by , 18 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
patch that fixes the issue (not well tested)