Opened 7 years ago

Closed 5 years ago

Last modified 5 years ago

#3924 closed (fixed)

Caught an exception while rendering: 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)

Reported by: ruben.perez@…
Owned by: hugo
Component: contrib.admin
Version: 1.0
Severity:
Keywords:
Cc: jm.bugtracking@…, antoni.aloy@…, mpjung@…
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings:
UI/UX:

Description (last modified by mtredinnick)

Hello, after updating the Django code from svn, my admin site doesn't work. The error is: Caught an exception while rendering: 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128).

My models.py has a correct coding declaration: # -*- coding: UTF-8 -*-. My admin site always worked with previous svn revisions, but it doesn't work with the latest revision, 4926, of the development code. I'm working with language code "es-es".

The problem is related to the use of non-ASCII characters in the return value of __str__ with models.ForeignKey and models.ManyToManyField.

Thank you very much.

Below you can find a piece of code.

UnicodeDecodeError at /admin/articulos/articulo/add/
'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)
Request Method: 	GET
Request URL: 	http://www..../admin/articulos/articulo/add/
Exception Type: 	UnicodeDecodeError
Exception Value: 	'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)
Exception Location: 	/home/rubenper/django/django/oldforms/__init__.py in render, line 490
Template error

In template /home/rubenper/django/django/contrib/admin/templates/widget/foreign.html, error at line 2
Caught an exception while rendering: 'ascii' codec can't decode byte 0xc3 in position 8: ordinal not in range(128)
1 	{% load admin_modify adminmedia %}
2 	{% output_all bound_field.form_fields %}

Attachments (0)

Change History (30)

comment:1 Changed 7 years ago by anonymous

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Resolution set to invalid
  • Status changed from new to closed

comment:2 Changed 7 years ago by anonymous

  • Resolution invalid deleted
  • Status changed from closed to reopened

Hi, I am reopening this bug because the problem persists.

The problem is related to this changeset:

http://code.djangoproject.com/changeset?new=django%404919&old=django%404918

Version 4918 works
Version 4919 doesn't work

I often use words like "Asociación", for example, and with version 4919 or newer, non-ASCII characters don't work.

In that changeset, html = str(html) was changed to html = smart_unicode(html).

Thank you.

I'm sorry, but I'm a newbie and I don't know how to resolve it.

comment:3 Changed 7 years ago by mtredinnick

  • Description modified (diff)

Fixed formatting in description.

comment:4 Changed 7 years ago by mtredinnick

I suspect this is partly because [4919] has exposed a bug in smart_unicode(). We should be treating all str instances as UTF-8 encoded, as per PEP 263 (settings.DEFAULT_CHARSET doesn't play a role there). Before I can fix that, I need to fix form input handling so that the only bytestrings we have floating around in newforms are UTF-8 strings. That is work in progress. Then I'll make it work with oldforms.

comment:5 Changed 7 years ago by mtredinnick

What is the DEFAULT_CHARSET setting in your Django settings?

Is the string returned from your models' __str__ method encoded in the DEFAULT_CHARSET encoding? If not, you can't expect it to work. I suspect you are using UTF-8 internally everywhere, but can you please confirm that?
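To illustrate why the encoding of the bytestring has to agree with DEFAULT_CHARSET, here is a minimal sketch. It is written for Python 3, where decoding is always explicit, and `to_text` is a hypothetical helper, not a Django function. The value of `DEFAULT_CHARSET` is assumed to be utf-8, as later comments confirm.

```python
DEFAULT_CHARSET = "utf-8"  # assumed value of the Django setting


def to_text(value, charset=DEFAULT_CHARSET):
    """Hypothetical helper: decode bytestrings with `charset`, pass text through."""
    if isinstance(value, bytes):
        return value.decode(charset)
    return str(value)


utf8_bytes = "Asociación".encode("utf-8")      # encoded in DEFAULT_CHARSET
latin1_bytes = "Asociación".encode("latin-1")  # encoded in something else

# Bytes that agree with DEFAULT_CHARSET decode cleanly.
matches = to_text(utf8_bytes)

# Bytes in a different encoding cannot be decoded as UTF-8.
try:
    to_text(latin1_bytes)
    mismatch_error = None
except UnicodeDecodeError as exc:
    mismatch_error = exc
```

If `__str__` returns bytes in any encoding other than the one used for decoding, the conversion either fails, as here, or produces the wrong characters.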

comment:6 follow-up: Changed 7 years ago by karsu

I confirm that [4919] breaks the admin interface if you are using non-ASCII characters.

DEFAULT_CHARSET = utf-8

I made a test case for the problem.

form_tests = r"""
>>> from django.oldforms import *

>>> CHOICES = ((0, u'\x84\x94'),)
>>> f = SelectField(field_name='test', choices=CHOICES)
>>> print f

>>> f = SelectField(field_name=u'\x84\x94', choices=None)
>>> print f
"""

comment:7 Changed 7 years ago by ruben.perez@…

Hi, I can confirm that my settings.DEFAULT_CHARSET is utf-8

Thank You very much for your help.

comment:8 in reply to: ↑ 6 Changed 7 years ago by mtredinnick

Replying to karsu:

I confirm that [4919] breaks the admin interface if you are using non-ASCII characters.

DEFAULT_CHARSET = utf-8

I made a test case for the problem.

This example doesn't work prior to [4919] (I tested with [4918]). This ticket is only about the change reported in the summary, so any related test case should work in [4918].

Thanks for making a test case, but it isn't related to this ticket.

comment:9 Changed 7 years ago by mir@…

  • Component changed from Internationalization to Template system
  • Triage Stage changed from Unreviewed to Accepted

I can also confirm that [4919] is broken. The problem is:

  • when I use oldforms, the context consists entirely of UTF-8-encoded bytestrings
  • with [4919], escape() now returns a unicode string
  • template.render joins the various strings, and while doing so Python tries to decode the bytestrings to unicode strings. This uses Python's default encoding, which is 'ascii'. If any bytestring contains non-ASCII characters, you get the exception mentioned above.

I'm not familiar with the admin interface, but I guess it still uses oldforms and also escape() somewhere.

Reverting [4919] solved the issues for me.
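The failure mode described above can be reproduced outside Django. This is only a sketch, written for Python 3, where the coercion has to be spelled out: the explicit decode() call stands in for the implicit conversion Python 2 performs when a unicode string is joined with a bytestring. The names `join_like_python2`, `escaped`, and `context_value` are illustrative and do not come from the Django source.

```python
def join_like_python2(parts, default_encoding="ascii"):
    """Join text and byte fragments, decoding any bytestring with
    `default_encoding`, as Python 2 did implicitly with 'ascii'."""
    decoded = []
    for part in parts:
        if isinstance(part, bytes):
            part = part.decode(default_encoding)  # raises on non-ASCII bytes
        decoded.append(part)
    return "".join(decoded)


escaped = "<option>"                          # unicode, like escape() after [4919]
context_value = "Asociación".encode("utf-8")  # UTF-8 bytestring, like the oldforms context

try:
    join_like_python2([escaped, context_value])
    error = None
except UnicodeDecodeError as exc:
    error = exc  # byte 0xc3 at position 8, the first byte of the encoded "ó"

# Decoding with the encoding the bytes were actually written in succeeds.
ok = join_like_python2([escaped, context_value], default_encoding="utf-8")
```

The join only works if every bytestring is decoded with its real encoding before it meets a unicode string, or if nothing in the chain is unicode at all, which is what reverting [4919] restores.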

comment:10 Changed 7 years ago by mir@…

Sorry about the format. Here it is again in a readable format.

I can also confirm that [4919] is broken. The problem is:

  • when I use oldforms, the context consists entirely of UTF-8-encoded bytestrings
  • with [4919], escape() now returns a unicode string
  • template.render joins the various strings, and while doing so Python tries to decode the bytestrings to unicode strings. This uses Python's default encoding, which is 'ascii'. If any bytestring contains non-ASCII characters, you get the exception mentioned above.

I'm not familiar with the admin interface, but I guess it still uses oldforms and also escape() somewhere.

Reverting [4919] solved the issues for me.

comment:11 Changed 7 years ago by mtredinnick

(In [4933]) Backed out a portion of [4919] until I can make it work smoothly with oldforms. Refs #3924.

comment:12 Changed 7 years ago by anonymous

  • Cc jm.bugtracking@… added

comment:13 Changed 7 years ago by olga@…

Hello,
I also get this error. I use newforms, and SelectDateWidget() raises a UnicodeError. Previously I worked around this in django\newforms\util.py by removing the call to smart_unicode_lazy:

def smart_unicode(s):
    if isinstance(s, Promise):
        # The input is something from gettext_lazy or similar. We don't want to
        # translate it until render time, so defer the conversion.

        # return smart_unicode_lazy(s)
        return smart_unicode_immediate(s) 
    else:
        return smart_unicode_immediate(s) 

I've upgraded Django to rev. 4939 and the problem still exists, so I tried to restore the previous behaviour in django\utils\encoding.py:

#def smart_unicode(s):
#    if isinstance(s, Promise):
        # The input is the result of a gettext_lazy() call, or similar. It will
        # already be encoded in DEFAULT_CHARSET on evaluation and we don't want
        # to evaluate it until render time.
        # FIXME: This isn't totally consistent, because it eventually returns a
        # bytestring rather than a unicode object. It works wherever we use
        # smart_unicode() at the moment. Fixing this requires work in the
        # i18n internals.
#        return s
#    if not isinstance(s, basestring,):
#        if hasattr(s, '__unicode__'):
#            s = unicode(s)
#        else:
#            s = unicode(str(s), settings.DEFAULT_CHARSET)
#    elif not isinstance(s, unicode):
#        s = unicode(s, settings.DEFAULT_CHARSET)
#    return s

def smart_unicode(s):
    if isinstance(s, Promise):
        # The input is something from gettext_lazy or similar. We don't want to
        # translate it until render time, so defer the conversion.
        return smart_unicode_immediate(s)
    else:
        return smart_unicode_immediate(s)

def smart_unicode_immediate(s):
    if not isinstance(s, basestring):
        if hasattr(s, '__unicode__'):
            s = unicode(s)
        else:
            s = unicode(str(s), settings.DEFAULT_CHARSET)
    elif not isinstance(s, unicode):
        s = unicode(s, settings.DEFAULT_CHARSET)
    return s

With this change everything works fine, but I don't know what the underlying problem is. I'd appreciate any advice. Thanks.

comment:14 follow-up: Changed 7 years ago by mtredinnick

The solution in the previous comment will fail for Forms that are created at import time if the locale() then changes before they are used. It forces the translation to occur at the time smart_unicode() is called, rather than at the moment of output (when the locale will be set correctly). That means almost 100% failures for many internationalised applications.

I am working on fixing the FIXME in that code at the moment.
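The difference between translating immediately and translating at output time can be sketched without Django. Everything here is a hypothetical stand-in: `CATALOGS`, `activate`, `translate`, and `LazyText` only imitate the roles of the message catalogs, locale activation, gettext, and gettext_lazy, and are not the real implementations.

```python
# Hypothetical stand-ins for the translation machinery; not Django APIs.
CATALOGS = {"en": {"March": "March"}, "de": {"March": "März"}}
_active = {"lang": "en"}


def activate(lang):
    """Switch the active language, as locale middleware would per request."""
    _active["lang"] = lang


def translate(msgid):
    """Translate immediately using whatever language is active right now."""
    return CATALOGS[_active["lang"]].get(msgid, msgid)


class LazyText:
    """Defers translation until the value is rendered as text."""

    def __init__(self, msgid):
        self.msgid = msgid

    def __str__(self):
        return translate(self.msgid)


# "Import time": the form is defined while the default language is active.
activate("en")
eager_label = translate("March")  # translated now and frozen
lazy_label = LazyText("March")    # translation deferred

# "Request time": the user's language is activated just before output.
activate("de")
eager_rendered = str(eager_label)  # still the import-time translation
lazy_rendered = str(lazy_label)    # translated with the request's language
```

Forcing the conversion early, as the workaround in comment 13 does, corresponds to the `eager_label` path: the label keeps whatever language was active when the form class was created.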

comment:15 in reply to: ↑ 14 ; follow-up: Changed 7 years ago by olga@…

Replying to mtredinnick:

The solution in the previous comment will fail for Forms that are created at import time if the locale() then changes before they are used. It forces the translation to occur at the time smart_unicode() is called, rather than at the moment of output (when the locale will be set correctly). That means almost 100% failures for many internationalised applications.

I am working on fixing the FIXME in that code at the moment.

Thanks for your reply. Right now I use only German, but in the future, maybe in a month, it will be a multi-language application. Is there a correct solution at the moment, or will working code be available soon? Thanks again!

comment:16 follow-up: Changed 7 years ago by mtredinnick

Regarding comment 13, you didn't actually say what you did to cause the error, just what you did to stop it. If you have a short example, could you post it, please. I'd like to see somebody else's repeatable test cases to ensure I am not missing anything.

comment:17 Changed 7 years ago by mtredinnick

#3947 has an example of the same problem. Add something like activate('es') before rendering the form to see the problem.

comment:18 Changed 7 years ago by antono.aloy@…

Hello!

I can also reproduce the problem within the admin interface when trying to add a record to a table that has a ForeignKey relation to another table with UTF-8 characters in it.

My language code is 'es-es' and the default charset is utf-8.

comment:19 Changed 7 years ago by antoni.aloy@…

  • Cc antoni.aloy@… added

I can confirm there is a Unicode-related error when rendering the ForeignKey drop-down list in the admin application. I can reproduce it every time: just add some characters such as 'à' or 'é' to one of the fields returned by the __str__ method of the foreign key's model.

Adding unicode(yourstring, errors="ignore") to the __str__ output gave me a quick-and-dirty workaround, but it removes all the offending characters.
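To show what that workaround costs, here is a small sketch in Python 3. It assumes the default encoding in play is ASCII, as in the traceback, so the call is spelled as an explicit decode("ascii", errors="ignore"); the variable names are illustrative.

```python
name = "Asociación".encode("utf-8")  # bytes, as a __str__ method would return them

# errors="ignore" silently drops every byte that is not valid ASCII.
lossy = name.decode("ascii", errors="ignore")

# Decoding with the encoding the bytes were written in keeps the text intact.
faithful = name.decode("utf-8")
```

The lossy result no longer raises an exception, but the accented letter is gone from the label shown in the drop-down list.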

comment:20 in reply to: ↑ 16 Changed 7 years ago by anonymous

Replying to mtredinnick:

Regarding comment 13, you didn't actually say what you did to cause the error, just what you did to stop it. If you have a short example, could you post it, please. I'd like to see somebody else's repeatable test cases to ensure I am not missing anything.

Hello!
Here is an example:
tmp.py:

from django import newforms as forms 
from django.shortcuts import render_to_response
from django.newforms.extras import SelectDateWidget
import datetime

class TestForm(forms.Form):
    birthdate = forms.DateField(
               widget = SelectDateWidget(years=xrange(datetime.date.today().year,1900,-1))
       )

def unicode_test(request):
    form = TestForm()
    return render_to_response('test.html', {'form': form})

test.html:

<html xmlns="http://www.w3.org/1999/xhtml">
	<head>
		<title>Test</title>
	</head>
	<body>
		{{ form }}
	</body>
</html>

Also I have such setting:

LANGUAGE_CODE = 'de'
gettext = lambda s: s
LANGUAGES = (
    ('de', gettext('German')),
)

MIDDLEWARE_CLASSES = (    
    'django.contrib.csrf.middleware.CsrfMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.locale.LocaleMiddleware',    
    'django.middleware.common.CommonMiddleware',    
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.middleware.doc.XViewMiddleware',
)

TEMPLATE_CONTEXT_PROCESSORS = (
    "django.core.context_processors.auth",
    "django.core.context_processors.debug",
    "django.core.context_processors.i18n",
)

DATABASE_OPTIONS = {'charset': 'utf8'} 
DEFAULT_CHARSET = 'utf-8'

comment:21 Changed 7 years ago by olga@…

Sorry, the previous example is mine. I also know that the problem is the month "März": when I change _('March') to 'March', everything works fine.

comment:22 in reply to: ↑ 15 Changed 7 years ago by Ciantic

Replying to mtredinnick:
The solution in the previous comment will fail for Forms that are created at import time if the locale() then changes before they are used. It forces the translation to occur at the time smart_unicode() is called, rather than at the moment of output (when the locale will be set correctly). That means almost 100% failures for many internationalised applications.

I am working on fixing the FIXME in that code at the moment.

I debugged one of my problems and the final conclusion was the same as in the FIXME. One question though: which ticket will you mark as fixed when it is ready? I could really use a patch for this. For now I have ugly hacks using decode here and there, wherever gettext_lazy and smart_unicode are used, which is mostly in BaseForm, by the way.

comment:23 Changed 7 years ago by Michael P. Jung

  • Cc mpjung@… added

comment:24 Changed 7 years ago by alessandro.ronchi@…

There is another similar problem in _html_output. I worked around it by commenting out the line marked FIXME!! in django_src/django/newforms/forms.py:


def _html_output(self, normal_row, error_row, row_ender, help_text_html, errors_on_separate_row):
        "Helper function for outputting HTML. Used by as_table(), as_ul(), as_p()."
        top_errors = self.non_field_errors() # Errors that should be displayed above all fields.
        output, hidden_fields = [], []
        for name, field in self.fields.items():
            bf = BoundField(self, field, name)
            bf_errors = ErrorList([escape(error) for error in bf.errors]) # Escape and cache in local variable.
            if bf.is_hidden:
                if bf_errors:
                    top_errors.extend(['(Hidden field %s) %s' % (name, e) for e in bf_errors])
                hidden_fields.append(unicode(bf))
            else:
                if errors_on_separate_row and bf_errors:
                    output.append(error_row % bf_errors)
                if bf.label:
                    label = escape(bf.label)
                    # Only add a colon if the label does not end in punctuation.
                    if label[-1] not in ':?.!':
                        label += ':'
                    label = bf.label_tag(label) or ''
                else:
                    label = ''
                if field.help_text:
                    #help_text = help_text_html % field.help_text FIXME!!
                    help_text = u''
                else:
                    help_text = u''
                output.append(normal_row % {'errors': bf_errors, 'label': label, 'field': unicode(bf), 'help_text': help_text})

comment:25 Changed 7 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from reopened to closed

The original problem reported in this bug was fixed by [4933], so I'm going to close this.

The broader issues that some comments raise are being addressed on the unicode branch and aren't related to the original problem report in any case (so should be separate tickets if they aren't fixed after the unicode branch merges).

comment:26 follow-up: Changed 5 years ago by gnudiff

  • Component changed from Template system to django.contrib.admin
  • Resolution fixed deleted
  • Status changed from closed to reopened
  • Version changed from SVN to 1.0

I can confirm the same error as of today, Django 1.0.2 final.

The error seems to occur when admin interface has to return string representation of a foreign key, such as:

class Visitor(models.Model):
    firstName = models.CharField(max_length=60,help_text="Vārds")
    lastName = models.CharField(max_length=60,help_text="Uzvārds")

    def __unicode__(self):
        return "%s %s" % (self.lastName,self.firstName)
####
class Visit(models.Model):
    visitor = models.ForeignKey(Visitor, help_text="Pacients")
    visitDate = models.DateTimeField(help_text="Pieteiktais laiks")

    def __unicode__(self):
        return "%s @ %s" % (self.visitor,self.visitDate.isoformat())

In this case, if you have Visitor with nonascii chars in name, you will be able to see him in admin interface fine, but as soon as you try to add a Visit for him and then try to access the list of all Visits, you will get an error of the following type.

Request URL:  	http://127.0.0.1:8000/admin/callregister/visit/add/
Exception Type: 	DjangoUnicodeDecodeError
Exception Value: 	

'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128). You passed in <Visit: [Bad Unicode data]> (<class 'prs.callregister.models.Visit'>)

Exception Location: 	C:\Python25\lib\site-packages\django\utils\encoding.py in force_unicode, line 70
Python Executable: 	C:\Python25\python.exe
Python Version: 	2.5.0

comment:27 in reply to: ↑ 26 Changed 5 years ago by ramiro

  • Resolution set to fixed
  • Status changed from reopened to closed

Replying to gnudiff:

I can confirm the same error as of today, Django 1.0.2 final.

Please don't reopen tickets that were fixed against a much earlier version and are two years old. If, after asking in support channels like the mailing list, you are sure it is a Django bug and not a problem with your code, then open a new ticket.

>     def __unicode__(self):
>         return "%s %s" % (self.lastName,self.firstName)

You are supposed to return Unicode data from the __unicode__ method. Try with return u"%s %s" % (self.lastName,self.firstName).

Restoring ticket status.

comment:28 Changed 5 years ago by kmtracey

It's actually Visit's __unicode__ method that is the problem in this specific case. It needs to be:

   def __unicode__(self):
        return u"%s @ %s" % (self.visitor,self.visitDate.isoformat())

(Not that fixing Visitor's as well isn't a good idea, it just won't fix the exception mentioned.)

comment:29 Changed 5 years ago by gnudiff

Yep, sorry, you are right!

What threw me off track was that it was not consistent (i.e. Unicode chars in a model's own __unicode__ method worked; the problem only started when one model's __unicode__ called another model's __unicode__).

comment:30 Changed 5 years ago by kmtracey

Yes, Python's behavior here can cause some confusion. Note "%s" % x may evaluate to either Unicode or a bytestring depending on the type of x:

>>> u = u"Unicode!"
>>> b = 'Bytestring'
>>> "%s" % u
u'Unicode!'
>>> "%s" % b
'Bytestring'

This is why you often won't see a problem with your Visitor __unicode__ as originally coded: usually the lastName and firstName attributes will be Unicode values, so "%s %s" % (self.lastName,self.firstName) will evaluate to a Unicode value and you won't see a problem.

For more complicated situations, say a class that supports both __str__ and __unicode__, whether or not you specify u'' on the interpolation will control what type of result you get:

>>> class Thing(object):
...     def __init__(self, x):
...         self.x = x
...     def __str__(self):
...         return self.x.encode('utf-8')
...     def __unicode__(self):
...         return self.x
... 
>>> t = Thing(u'\u0101')
>>> "%s" % t
'\xc4\x81'
>>> u"%s" % t
u'\u0101'

This is where your Visit __unicode__ method as originally coded has a problem. self.visitor has both __str__ and __unicode__ methods, so unless you force the __unicode__ one to be called by using u'' for the interpolation, the result will be a bytestring, and if that contains non-ascii chars then the automatic coercion to unicode using the ascii codec will generate an exception.

>>> unicode("%s" % t)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
>>> 

So yes, sometimes it seems like you don't necessarily absolutely need the u'' for the __unicode__ return value. But figuring out when exactly you need it and when you don't can be complicated so it's a good idea to use it as a matter of routine.
