Opened 8 years ago

Closed 8 years ago

Last modified 7 years ago

#14721 closed (fixed)

USE_THOUSAND_SEPARATOR fails with UnicodeDecodeError in several locales

Reported by: Marti Raudsepp Owned by: Jannis Leidel
Component: Internationalization Version: 1.2
Severity: Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: no
Easy pickings: no UI/UX: no


When I use the {{ foo|floatformat }} template filter with USE_THOUSAND_SEPARATOR=True and numbers >= 1000, it fails with UnicodeDecodeError, resulting in an empty value in the rendered template. This is a fairly plain Ubuntu 10.10 machine; I tested with an empty Django project on the Ubuntu-packaged Django 1.2.3 as well as SVN trunk.

Apparently this happens because THOUSAND_SEPARATOR in the et locale's formats module is a normal str object -- not Unicode -- but contains a UTF-8 no-break space sequence.

It seems that the same no-break space string is also used in the bg, fi, hu, lv and uk locales!
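To make the suspected cause concrete, here is a minimal sketch (not Django code) of why that byte string cannot be mixed with Unicode text. The helper name `decodes_as_ascii` is made up for illustration; the bytes are the ones the session below reports for THOUSAND_SEPARATOR. On Python 2, concatenating a str with a unicode object triggers this ASCII decode implicitly.

```python
# The byte string returned for THOUSAND_SEPARATOR in the 'et' locale.
raw_separator = b'\xc2\xa0'


def decodes_as_ascii(data):
    """Return True if `data` (a byte string) can be decoded as ASCII.

    Hypothetical helper: it mirrors the implicit decode Python 2 performs
    when a str is concatenated with a unicode object.
    """
    try:
        data.decode('ascii')
        return True
    except UnicodeDecodeError:
        return False


# 0xc2 is outside range(128), so the implicit ASCII decode fails.
ascii_ok = decodes_as_ascii(raw_separator)

# Decoded explicitly as UTF-8, the two bytes are a single character, U+00A0.
as_text = raw_separator.decode('utf-8')
```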

% django-admin startproject foo
% cd foo
% ./ shell
Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56) 
Type "copyright", "credits" or "license" for more information.

IPython 0.10 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object'. ?object also works, ?? prints more.

In [1]: from django.utils import translation

In [2]: from django.template.defaultfilters import floatformat

In [3]: from django.conf import settings

In [4]: settings.NUMBER_GROUPING=3

In [5]: settings.USE_THOUSAND_SEPARATOR=True

In [6]: translation.get_language()
Out[6]: 'en-us'

In [7]: floatformat(1000)
Out[7]: u'1,000'

In [8]: translation.activate('et')

In [9]: translation.get_language()
Out[9]: 'et'

In [10]: floatformat(1000)
UnicodeDecodeError                        Traceback (most recent call last)

/tmp/foo/<ipython console> in <module>()

/usr/lib/pymodules/python2.6/django/template/defaultfilters.pyc in floatformat(text, arg)
    165     if not m and p < 0:
--> 166         return mark_safe(formats.number_format(u'%d' % (int(d)), 0))
    168     if p == 0:

/usr/lib/pymodules/python2.6/django/utils/formats.pyc in number_format(value, decimal_pos)
     72         decimal_pos,
     73         get_format('NUMBER_GROUPING'),
---> 74         get_format('THOUSAND_SEPARATOR'),
     75     )

/usr/lib/pymodules/python2.6/django/utils/numberformat.pyc in format(number, decimal_sep, decimal_pos, grouping, thousand_sep)
     35         for cnt, digit in enumerate(int_part[::-1]):
     36             if cnt and not cnt % grouping:
---> 37                 int_part_gd += thousand_sep
     38             int_part_gd += digit
     39         int_part = int_part_gd[::-1]

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

In [11]: from django.utils import formats

In [12]: formats.get_format('THOUSAND_SEPARATOR')
Out[12]: '\xc2\xa0'

In [13]: formats.get_format_modules()
Out[13]: [<module '' from '/usr/lib/pymodules/python2.6/django/conf/locale/et/formats.pyc'>]
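The failing loop in the traceback can be approximated outside Django. This is a simplified sketch, assuming only the grouping logic visible in the traceback; `group_digits` is a hypothetical name, not a Django function. With a Unicode separator the concatenation never needs an implicit ASCII decode.

```python
def group_digits(int_part, grouping, thousand_sep):
    """Insert `thousand_sep` every `grouping` digits, counting from the right.

    Hypothetical stand-in for the loop shown in the traceback; all arguments
    other than `grouping` are expected to be Unicode text.
    """
    grouped = u''
    for cnt, digit in enumerate(int_part[::-1]):
        if cnt and not cnt % grouping:
            # The separator is reversed here because the whole string is
            # flipped back at the end (only matters for multi-character
            # separators).
            grouped += thousand_sep[::-1]
        grouped += digit
    return grouped[::-1]


with_comma = group_digits(u'1000', 3, u',')
with_nbsp = group_digits(u'1234567', 3, u'\u00a0')
```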

Attachments (2)

mark_unicode_strings.diff (2.3 KB) - added by Claude Paroz 8 years ago.
Add 'u' before unicode strings
mark_unicode_strings2.diff (3.0 KB) - added by Claude Paroz 8 years ago.
Also add comments (and en/

Change History (10)

comment:1 Changed 8 years ago by Claude Paroz

Has patch: set
milestone: 1.3
Triage Stage: Unreviewed → Accepted

Unicode strings in Python files should be prefixed with 'u'.

Changed 8 years ago by Claude Paroz

Attachment: mark_unicode_strings.diff added

Add 'u' before unicode strings

comment:2 Changed 8 years ago by Alex Gaynor

Needs tests: set

comment:3 Changed 8 years ago by Claude Paroz

Owner: changed from nobody to Jannis Leidel

We could even add the u before the strings in the en/ to serve as a good model.

comment:4 Changed 8 years ago by Marti Raudsepp

I'd suggest using something like u'\u00a0', that way it's clear that it's not just a regular space, but a Unicode character.
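As a quick check of what that escape denotes, a small sketch using only the standard library; the variable names are illustrative.

```python
import unicodedata

# The escaped form suggested above.
nbsp = u'\u00a0'

# Official Unicode name of the character.
name = unicodedata.name(nbsp)

# Its UTF-8 encoding is the two-byte sequence seen in the traceback session.
utf8 = nbsp.encode('utf-8')
```

The escape makes the difference from a regular space (U+0020) visible in the source, which a literal no-break space does not.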

comment:5 Changed 8 years ago by Claude Paroz

I don't like Unicode sequences exposed to users. I'm adding a second version of the patch with comments instead.

Changed 8 years ago by Claude Paroz

Attachment: mark_unicode_strings2.diff added

Also add comments (and en/

comment:6 Changed 8 years ago by Jannis Leidel

Resolution: fixed
Status: new → closed

(In [14708]) Fixed #14721 -- Made the THOUSAND_SEPERATOR a unicode string in a few locales. Thanks, Claude Paroz.

comment:7 Changed 8 years ago by Jannis Leidel

(In [14712]) [1.2.X] Fixed #14721 -- Made the THOUSAND_SEPERATOR a unicode string in a few locales. Thanks, Claude Paroz.

Backport from trunk (r14708).

comment:8 Changed 7 years ago by Jacob

milestone: 1.3

Milestone 1.3 deleted
