Opened 17 years ago

Closed 16 years ago

#5657 closed (fixed)

[patch] urlize breaks when string.letters is changed by the locale

Reported by: Andrew Stoneman <astoneman@…>
Owned by: nobody
Component: Uncategorized
Version: dev
Severity:
Keywords: sprintdec01
Cc:
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

utils.html.urlize() depends on string.letters being implicitly convertible to unicode. When the locale module changes the locale, however, string.letters can contain non-ASCII characters, which causes very mysterious UnicodeDecodeErrors on sites that use the function. I've included a patch that uses string.ascii_letters instead, which does not change with the locale.
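To illustrate the idea behind the patch, here is a minimal sketch of a locale-independent leading-character check. The helper name looks_like_bare_domain, the LEADING_CHARS constant, and the list of suffixes are assumptions made up for this example; this is not the actual code in django/utils/html.py.

```python
import string

# string.ascii_letters is a fixed constant, so this set of characters
# is the same under every locale.
LEADING_CHARS = string.ascii_letters + string.digits


def looks_like_bare_domain(word):
    """Hypothetical helper, not Django's API.

    Returns True when `word` starts with an ASCII letter or digit and
    ends with one of a few example suffixes.
    """
    return (
        len(word) > 0
        and word[0] in LEADING_CHARS
        and word.endswith(('.org', '.net', '.com'))
    )
```

Because the membership test only ever sees ASCII characters, it behaves the same regardless of what locale.setlocale() has been called with.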

Attachments (2)

urlize_ascii_letters.patch (856 bytes ) - added by Andrew Stoneman <astoneman@…> 17 years ago.
patch to use ascii_letters instead of letters
urlize_with_ascii_plus_unittests.patch (3.1 KB ) - added by shaleh 16 years ago.
updated patch, adds unittests


Change History (7)

by Andrew Stoneman <astoneman@…>, 17 years ago

Attachment: urlize_ascii_letters.patch added

patch to use ascii_letters instead of letters

comment:1 by Andrew Stoneman <astoneman@…>, 17 years ago

A sample session to show the problem:

>>> import locale
>>> import string
>>> from django.utils.html import urlize
>>> string.letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> locale.setlocale(locale.LC_ALL, 'de_DE')
'de_DE'
>>> string.letters
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\xaa\xb5\xba\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9
\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4
\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> urlize('abc')
Traceback (most recent call last):
  File "<console>", line 1, in ?
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/django/utils/functional.py", line 129, in wrapper
    return func(*args, **kwargs)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/django/utils/html.py", line 82, in urlize
    if middle.startswith('www.') or ('@' not in middle and not middle.startswith('http://') and \
UnicodeDecodeError: 'ascii' codec can't decode byte 0xaa in position 52: ordinal not in range(128)
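The "position 52" in the error above matches the layout of the de_DE string.letters value: 52 ASCII letters followed by the byte 0xaa. The sketch below reproduces just that decode step on a current Python, assuming the failure comes from an ASCII decode of the byte string. The name letters is a stand-in built by hand, since string.letters only exists on Python 2.

```python
import string

# Stand-in for the locale-affected string.letters shown in the session:
# 52 ASCII letters followed by the first non-ASCII byte (0xaa).
letters = (
    string.ascii_uppercase.encode('ascii')
    + string.ascii_lowercase.encode('ascii')
    + b'\xaa'
)

error = None
try:
    # Decoding with the 'ascii' codec fails at the first byte >= 0x80.
    letters.decode('ascii')
except UnicodeDecodeError as exc:
    error = exc

# error.start holds the index of the offending byte.
print(error.start)
```

The index printed is the offset of the first non-ASCII byte, which is why the traceback points past the end of the plain alphabet.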

comment:2 by Simon G <dev@…>, 16 years ago

Needs tests: set
Triage Stage: Unreviewed → Ready for checkin

Andrew - this looks good, but can we get a regression test (like your example above)?

by shaleh, 16 years ago

updated patch, adds unittests

comment:3 by shaleh, 16 years ago

Keywords: sprintdec01 added
Needs tests: unset

unittests added

comment:4 by Malcolm Tredinnick, 16 years ago

Unfortunately, the test case isn't sufficiently portable (for example, on my Ubuntu laptop, it fails with an "invalid locale" error). Rather than worrying too much about lots of different installation situations, I'm just going to commit the core patch. It's correct and I can live without tests for this small change.
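For reference, one way a test like this could be made more portable is to skip it when the locale is not installed rather than fail. The following is only a sketch under assumptions: it relies on unittest's skipTest, which did not exist in the Python versions of the time, it checks string.ascii_letters directly instead of calling urlize, and it sets LC_CTYPE rather than LC_ALL. The class name and LOCALE_NAME are made up for the example, and this is not the test from the attached patch.

```python
import locale
import string
import unittest

EXPECTED = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'


class AsciiLettersLocaleTest(unittest.TestCase):
    # Assumption: the same locale name as in the sample session.
    LOCALE_NAME = 'de_DE'

    def setUp(self):
        # Remember the current LC_CTYPE setting so it can be restored.
        self.saved = locale.setlocale(locale.LC_CTYPE)
        try:
            locale.setlocale(locale.LC_CTYPE, self.LOCALE_NAME)
        except locale.Error:
            # The locale is not installed here: skip instead of failing.
            self.skipTest('locale %r is not available' % self.LOCALE_NAME)

    def tearDown(self):
        locale.setlocale(locale.LC_CTYPE, self.saved)

    def test_ascii_letters_unchanged(self):
        # ascii_letters must not pick up locale-specific characters.
        self.assertEqual(string.ascii_letters, EXPECTED)
```

Saved in a module, this can be run with python -m unittest. On a machine without de_DE it reports the test as skipped.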

comment:5 by Malcolm Tredinnick, 16 years ago

Resolution: fixed
Status: new → closed

(In [6856]) Fixed #5657 -- Use string.ascii_letters instead of string.letters in the urlize
filter to ensure consistent (and correct) results no matter what the server's
locale setting might be. Thanks, Andrew Stoneman.
