
Opened 7 years ago

Closed 6 years ago

#5657 closed (fixed)

[patch] urlize breaks when string.letters is changed by the locale

Reported by: Andrew Stoneman <astoneman@…>
Owned by: nobody
Component: Uncategorized
Version: master
Severity:
Keywords: sprintdec01
Cc:
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings:
UI/UX:

Description

utils.html.urlize() depends on string.letters being implicitly convertible to unicode. However, string.letters is locale-dependent: once the locale module changes the locale, it can contain non-ASCII bytes, and urlize then fails with a confusing UnicodeDecodeError on any site that uses the function. I've included a patch that uses string.ascii_letters instead, which does not change with the locale.

Attachments (2)

urlize_ascii_letters.patch (856 bytes) - added by Andrew Stoneman <astoneman@…> 7 years ago.
patch to use ascii_letters instead of letters
urlize_with_ascii_plus_unittests.patch (3.1 KB) - added by shaleh 6 years ago.
updated patch, adds unittests


Change History (7)

Changed 7 years ago by Andrew Stoneman <astoneman@…>

patch to use ascii_letters instead of letters

comment:1 Changed 7 years ago by Andrew Stoneman <astoneman@…>

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

A sample session to show the problem:

>>> import locale
>>> import string
>>> from django.utils.html import urlize
>>> string.letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> locale.setlocale(locale.LC_ALL, 'de_DE')
'de_DE'
>>> string.letters
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\xaa\xb5\xba\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> urlize('abc')
Traceback (most recent call last):
  File "<console>", line 1, in ?
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/django/utils/functional.py", line 129, in wrapper
    return func(*args, **kwargs)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/django/utils/html.py", line 82, in urlize
    if middle.startswith('www.') or ('@' not in middle and not middle.startswith('http://') and \
UnicodeDecodeError: 'ascii' codec can't decode byte 0xaa in position 52: ordinal not in range(128)
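The failure can be reproduced without Django or an installed de_DE locale. Python 2 silently decoded a byte string with the default 'ascii' codec whenever it was compared with a unicode string, and the condition at line 82 of html.py (truncated in the traceback) evidently does such a comparison against string.letters. The sketch below is a Python 3 illustration, not Django code: it makes that implicit decode explicit, and `letters_de` and `python2_style_membership` are hypothetical names. The byte values are taken from the session above.

```python
# The 52 ASCII letters followed by the first few extra bytes that
# string.letters gained under de_DE in the session above.
letters_de = (
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    b"\xaa\xb5\xba\xc0\xc1"
)


def python2_style_membership(char: str, letters: bytes) -> bool:
    """Mimic Python 2's `unicode in str`, which implicitly decoded the
    byte string with the 'ascii' codec before comparing."""
    return char in letters.decode("ascii")


try:
    python2_style_membership("a", letters_de)
except UnicodeDecodeError as exc:
    # The first non-ASCII byte (0xaa) sits right after the 52 ASCII letters.
    print(exc)
```

With pure-ASCII input such as b"abc" the lookup works; the error appears only once the locale has added high bytes. That matches the "position 52" in the report.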

comment:2 Changed 6 years ago by Simon G <dev@…>

  • Needs tests set
  • Triage Stage changed from Unreviewed to Ready for checkin

Andrew - this looks good, but can we get a regression test (like in your example above)?

Changed 6 years ago by shaleh

updated patch, adds unittests

comment:3 Changed 6 years ago by shaleh

  • Keywords sprintdec01 added
  • Needs tests unset

unittests added

comment:4 Changed 6 years ago by mtredinnick

Unfortunately, the test case isn't sufficiently portable (for example, on my Ubuntu laptop, it fails with an "invalid locale" error). Rather than worrying too much about lots of different installation situations, I'm just going to commit the core patch. It's correct and I can live without tests for this small change.
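One way to address the portability problem is to skip the test, rather than fail it, when the German locale is not installed. The following is a minimal Python 3 unittest sketch of that pattern. It is not the test from the attached patch. It assumes the only property worth checking is that the character set urlize relies on does not change with the locale. The class name and the candidate locale names are illustrative assumptions, since locale names vary by platform.

```python
import locale
import string
import unittest

# Locale names are platform-specific; these candidates are assumptions.
CANDIDATE_LOCALES = ("de_DE.ISO8859-1", "de_DE")


class AsciiLettersLocaleTest(unittest.TestCase):
    def setUp(self):
        # Remember the current LC_CTYPE setting and restore it afterwards
        # so the test does not leak locale state into other tests.
        saved = locale.setlocale(locale.LC_CTYPE)
        self.addCleanup(locale.setlocale, locale.LC_CTYPE, saved)

    def test_ascii_letters_ignores_locale(self):
        before = string.ascii_letters
        for name in CANDIDATE_LOCALES:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
                break
            except locale.Error:
                continue
        else:
            # Skip instead of failing when no German locale is installed,
            # e.g. the "invalid locale" case described above.
            self.skipTest("no German locale available on this system")
        self.assertEqual(string.ascii_letters, before)
        self.assertTrue(all(ord(c) < 128 for c in string.ascii_letters))


if __name__ == "__main__":
    unittest.main()
```

A skipped test still counts as a successful run, so machines without the locale report a skip instead of a spurious failure.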

comment:5 Changed 6 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from new to closed

(In [6856]) Fixed #5657 -- Use string.ascii_letters instead of ascii.letters in the urlize
filter to ensure consistent (and correct) results no matter what the server's
locale setting might be. Thanks, Andrew Stoneman.
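The fix works because string.ascii_letters is a fixed constant, while Python 2's string.letters was rebuilt from the current LC_CTYPE locale. (Python 3 removed string.letters altogether.) The sketch below is a hypothetical approximation of the kind of bare-domain check urlize performs on a word. The exact condition is truncated in the traceback, so `looks_like_bare_domain` and its rules are assumptions rather than Django's actual code.

```python
import string

# ascii_letters never varies with the locale, unlike Python 2's
# string.letters, so this set is always pure ASCII.
SAFE_FIRST_CHARS = string.ascii_letters + string.digits


def looks_like_bare_domain(word: str) -> bool:
    """Hypothetical approximation of urlize's test for a bare domain
    such as 'example.com': no scheme, no '@', starts with an ASCII
    letter or digit, and ends in .com, .net or .org."""
    return (
        len(word) > 0
        and "@" not in word
        and not word.startswith(("http://", "https://"))
        and word[0] in SAFE_FIRST_CHARS
        and word.endswith((".com", ".net", ".org"))
    )


print(looks_like_bare_domain("example.com"))         # True
print(looks_like_bare_domain("http://example.com"))  # False
```

Because SAFE_FIRST_CHARS contains only ASCII, a non-ASCII first character such as "ü" is simply rejected instead of triggering a decode error.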
