Django

Ticket #5657 (closed: fixed)

Opened 8 months ago

Last modified 6 months ago

[patch] urlize breaks when string.letters is changed by the locale

Reported by: Andrew Stoneman <astoneman@gmail.com> Assigned to: nobody
Component: Uncategorized Version: SVN
Keywords: sprintdec01 Cc:
Triage Stage: Ready for checkin Has patch: 1
Needs documentation: 0 Needs tests: 0
Patch needs improvement: 0

Description

utils.html.urlize() depends on string.letters being implicitly convertible to unicode. When the locale module changes string.letters, however, it can contain non-ASCII characters, which causes hard-to-diagnose UnicodeDecodeError exceptions on sites that use the function. I've included a patch that uses string.ascii_letters instead, which does not change with the locale.
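To illustrate the idea, here is a minimal sketch of the kind of membership test involved, assuming the function checks the first character of a word against a letters-plus-digits constant. The helper name starts_with_ascii_alnum and the constant SAFE_LEADING_CHARS are hypothetical and are not part of Django or of the attached patch.

```python
import string

# string.ascii_letters is a fixed constant; locale.setlocale() does not alter it.
# SAFE_LEADING_CHARS is an illustrative name, not something defined by Django.
SAFE_LEADING_CHARS = string.ascii_letters + string.digits


def starts_with_ascii_alnum(word):
    """Return True if word is non-empty and begins with an ASCII letter or digit."""
    return len(word) > 0 and word[0] in SAFE_LEADING_CHARS


print(starts_with_ascii_alnum("example.com"))
print(starts_with_ascii_alnum("@example"))
```

Because the constant contains only ASCII characters, comparing it against a unicode string never triggers an implicit decode of non-ASCII bytes.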

Attachments

urlize_ascii_letters.patch (0.8 kB) - added by Andrew Stoneman <astoneman@gmail.com> on 10/01/07 22:46:40.
patch to use ascii_letters instead of letters
urlize_with_ascii_plus_unittests.patch (3.1 kB) - added by shaleh on 12/02/07 02:12:52.
updated patch, adds unittests

Change History

10/01/07 22:46:40 changed by Andrew Stoneman <astoneman@gmail.com>

  • attachment urlize_ascii_letters.patch added.

patch to use ascii_letters instead of letters

10/03/07 00:35:00 changed by Andrew Stoneman <astoneman@gmail.com>

  • needs_better_patch changed.
  • needs_tests changed.
  • needs_docs changed.

A sample session to show the problem:

>>> import locale
>>> import string
>>> from django.utils.html import urlize
>>> string.letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> locale.setlocale(locale.LC_ALL, 'de_DE')
'de_DE'
>>> string.letters
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\xaa\xb5\xba\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> urlize('abc')
Traceback (most recent call last):
  File "<console>", line 1, in ?
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/django/utils/functional.py", line 129, in wrapper
    return func(*args, **kwargs)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/django/utils/html.py", line 82, in urlize
    if middle.startswith('www.') or ('@' not in middle and not middle.startswith('http://') and \
UnicodeDecodeError: 'ascii' codec can't decode byte 0xaa in position 52: ordinal not in range(128)
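
The position reported in the error lines up with the first non-ASCII byte in the locale-modified string.letters. Here is a rough sketch of that conversion, written as an explicit ASCII decode so that it also runs on Python 3 (Python 2 performs the decode implicitly when a byte string is combined with a unicode string). The helper name first_undecodable_index is made up for illustration, and the byte string is a shortened stand-in for the de_DE value shown above.

```python
import string

# Assumption: a shortened stand-in for the de_DE string.letters value above,
# i.e. the 52 ASCII letters followed by the first non-ASCII byte (0xaa).
locale_letters = (string.ascii_uppercase + string.ascii_lowercase).encode("ascii") + b"\xaa"


def first_undecodable_index(data):
    """Return the index of the first byte that is not valid ASCII, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


print(first_undecodable_index(locale_letters))
print(first_undecodable_index(string.ascii_letters.encode("ascii")))
```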

12/01/07 20:46:02 changed by Simon G <dev@simon.net.nz>

  • needs_tests set to 1.
  • stage changed from Unreviewed to Ready for checkin.

Andrew - this looks good, but can we get a regression test (like your example above)?

12/02/07 02:12:52 changed by shaleh

  • attachment urlize_with_ascii_plus_unittests.patch added.

updated patch, adds unittests

12/02/07 02:13:57 changed by shaleh

  • keywords set to sprintdec01.
  • needs_tests deleted.

unittests added

12/02/07 18:39:46 changed by mtredinnick

Unfortunately, the test case isn't sufficiently portable (for example, on my Ubuntu laptop, it fails with an "invalid locale" error). Rather than worrying too much about lots of different installation situations, I'm just going to commit the core patch. It's correct and I can live without tests for this small change.
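
For reference, one way a locale-dependent test can be made to skip rather than fail when the locale is not installed is sketched below. This is not the test from the attached patch. It assumes a modern Python with unittest's skipTest, and 'de_DE' is only an example name, since available locale names vary by system.

```python
import locale
import string
import unittest


class AsciiLettersLocaleTest(unittest.TestCase):
    # Assumption: 'de_DE' is an example; the installed locale names differ per system.
    LOCALE_NAME = "de_DE"

    def setUp(self):
        # Remember the current LC_CTYPE setting so it can be restored afterwards.
        self.saved = locale.setlocale(locale.LC_CTYPE)
        try:
            locale.setlocale(locale.LC_CTYPE, self.LOCALE_NAME)
        except locale.Error:
            # Skip instead of failing with an "invalid locale" style error.
            self.skipTest("locale %r is not installed" % self.LOCALE_NAME)

    def tearDown(self):
        locale.setlocale(locale.LC_CTYPE, self.saved)

    def test_ascii_letters_is_locale_independent(self):
        self.assertEqual(len(string.ascii_letters), 52)
        self.assertTrue(all(ord(c) < 128 for c in string.ascii_letters))
```

Saved to a file, the class can be run with python -m unittest. On a machine without the example locale the test is reported as skipped.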

12/02/07 18:41:42 changed by mtredinnick

  • status changed from new to closed.
  • resolution set to fixed.

(In [6856]) Fixed #5657 -- Use string.ascii_letters instead of string.letters in the urlize filter to ensure consistent (and correct) results no matter what the server's locale setting might be. Thanks, Andrew Stoneman.

