Opened 14 years ago

Closed 13 years ago

#13704 closed Bug (fixed)

utils.html.urlize mishandles IDN style domain names

Reported by: Daniel Ryan
Owned by: nobody
Component: Template system
Version: 1.2
Severity: Normal
Keywords: IDN, urlize
Cc: dougal85@…
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

The urlize function runs urlquote on the URL it processes, which mishandles domain names that contain Unicode characters (internationalized domain names, or IDNs).

To test, run urlize('http://c✶.ws'). The text of the link is correct, but the href of the anchor tag is not.
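A minimal sketch of the difference, using only the Python standard library. Here `urllib.parse.quote` stands in for Django's `urlquote` (an assumption for illustration; the exact `safe` characters urlize passed are not given in this ticket), and the stdlib `idna` codec stands in for proper IDN handling:

```python
from urllib.parse import quote

url = "http://c✶.ws"

# Percent-encoding the whole URL (roughly what a urlquote-style call does)
# escapes the UTF-8 bytes of the star inside the hostname.
naive_href = quote(url, safe="/:")

# IDNA instead converts the hostname to an ASCII "xn--" label.
idn_href = "http://" + "c✶.ws".encode("idna").decode("ascii")

print(naive_href)  # http://c%E2%9C%B6.ws
print(idn_href)    # http://xn--c-lgq.ws
```

The first form escapes the star's bytes inside the hostname; the second is the ASCII-compatible `xn--` form, matching the href in the doctest from comment:3.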

Attachments (4)

django.diff (975 bytes) - added by Daniel Ryan 14 years ago.
django.2.diff (986 bytes) - added by Daniel Ryan 14 years ago.
Better Exception handling
13704-3.diff (4.0 KB) - added by Claude Paroz 13 years ago.
Patch with tests
13704-4.patch (4.1 KB) - added by Aymeric Augustin 13 years ago.


Change History (13)

by Daniel Ryan, 14 years ago

Attachment: django.diff added

comment:1 by Alex Gaynor, 14 years ago

Needs documentation: set
Needs tests: set

a) Needs tests.

b) There's almost certainly no reason to catch EVERY single exception coming out of there; it should catch only the applicable errors.
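One way to follow point b), sketched with the stdlib `idna` codec: catch only `UnicodeError`, which that codec raises for hostnames it cannot encode (for example an empty label or a label of 64 or more characters). The helper name `to_ascii_host` is hypothetical and is not taken from either attached patch:

```python
def to_ascii_host(host):
    """Return the IDNA (ASCII) form of host, or None if it cannot be encoded."""
    try:
        return host.encode("idna").decode("ascii")
    except UnicodeError:
        # Only encoding failures are handled here; any other exception
        # (a real bug) still propagates to the caller.
        return None

print(to_ascii_host("c✶.ws"))            # xn--c-lgq.ws
print(to_ascii_host("x" * 64 + ".com"))  # None (label too long)
```

Because the `except` clause names a single exception type, unrelated errors are not silently swallowed.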

by Daniel Ryan, 14 years ago

Attachment: django.2.diff added

Better Exception handling

comment:2 by Russell Keith-Magee, 14 years ago

Component: Uncategorized → Template system
Has patch: set
Triage Stage: Unreviewed → Accepted

Still needs tests; also needs PEP8 fixes (i.e., strip all that extra space around the parentheses).

comment:3 by Dougal Matthews, 14 years ago

I reviewed the ticket while attempting to add tests. I'm not sure how to handle IDN domains. I got the following result from a quick doctest in the defaultfilters tests:

Expected:
    u'<a href="http://xn--c-lgq.ws" rel="nofollow">http://c✶.ws</a>'
Got:
    u'<a href="http://xn--c-lgq.ws" rel="nofollow">http://xn--c-lgq.ws</a>'

Should the URL be encoded the same way for both the href and the innerHTML? I wasn't sure whether it needed 'HTML encoding'. When I added '<a href="http://xn--c-lgq.ws" rel="nofollow">http://xn--c-lgq.ws</a>' to an HTML file and opened it in Chrome, it displayed the Punycode characters rather than the expected Unicode result.
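The two forms are interchangeable: the stdlib `idna` codec can decode the `xn--` host from the doctest back into the Unicode name, so a filter could keep the readable form as link text and use the ASCII form only in the href. A small sketch (the helper `display_host` is hypothetical):

```python
def display_host(ascii_host):
    """Decode an ASCII/Punycode hostname back to Unicode for display."""
    return ascii_host.encode("ascii").decode("idna")

print(display_host("xn--c-lgq.ws"))  # c✶.ws
print(display_host("example.com"))   # example.com
```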

comment:4 by Adam Nelson, 14 years ago

#12988 might be relevant to this.

comment:5 by Julien Phalip, 14 years ago

Severity: Normal
Type: Bug

comment:6 by Dougal Matthews, 14 years ago

Cc: dougal85@… added
Easy pickings: unset

comment:7 by Jannis Leidel, 14 years ago

Patch needs improvement: set

by Claude Paroz, 13 years ago

Attachment: 13704-3.diff added

Patch with tests

comment:8 by Claude Paroz, 13 years ago

Needs documentation: unset
Needs tests: unset
Patch needs improvement: unset
UI/UX: unset

The new patch handles IDNs in the various URL forms that urlize recognizes. The visible part of the link is left unconverted, as suggested in comment:3.
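A rough sketch of the behaviour described in comment:8, not the committed patch: the href carries the IDNA-encoded host while the visible text is the original string, HTML-escaped. It assumes the netloc is a bare hostname (no port or user info), and `make_link` is a hypothetical name:

```python
from html import escape
from urllib.parse import quote, urlsplit, urlunsplit

def make_link(url):
    """Hypothetical sketch: ASCII href, original (unconverted) link text."""
    parts = urlsplit(url)
    try:
        # Assumes netloc is a bare hostname (no port or user info).
        host = parts.netloc.encode("idna").decode("ascii")
    except UnicodeError:
        host = parts.netloc  # leave hosts the codec rejects unchanged
    href = urlunsplit((parts.scheme, host, quote(parts.path, safe="/"),
                       parts.query, parts.fragment))
    # escape() turns &, <, > and quotes into entities for safe HTML output.
    return '<a href="%s" rel="nofollow">%s</a>' % (escape(href), escape(url))

print(make_link("http://c✶.ws"))
# <a href="http://xn--c-lgq.ws" rel="nofollow">http://c✶.ws</a>
```

This reproduces the "Expected" output from comment:3: only the href is converted, so readers still see the Unicode domain.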

by Aymeric Augustin, 13 years ago

Attachment: 13704-4.patch added

comment:9 by Aymeric Augustin, 13 years ago

Resolution: fixed
Status: new → closed

In [17348]:

Fixed #13704 -- Handled IDN properly in the urlize template filter. Thanks Claude Paroz for the initial version of the patch.
