Opened 6 years ago

Closed 5 years ago

#13704 closed Bug (fixed)

utils.html.urlize mishandles IDN style domain names

Reported by: Daniel Ryan
Owned by: nobody
Component: Template system
Version: 1.2
Severity: Normal
Keywords: IDN, urlize
Cc: dougal85@…
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

The urlize function runs urlquote on the URL it is processing, and urlquote mishandles domain names that contain Unicode characters (IDN style).

To test, run urlize('http://c✶.ws'). The text of the link is fine, but the href of the anchor tag is not.
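
For illustration, the difference between percent-quoting a host name and IDNA-encoding it can be sketched with the Python 3 standard library alone. This is a minimal sketch, not Django's code: `urllib.parse.quote` stands in for Django's `urlquote` as an assumption, and the variable names are hypothetical.

```python
from urllib.parse import quote

# Host name from the ticket description.
domain = "c✶.ws"

# Percent-quoting treats the host like path text and escapes the UTF-8 bytes
# of the non-ASCII character. Such a result is not a usable host name.
percent_quoted = quote(domain)

# IDNA encoding converts each non-ASCII label to its ASCII ("xn--") form.
# comment:3 of this ticket reports xn--c-lgq.ws for this host.
idna_encoded = domain.encode("idna").decode("ascii")

print("percent-quoted:", percent_quoted)
print("IDNA-encoded:  ", idna_encoded)
```

Only the host part of a URL should go through the IDNA step; the path and query string still need ordinary percent-quoting.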

Attachments (4)

django.diff (975 bytes) - added by Daniel Ryan 6 years ago.
django.2.diff (986 bytes) - added by Daniel Ryan 6 years ago.
Better Exception handling
13704-3.diff (4.0 KB) - added by Claude Paroz 5 years ago.
Patch with tests
13704-4.patch (4.1 KB) - added by Aymeric Augustin 5 years ago.


Change History (13)

Changed 6 years ago by Daniel Ryan

Attachment: django.diff added

comment:1 Changed 6 years ago by Alex Gaynor

Needs documentation: set
Needs tests: set
Patch needs improvement: unset

a) Needs tests.

b) There's almost certainly no cause to catch EVERY single exception coming out of there; it should catch only the applicable errors.
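
As a sketch of what catching "only the applicable errors" could look like, the helper below catches `UnicodeError`, which is what Python's built-in `idna` codec raises when a name cannot be encoded (for example, a label longer than 63 characters). The helper name `to_ascii_domain` is hypothetical and is not taken from any of the attached patches.

```python
def to_ascii_domain(domain):
    """Hypothetical helper: IDNA-encode a domain, or return None on failure."""
    try:
        return domain.encode("idna").decode("ascii")
    except UnicodeError:
        # Only encoding failures are swallowed here. Unrelated problems,
        # such as passing a non-string, still propagate to the caller.
        return None


print(to_ascii_domain("münchen.de"))
print(to_ascii_domain("a" * 64 + ".com"))  # over-long label, cannot be encoded
```

A bare `except:` in the same place would also hide programming errors such as an `AttributeError` from a wrong argument type, which is the concern raised in point b).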

Changed 6 years ago by Daniel Ryan

Attachment: django.2.diff added

Better Exception handling

comment:2 Changed 6 years ago by Russell Keith-Magee

Component: Uncategorized → Template system
Has patch: set
Triage Stage: Unreviewed → Accepted

Still needs tests; also needs PEP8 (i.e., strip all that extra space around the parentheses).

comment:3 Changed 6 years ago by Dougal Matthews

I reviewed the ticket while attempting to add tests. I'm not sure how to handle IDN domains. I got the following result from a quick doctest in the defaultfilters tests:

Expected:
    u'<a href="http://xn--c-lgq.ws" rel="nofollow">http://c✶.ws</a>'
Got:
    u'<a href="http://xn--c-lgq.ws" rel="nofollow">http://xn--c-lgq.ws</a>'

Should the URL be encoded the same way for both the href and the innerHTML? I wasn't sure whether it needed HTML encoding. When I added '<a href="http://xn--c-lgq.ws" rel="nofollow">http://xn--c-lgq.ws</a>' to an HTML file and opened it in Chrome, it displayed the Punycode characters rather than the expected Unicode result.

comment:4 Changed 6 years ago by Adam Nelson

#12988 might be relevant to this

comment:5 Changed 5 years ago by Julien Phalip

Severity: Normal
Type: Bug

comment:6 Changed 5 years ago by Dougal Matthews

Cc: dougal85@… added
Easy pickings: unset

comment:7 Changed 5 years ago by Jannis Leidel

Patch needs improvement: set

Changed 5 years ago by Claude Paroz

Attachment: 13704-3.diff added

Patch with tests

comment:8 Changed 5 years ago by Claude Paroz

Needs documentation: unset
Needs tests: unset
Patch needs improvement: unset
UI/UX: unset

New patch handles IDN names in various recognized forms. The visible part is left unconverted, as suggested in comment:3.
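
The behaviour described here (ASCII form in the href, original text in the visible part) can be sketched roughly as follows. This is an illustrative approximation using only the Python 3 standard library, not the code from the attached patch; the function name `link_for` is hypothetical, and the sketch only handles strings that are already full URLs.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def link_for(url):
    """Hypothetical sketch: build an anchor whose href uses the IDNA host."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    try:
        ascii_netloc = netloc.encode("idna").decode("ascii")
    except UnicodeError:
        # The host cannot be encoded, so emit plain escaped text instead.
        return escape(url)
    href = urlunsplit((scheme, ascii_netloc, path, query, fragment))
    # The visible text keeps the original (Unicode) form of the URL.
    return '<a href="%s" rel="nofollow">%s</a>' % (escape(href), escape(url))


print(link_for("http://bücher.example"))
```

Keeping the Unicode form as the link text matches the "Expected" output quoted in comment:3, where only the href carries the `xn--` form.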

Changed 5 years ago by Aymeric Augustin

Attachment: 13704-4.patch added

comment:9 Changed 5 years ago by Aymeric Augustin

Resolution: fixed
Status: new → closed

In [17348]:

Fixed #13704 -- Handled IDN properly in the urlize template filter. Thanks Claude Paroz for the initial version of the patch.
