Opened 6 years ago

Closed 5 years ago

#13704 closed Bug (fixed)

utils.html.urlize mishandles IDN style domain names

Reported by: dryan Owned by: nobody
Component: Template system Version: 1.2
Severity: Normal Keywords: IDN, urlize
Cc: dougal85@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


The urlize function runs urlquote on the URL it is processing, which incorrectly handles domain names containing Unicode characters (IDN-style domains).

To test, run urlize('http://c✶.ws'). The text of the link is fine, but the href of the anchor tag is not.
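The report separates the href from the link text. The sketch below shows one way to build an ASCII-safe href in Python 3. It is a minimal sketch and not Django's actual fix. Instead of percent-quoting the host, it converts the host with the standard library's `idna` codec (IDNA 2003, i.e. punycode). Only the path, query, and fragment are percent-quoted. The helper name `idn_href` is hypothetical. In line with comment:1, it catches only `UnicodeError` rather than every exception.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def idn_href(url: str) -> str:
    """Return an ASCII-only href for `url` (hypothetical helper, sketch only).

    Limitations of this sketch: credentials (user:pass@) are dropped and
    IPv6 literal hosts are not handled.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    try:
        # The stdlib "idna" codec applies nameprep + punycode per label (IDNA 2003).
        ascii_host = host.encode("idna").decode("ascii")
    except UnicodeError:
        # Only the expected failure (invalid or over-long label) is caught.
        return url
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"
    # Non-ASCII characters are percent-encoded as UTF-8; "%" is listed as
    # safe so existing %XX escapes are not double-encoded.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(idn_href("http://c✶.ws"))  # host becomes an "xn--" label
```

Only the attribute value needs this conversion. The visible link text can remain the original Unicode string.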

Attachments (4)

django.diff (975 bytes) - added by dryan 6 years ago.
django.2.diff (986 bytes) - added by dryan 6 years ago.
Better Exception handling
13704-3.diff (4.0 KB) - added by claudep 5 years ago.
Patch with tests
13704-4.patch (4.1 KB) - added by aaugustin 5 years ago.


Change History (13)

Changed 6 years ago by dryan

comment:1 Changed 6 years ago by Alex

  • Needs documentation set
  • Needs tests set
  • Patch needs improvement unset

a) Needs tests.

b) There's almost certainly no cause to catch EVERY single exception coming out of there, it should catch only the applicable errors.

Changed 6 years ago by dryan

Better Exception handling

comment:2 Changed 6 years ago by russellm

  • Component changed from Uncategorized to Template system
  • Has patch set
  • Triage Stage changed from Unreviewed to Accepted

Still needs tests; also needs PEP 8 fixes (i.e., strip all that extra space around the parentheses).

comment:3 Changed 6 years ago by d0ugal

I reviewed the ticket while attempting to add tests. I'm not sure how to handle IDN domains. I got the following result from a quick doctest in the defaultfilters tests:

    u'<a href="" rel="nofollow">http://c✶.ws</a>'
    u'<a href="" rel="nofollow"></a>'

Should the URL be encoded the same way for both the href and the innerHTML? I wasn't sure whether it needed HTML encoding. When I added '<a href="" rel="nofollow"></a>' to an HTML file and opened it in Chrome, it displayed the (punycode?) characters rather than the expected Unicode result.

comment:4 Changed 6 years ago by adamnelson

#12988 might be relevant to this

comment:5 Changed 5 years ago by julien

  • Severity set to Normal
  • Type set to Bug

comment:6 Changed 5 years ago by d0ugal

  • Cc dougal85@… added
  • Easy pickings unset

comment:7 Changed 5 years ago by jezdez

  • Patch needs improvement set

Changed 5 years ago by claudep

Patch with tests

comment:8 Changed 5 years ago by claudep

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • UI/UX unset

New patch handles IDN names in various recognized forms. The visible part is left unconverted, as suggested in comment:3.
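The rendering rule described here can be illustrated with a minimal Python 3 sketch. It is hypothetical and is not the committed patch. Only the href receives the IDNA host and percent-quoting. Both the href and the visible text are HTML-escaped, and the visible text is otherwise left as written. The function name `idn_anchor` is invented for this example. Treating a scheme-less `www.` form as `http://` is an assumption made here.

```python
import html
from urllib.parse import quote, urlsplit, urlunsplit


def idn_anchor(text: str) -> str:
    """Render text as a link: ASCII href, unconverted visible text (sketch)."""
    # Assumption: a scheme-less form such as "www.c✶.ws" is linked as http.
    url = text if "://" in text else "http://" + text
    parts = urlsplit(url)
    try:
        host = (parts.hostname or "").encode("idna").decode("ascii")
    except UnicodeError:
        host = None  # not a valid IDN host: link the URL as written
    if host is None:
        href = url
    else:
        netloc = host if parts.port is None else f"{host}:{parts.port}"
        href = urlunsplit((
            parts.scheme,
            netloc,
            quote(parts.path, safe="/%"),
            quote(parts.query, safe="=&%"),
            quote(parts.fragment, safe="%"),
        ))
    # Both parts are HTML-escaped; only the href is IDNA/percent-encoded.
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        html.escape(href), html.escape(text)
    )


print(idn_anchor("www.c✶.ws"))
```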

Changed 5 years ago by aaugustin

comment:9 Changed 5 years ago by aaugustin

  • Resolution set to fixed
  • Status changed from new to closed

In [17348]:

Fixed #13704 -- Handled IDN properly in the urlize template filter. Thanks Claude Paroz for the initial version of the patch.
