
Opened 4 years ago

Closed 2 years ago

#13704 closed Bug (fixed)

utils.html.urlize mishandles IDN style domain names

Reported by: dryan Owned by: nobody
Component: Template system Version: 1.2
Severity: Normal Keywords: IDN, urlize
Cc: dougal85@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The urlize function runs urlquote on the URL it processes, which mishandles domain names that contain Unicode characters (internationalized domain names, or IDNs).

To reproduce, run urlize('http://c✶.ws'). The link text is correct, but the href of the anchor tag is not.
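The difference can be reproduced with the standard library alone. This is a minimal sketch assuming Python 3, with `urllib.parse.quote` standing in for Django's `urlquote` (which wrapped the same quoting routine); the `safe` string is an assumption chosen to keep URL punctuation intact. Percent-encoding the host yields escaped UTF-8 bytes, while a usable IDN host needs the IDNA (Punycode) form.

```python
from urllib.parse import quote, urlsplit, urlunsplit

URL = "http://c\u2736.ws"  # the example from the ticket: http://c✶.ws

# Percent-encoding the whole URL (roughly what urlquote did) escapes the
# UTF-8 bytes of the non-ASCII host instead of converting it to IDNA.
quoted = quote(URL, safe="/&=:;#?+*")

# Encoding only the host with the IDNA codec yields the ASCII (Punycode)
# form that resolvers and browsers expect in an href.
parts = urlsplit(URL)
idna_host = parts.netloc.encode("idna").decode("ascii")
idna_url = urlunsplit(parts._replace(netloc=idna_host))

print(quoted)    # http://c%E2%9C%B6.ws
print(idna_url)  # http://xn--c-lgq.ws
```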

Attachments (4)

django.diff (975 bytes) - added by dryan 4 years ago.
django.2.diff (986 bytes) - added by dryan 4 years ago.
Better Exception handling
13704-3.diff (4.0 KB) - added by claudep 2 years ago.
Patch with tests
13704-4.patch (4.1 KB) - added by aaugustin 2 years ago.


Change History (13)

Changed 4 years ago by dryan

comment:1 Changed 4 years ago by Alex

  • Needs documentation set
  • Needs tests set
  • Patch needs improvement unset

a) Needs tests.

b) There's almost certainly no reason to catch EVERY exception coming out of there; it should catch only the applicable errors.
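The narrower exception handling asked for in comment:1 might look like the following. This is a hypothetical sketch (the name `to_idna_host` is illustrative and not taken from either patch), assuming Python's built-in `idna` codec, which signals a malformed host with `UnicodeError`; only that exception is caught, and the host is returned unchanged in that case.

```python
def to_idna_host(host):
    """Return the ASCII (IDNA) form of host, or host itself if it cannot be encoded.

    Hypothetical helper for illustration only.
    """
    try:
        return host.encode("idna").decode("ascii")
    except UnicodeError:
        # Raised for empty labels or labels longer than 63 characters.
        # Anything else (e.g. AttributeError if host is not a string)
        # still propagates instead of being silently swallowed.
        return host
```

For example, `to_idna_host("c\u2736.ws")` returns `"xn--c-lgq.ws"`, while a host with a 64-character label is passed through untouched.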

Changed 4 years ago by dryan

Better Exception handling

comment:2 Changed 4 years ago by russellm

  • Component changed from Uncategorized to Template system
  • Has patch set
  • Triage Stage changed from Unreviewed to Accepted

Still needs tests; it also needs PEP 8 fixes (i.e., strip all that extra space around the parentheses).

comment:3 Changed 4 years ago by d0ugal

I reviewed the ticket while attempting to add tests, and I'm not sure how IDN domains should be handled. I got the following result from a quick doctest in the defaultfilters tests:

Expected:
    u'<a href="http://xn--c-lgq.ws" rel="nofollow">http://c✶.ws</a>'
Got:
    u'<a href="http://xn--c-lgq.ws" rel="nofollow">http://xn--c-lgq.ws</a>'

Should the URL be encoded the same way for both the href and the innerHTML? I wasn't sure whether it needed 'HTML encoding'. When I added '<a href="http://xn--c-lgq.ws" rel="nofollow">http://xn--c-lgq.ws</a>' to an HTML file and opened it in Chrome, it displayed the (Punycode?) characters rather than the expected Unicode result.
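On the question of whether href and link text must match: the IDNA conversion is reversible, so the ASCII form can go in the href while the Unicode form is shown to the reader. A minimal sketch, assuming Python's built-in `idna` codec (which implements IDNA 2003):

```python
# Unicode host -> ASCII (Punycode) host, suitable for the href.
ascii_host = "c\u2736.ws".encode("idna").decode("ascii")

# ASCII host -> Unicode host, suitable for the visible link text.
unicode_host = ascii_host.encode("ascii").decode("idna")

print(ascii_host)    # xn--c-lgq.ws
print(unicode_host)  # c✶.ws
```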

comment:4 Changed 3 years ago by adamnelson

#12988 might be relevant to this.

comment:5 Changed 3 years ago by julien

  • Severity set to Normal
  • Type set to Bug

comment:6 Changed 3 years ago by d0ugal

  • Cc dougal85@… added
  • Easy pickings unset

comment:7 Changed 3 years ago by jezdez

  • Patch needs improvement set

Changed 2 years ago by claudep

Patch with tests

comment:8 Changed 2 years ago by claudep

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • UI/UX unset

The new patch handles IDNs in various recognized forms. The visible part of the link is left unconverted, as suggested in comment:3.
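The behaviour described in comment:8 (IDNA-encode the host in the href, leave the visible text as typed) can be sketched roughly as follows. This is an illustrative assumption, not the code from 13704-3.diff or [17348]: `build_link` is a hypothetical name, bare input without a scheme is assumed to be a `www.` form and given `http://`, and userinfo and ports in the host are not specially handled.

```python
from html import escape
from urllib.parse import quote, urlsplit, urlunsplit


def build_link(text):
    """Return an <a> tag whose href uses the IDNA host and whose text is unchanged.

    Hypothetical helper; assumes text is an 'http(s)://...' or 'www....' URL.
    """
    url = text if "://" in text else "http://" + text  # assume 'www.' form
    parts = urlsplit(url)
    try:
        host = parts.netloc.encode("idna").decode("ascii")
    except UnicodeError:
        host = parts.netloc  # leave hosts the codec rejects alone
    # Percent-encode path, query and fragment, but never the host.
    href = urlunsplit((
        parts.scheme,
        host,
        quote(parts.path, safe="/%"),
        quote(parts.query, safe="=&%"),
        quote(parts.fragment, safe="%"),
    ))
    # Escape both parts for HTML; the visible text keeps its Unicode form.
    return '<a href="%s" rel="nofollow">%s</a>' % (escape(href), escape(text))
```

With the ticket's example, `build_link("http://c\u2736.ws")` produces the output comment:3 expected: `<a href="http://xn--c-lgq.ws" rel="nofollow">http://c✶.ws</a>`.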

Changed 2 years ago by aaugustin

comment:9 Changed 2 years ago by aaugustin

  • Resolution set to fixed
  • Status changed from new to closed

In [17348]:

Fixed #13704 -- Handled IDN properly in the urlize template filter. Thanks Claude Paroz for the initial version of the patch.
