Opened 5 years ago

Closed 5 years ago

#29826 closed Bug (wontfix)

urlize does not handle HTML angle brackets correctly

Reported by: Marc Dequènes (Duck) Owned by: nobody
Component: Template system Version: 1.11
Severity: Normal Keywords:
Cc: Vishvajit Pathak Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no



It was supposedly fixed in #24471 but I can reproduce with 1.11.16:

>>> import django
>>> django.VERSION
(1, 11, 16, u'final', 0)
>>> from django.template.defaultfilters import urlize
>>> print urlize("coin <> coin", 76)
coin &amp;lt;<a href="" rel="nofollow">;gt;</a> coin

Same with Python 3 and 2.1.2:

>>> django.VERSION
(2, 1, 2, 'final', 0)
>>> print(urlize("coin &lt;; coin", 76))
coin &amp;lt;<a href="" rel="nofollow">;gt;</a> coin

I'll try to have a deeper look at the code but at the moment I have no fix to suggest.

This impact Mailman 3:
Hope we can find a workaround.


Change History (8)

comment:1 by Vishvajit Pathak, 5 years ago

Cc: Vishvajit Pathak added

comment:2 by Claude Paroz, 5 years ago

See this note in urlize docs:

If urlize is applied to text that already contains HTML markup, things won’t work as expected. Apply this filter only to plain text.

I guess you might want to write a custom filter for your use case.

comment:3 by Tim Graham, 5 years ago

Resolution: invalid
Status: newclosed

comment:4 by Marc Dequènes (Duck), 5 years ago

Resolution: invalid
Status: closednew


I did not see this note which was not present in older versions.

Anyway the code show signs of HTML handling (WRAPPING_PUNCTUATION, _html_escapes, unescape…) so clearly this was not the original intent.

I have created a patch which fixes this problem:

So would you please reconsider and review this patch?


comment:5 by Carlton Gibson, 5 years ago

Hi Marc.

The test case in the PR doesn't quite match the example URL in the description here (or the discussion of trailing > chars on the Mailman issue linked). Is it redundant, or already covered by #24471?

On the basis of the test case, I'm inclined to Accept this. (Didn't yet have time to fully review the patch.)

comment:6 by Tim Graham, 5 years ago

The note that urlize should be used in plaintext is there since Django 1.0 (cfd5b184fba4f51a6756fecaf4ec4272f3e23bfc). '&lt;' and '&gt;' have been in WRAPPING_PUNCTUATION (LEADING_PUNCTUATION and TRAILING_PUNCTUATION before 15d10a5210378bba88c7dfa1f45a4d3528ddfc3f) since Django's initial public commit, but I don't think that justifies changing the behavior.

I created a PR to remove unused characters in WRAPPING_PUNCTUATION.

Maybe your patch has some merit for simplification even if it doesn't change behavior?

comment:7 by Carlton Gibson <carlton.gibson@…>, 5 years ago

In 91054863:

Refs #29826 -- Removed unused characters from urlize configuration.

The HTML characters are unused because urlize is meant to be applied to
plain text and these characters aren't properly detected (refs #29826).
Angle brackets and quotes are present in word_split_re and therefore
won't be used in WRAPPING_PUNCTUATION.

comment:8 by Tim Graham, 5 years ago

Resolution: wontfix
Status: newclosed

The test in the ticket description worked before 15d10a5210378bba88c7dfa1f45a4d3528ddfc3f (which is in Django 1.4). As no one complained in 6 years since Django 1.4 was released, I don't see the past behavior as a precedent to readd it. You can make your case on the DevelopersMailingList but I don't see what justification there is based on the documentation.

On the pull request, Marc suggested adding a new urlize_html() function and template filter but I doubt that's something we would add to Django (if it's even feasible).

Note: See TracTickets for help on using tickets.
Back to Top