Opened 7 years ago
Closed 7 years ago
#29826 closed Bug (wontfix)
urlize does not handle HTML angle brackets correctly
| Reported by: | Marc Dequènes (Duck) | Owned by: | nobody |
|---|---|---|---|
| Component: | Template system | Version: | 1.11 |
| Severity: | Normal | Keywords: | |
| Cc: | Vishvajit Pathak | Triage Stage: | Unreviewed |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
Quack,
It was supposedly fixed in #24471 but I can reproduce with 1.11.16:
>>> import django
>>> django.VERSION
(1, 11, 16, u'final', 0)
>>> from django.template.defaultfilters import urlize
>>> print urlize("coin <https://www.example.org/> coin", 76)
coin &lt;<a href="https://www.example.org/%3E" rel="nofollow">https://www.example.org/&gt;</a> coin
Same with Python 3 and 2.1.2:
>>> django.VERSION
(2, 1, 2, 'final', 0)
>>> print(urlize("coin <https://www.example.org/> coin", 76))
coin &lt;<a href="https://www.example.org/%3E" rel="nofollow">https://www.example.org/&gt;</a> coin
I'll try to have a deeper look at the code but at the moment I have no fix to suggest.
This impact Mailman 3: https://gitlab.com/mailman/hyperkitty/issues/29
Hope we can find a workaround.
Regards.
\_o<
Change History (8)
comment:1 by , 7 years ago
| Cc: | added |
|---|
comment:2 by , 7 years ago
comment:3 by , 7 years ago
| Resolution: | → invalid |
|---|---|
| Status: | new → closed |
comment:4 by , 7 years ago
| Resolution: | invalid |
|---|---|
| Status: | closed → new |
Quack,
I did not see this note which was not present in older versions.
Anyway the code show signs of HTML handling (WRAPPING_PUNCTUATION, _html_escapes, unescape…) so clearly this was not the original intent.
I have created a patch which fixes this problem: https://github.com/django/django/pull/10495
So would you please reconsider and review this patch?
Regards
\_o<
comment:5 by , 7 years ago
Hi Marc.
The test case in the PR doesn't quite match the example URL in the description here (or the discussion of trailing > chars on the Mailman issue linked). Is it redundant, or already covered by #24471?
On the basis of the test case, I'm inclined to Accept this. (Didn't yet have time to fully review the patch.)
comment:6 by , 7 years ago
The note that urlize should be used in plaintext is there since Django 1.0 (cfd5b184fba4f51a6756fecaf4ec4272f3e23bfc). '<' and '>' have been in WRAPPING_PUNCTUATION (LEADING_PUNCTUATION and TRAILING_PUNCTUATION before 15d10a5210378bba88c7dfa1f45a4d3528ddfc3f) since Django's initial public commit, but I don't think that justifies changing the behavior.
I created a PR to remove unused characters in WRAPPING_PUNCTUATION.
Maybe your patch has some merit for simplification even if it doesn't change behavior?
comment:8 by , 7 years ago
| Resolution: | → wontfix |
|---|---|
| Status: | new → closed |
The test in the ticket description worked before 15d10a5210378bba88c7dfa1f45a4d3528ddfc3f (which is in Django 1.4). As no one complained in 6 years since Django 1.4 was released, I don't see the past behavior as a precedent to readd it. You can make your case on the DevelopersMailingList but I don't see what justification there is based on the documentation.
On the pull request, Marc suggested adding a new urlize_html() function and template filter but I doubt that's something we would add to Django (if it's even feasible).
See this note in
urlizedocs:I guess you might want to write a custom filter for your use case.