#36586 closed Bug (invalid)
Escaping (ampersand) in browsable API URLs
Reported by: | J M | Owned by: | |
---|---|---|---|
Component: | Template system | Version: | 5.2 |
Severity: | Normal | Keywords: | urlize |
Cc: | | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
When URLs with an escaped character (specifically, in my case, an ampersand) are rendered in the browsable API, the character is improperly unescaped in the href. This may only apply to ampersands.
>>> from django.utils.html import urlize
>>> urlize('"tq": "http://api/foos/1/?p=1&times=1"')
'"tq": "<a href="http://api/foos/1/?p=1%C3%97%3D1">http://api/foos/1/?p=1&times=1</a>"'
Change History (3)
comment:1 by , 6 weeks ago
Component: | Uncategorized → Template system |
---|---|
Keywords: | urlize added |
Resolution: | → invalid |
Status: | new → closed |
Type: | Uncategorized → New feature |
comment:2 by , 6 weeks ago
Type: | New feature → Bug |
---|---|
comment:3 by , 2 weeks ago
To whoever finds this ticket...
I think the problem wasn't reported in the best way by OP. The issue was indeed caught in the browsable API in DRF, and we managed to isolate the problem with the following snippet:
>>> from django.utils.html import urlize
>>> urlize('http://example.com/foos/?page=2&timestamp=1')
'<a href="http://example.com/foos/?page=2%C3%97tamp%3D1">http://example.com/foos/?page=2&timestamp=1</a>'
The problem manifests as &timestamp=1 being translated to %C3%97tamp%3D1. I didn't see the string &times in that, so I suspected a bug, potentially inherited from Python. Looking more closely at the Django implementation, it indeed relies heavily on the Python API html.unescape, which has the same behaviour:
>>> import html
>>> html.unescape('https://example.com/?page=1&timestamp=3')
'https://example.com/?page=1×tamp=3'
Searching the CPython issue tracker brought up this issue, https://github.com/python/cpython/issues/85050, which says:
According to https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#cite_ref-semicolon_1-64 the trailing semicolon can be omitted for the named entity "reg". That means "&reg" and "&reg;" are equivalent.
So this is working as per the spec.
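For anyone who wants to reproduce this without Django, here is a minimal standard-library sketch of the behaviour described above. It only uses html.escape and html.unescape; the example URL is the one from this comment, and the variable names are purely illustrative. The last step is an assumption about usage rather than anything Django does: it shows that an ampersand already written as an HTML reference survives unescaping.

```python
import html

# A legacy named reference is recognised with or without the trailing semicolon.
assert html.unescape("&reg") == html.unescape("&reg;") == "\u00ae"

url = "https://example.com/?page=1&timestamp=3"

# "&times" is matched as a prefix of "&timestamp", so it is replaced
# by the multiplication sign (U+00D7) and "tamp=3" is left behind.
decoded = html.unescape(url)
print(decoded)

# If the ampersand is itself written as an HTML reference ("&amp;"),
# unescaping gives back the original query string unchanged.
escaped = html.escape(url)
roundtrip = html.unescape(escaped)
print(escaped)
print(roundtrip)
```

The round trip works because the text is scanned in a single pass: the ampersand produced from the reference is not matched again against the characters that follow it.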
Hello J M, thank you for your ticket.
First of all, can you please clarify what you mean by "browsable API"? This sounds like the django-rest-framework feature. Please note that this tracker is for Django core issues.
Secondly, regarding the urlize example you shared, the behavior occurs specifically when the URL contains &times (the HTML entity for ×), rather than any arbitrary ampersand. This happens because urlize is designed to produce HTML-safe links, which may involve encoding characters in the URL to ensure valid HTML. Its purpose is linkification of text for safe display, not exact preservation of the raw URL string. You can see the tests for this filter to better understand its scope and semantics: https://github.com/django/django/blob/main/tests/template_tests/filter_tests/test_urlize.py
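To illustrate why only some parameter names are affected, the sketch below checks whether a query parameter name starts with an entity name that is accepted without a trailing semicolon. It is a rough approximation built on the standard library's html.entities.html5 table, not part of Django or urlize; the helper name legacy_entity_prefix and the sample parameter names are hypothetical.

```python
from html.entities import html5

# Entity names listed without a trailing ";" are the legacy ones that are
# accepted even when the semicolon is omitted.
LEGACY_NAMES = {name for name in html5 if not name.endswith(";")}


def legacy_entity_prefix(param_name):
    """Return the longest legacy entity name that prefixes param_name, or None.

    Hypothetical helper: a non-None result means "&" + param_name would be
    partly rewritten when the text is HTML-unescaped.
    """
    for end in range(len(param_name), 1, -1):
        if param_name[:end] in LEGACY_NAMES:
            return param_name[:end]
    return None


for name in ("timestamp", "region", "page"):
    print(name, "->", legacy_entity_prefix(name))
```

Under these assumptions, a name such as timestamp collides with times and region collides with reg, while page has no colliding prefix and passes through untouched.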
Lastly, there are several user support channels available if you have further questions about how Django works: please refer to TicketClosingReasons/UseSupportChannels for ways to get help.