Changes between Initial Version and Version 1 of Ticket #36483
- Timestamp: Jun 27, 2025, 1:46:19 PM
Ticket #36483 – Description
I was recently surprised to find that a simple detail view URL with a model ID in it was also accessible at a URL using "full width" digit characters. For example, the page at "/pizza/123" could also be returned from "/pizza/１２３". Those are the Unicode characters U+FF11 U+FF12 U+FF13. It turns out this is ultimately because the model `IntegerField` uses `int` to get an integer from the string that was originally in the URL. And I was surprised to find that Python's `int` constructor uses `unicodedata.decimal` (or some equivalent) to translate characters in a string to decimal digits.

That was a cool accidental feature to discover; however, now I'm concerned about URL canonicalization. Python 3.13.3 accepts _68_ different characters for each digit. This means the same content is hypothetically accessible from many, many URLs. I've heard that can make a site look spammy to search engines. And maybe this could be an element of a security hole if something is assuming there is only one URL for a given page.

The SEO problem could be addressed by setting a `<link rel=canonical>` in the page to point to `Pizza.objects.get(pk=id).get_absolute_url()` or some similar logic. Alternatively, you could address the problem as a whole by setting up redirects or 404 responses. All of those approaches require a separate implementation for every view, though, since the view code ultimately doesn't know which parts of the URL are going to be treated as values of an `IntegerField`.
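
The `int` behaviour described above can be reproduced with the standard library alone. The sketch below is only an illustration, not a proposed patch: `digit_variants` and `is_canonical_pk` are hypothetical helper names, and the second one shows one possible way a URL segment might be checked against its plain ASCII form before deciding on a redirect or a 404.

```python
import sys
import unicodedata

# Full-width digits U+FF11 U+FF12 U+FF13, written as escapes so they stay visible.
FULLWIDTH_123 = "\uff11\uff12\uff13"

# int() accepts Unicode decimal digits, not only ASCII 0-9.
assert int(FULLWIDTH_123) == 123
assert unicodedata.decimal("\uff11") == 1


def digit_variants(value):
    """Hypothetical helper: every character that unicodedata maps to `value`.

    The exact number of results depends on the Unicode version bundled
    with the interpreter.
    """
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.decimal(chr(cp), None) == value
    ]


def is_canonical_pk(segment):
    """Hypothetical check: True only if `segment` is the ASCII form of its integer.

    Rejects non-ASCII digits, leading zeros, signs such as "+1",
    surrounding whitespace and underscores, all of which int() tolerates.
    """
    try:
        return segment.isascii() and str(int(segment)) == segment
    except ValueError:
        return False
```

For example, `is_canonical_pk("123")` is true, while the full-width form and `"0123"` are both rejected even though `int` maps all three to the same value. Where such a check would live (a URL converter, a middleware, or the field itself) is left open here.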