Changes between Initial Version and Version 1 of Ticket #36483


Timestamp: Jun 27, 2025, 1:46:19 PM
Author: Morgan Wahl
Comment: (Fix typo in description.)

  • Ticket #36483 – Description

I was recently surprised to find that a simple detail view URL with a model ID in it was also accessible at a URL using "full width" digit characters. For example, the page at "/pizza/123" could also be returned from "/pizza/１２３", where the digits are the Unicode characters U+FF11, U+FF12, and U+FF13. It turns out this is ultimately because the model's `IntegerField` uses `int` to get an integer from the string that was originally in the URL, and I was surprised to find that Python's `int` constructor uses `unicodedata.decimal` (or some equivalent) to translate the characters in a string into decimal digits.

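The behaviour described above can be reproduced in a plain Python session, without Django involved. This is a minimal sketch; the variable names are illustrative only.

```python
import unicodedata

# "/pizza/１２３" uses FULLWIDTH DIGIT ONE, TWO, THREE (U+FF11, U+FF12, U+FF13).
fullwidth = "\uff11\uff12\uff13"

# unicodedata.decimal() maps each character to its decimal digit value.
digits = [unicodedata.decimal(ch) for ch in fullwidth]

# int() performs the equivalent per-character translation, so the
# full-width spelling parses to the same integer as the ASCII one.
value = int(fullwidth)

print(digits)               # [1, 2, 3]
print(value)                # 123
print(value == int("123"))  # True
```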
Removed (initial): That was a cool accidental feature to discovery, however now I'm concerned about URL canonicalization. Python 3.13.3 accepts _68_ different characters for each digit. This means the same content is hypothetically accessible from many, many URLs. I've heard that can make a site look spammy to search engines. And maybe this could be an element of a security hole if something is assuming there is only one URL for a given page.
Added (v1): That was a cool accidental feature to discover, however now I'm concerned about URL canonicalization. Python 3.13.3 accepts _68_ different characters for each digit. This means the same content is hypothetically accessible from many, many URLs. I've heard that can make a site look spammy to search engines. And maybe this could be an element of a security hole if something is assuming there is only one URL for a given page.

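The per-digit count can be checked against whatever Unicode database the running interpreter ships (`unicodedata.unidata_version`). The totals depend on that version, so this sketch prints them without claiming specific numbers. It also relies on `int` translating each character independently, which means digits from different scripts can be mixed within one ID: with the 68 variants per digit reported for Python 3.13.3, a three-digit ID such as 123 has 68³ = 314,432 possible spellings.

```python
import sys
import unicodedata
from collections import Counter

# Count every code point that unicodedata.decimal() maps to each digit 0-9.
# Characters without a decimal value make decimal() return the default (None).
counts = Counter()
for cp in range(sys.maxunicode + 1):
    digit = unicodedata.decimal(chr(cp), None)
    if digit is not None:
        counts[digit] += 1

# The totals depend on the Unicode version bundled with this interpreter.
print("Unicode", unicodedata.unidata_version)
print(dict(sorted(counts.items())))

# Decimal digits are encoded in complete runs of 0-9, so every digit has the
# same number of variants, and each position in an ID can use any of them.
per_digit = counts[1]
print("spellings of a 3-digit ID:", per_digit ** 3)

# int() translates each character on its own, so scripts can be mixed:
# ASCII "1", FULLWIDTH DIGIT TWO, ARABIC-INDIC DIGIT THREE.
print(int("1\uff12\u0663"))  # 123
```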
The SEO problem could be addressed by adding a `<link rel=canonical>` to the page that points to `Pizza.objects.get(pk=id).get_absolute_url()` or similar, or the problem as a whole could be addressed with redirects or 404 responses. However, all of those approaches require a separate implementation in every view, because the view code ultimately doesn't know which parts of the URL are going to be treated as values of an `IntegerField`.
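To make that per-view cost concrete, here is a minimal sketch of the redirect-or-404 check each affected view would need, assuming the raw URL segment reaches the view as a string. `canonical_id_segment`, `resolve`, and the `/pizza/` path are hypothetical, and the returned strings stand in for real Django responses.

```python
from typing import Optional


def canonical_id_segment(segment: str) -> Optional[str]:
    """Hypothetical helper: canonical ASCII spelling of an integer segment.

    Returns None when int() rejects the segment entirely.
    """
    try:
        value = int(segment)
    except ValueError:
        return None
    # str() of an int always produces ASCII digits, so this is the single
    # spelling a view would treat as canonical.
    return str(value)


def resolve(segment: str) -> str:
    """Hypothetical per-view decision: serve, redirect, or 404."""
    canonical = canonical_id_segment(segment)
    if canonical is None:
        return "404"
    if canonical != segment:
        return f"redirect to /pizza/{canonical}"
    return "serve"


print(resolve("123"))                 # serve
print(resolve("\uff11\uff12\uff13"))  # redirect to /pizza/123
print(resolve("abc"))                 # 404
```

Because the check compares the raw segment with `str(int(segment))`, it also flags other spellings `int` accepts, such as `"0123"`. Every view taking an integer from the URL would have to repeat it.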