Implement URL decoding according to RFC 3987
|Reported by:||Aymeric Augustin||Owned by:||Loic Bistuer|
|Cc:||anubhav9042@…||Triage Stage:||Ready for checkin|
|Has patch:||yes||Needs documentation:||no|
|Needs tests:||no||Patch needs improvement:||no|
Since #5738, when Django fails to decode an URL because it isn't valid UTF-8, it returns a HTTP 400 error with no content.
It's a bit sloppy to use a protocol-level message for an application-level requirement — in other words, to reply to a well-formed HTTP request with 400 Bad Request. Section 3.2 of RFC 3987 proposes a solution to this problem. Basically, non-ASCII bytes that do not create a valid utf-8 sequence should remain URL-encoded. This may not be trivial to implement, but it provide better error handling and it's normalized.
Django builds URLs according to section 3.1 of RFC 3987. With this change, URLs will round trip cleanly through the reversing / resolving (that's one of the guarantees of RFC 3987) and Django will be able to deal with legacy, non-utf-8 URLs. I pursued these goals in #19468 with a more primitive technique (depending only on the encoding) and that didn't work out.