Opened 9 years ago
Closed 8 years ago
#26005 closed Bug (fixed)
uri_to_iri() perfoms percent decoding incorrectly
Reported by: | Chronial | Owned by: | varun naganathan |
---|---|---|---|
Component: | Utilities | Version: | 1.8 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The current implementation of uri_to_iri is incorrect.
In step two of the algorithm, it should:
Convert all percent-encodings ("%" followed by two hexadecimal digits) to the corresponding octets, except those corresponding to "%", characters in "reserved", and characters in US-ASCII not allowed in URIs.
But instead it just runs an unquote (source)
This also makes this statement from the docs a lie:
Both iri_to_uri() and uri_to_iri() functions are idempotent, which means the following is always true:
uri_to_iri(uri_to_iri(some_string)) == uri_to_iri(some_string)
But at the moment
uri_to_iri(uri_to_iri("%2525")) == "%" != "%25" == uri_to_iri("%2525")
Change History (11)
comment:1 by , 9 years ago
Component: | Core (URLs) → Utilities |
---|---|
Summary: | uri_to_iri() broken → uri_to_iri() perfoms percent decoding incorrectly |
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Bug |
Version: | 1.9 → 1.8 |
comment:2 by , 9 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Hi,i wanted to work on this.Added a small tweak to avoid converting the percent encoding for '%' i.e. '%25'.basically I'm replacing all instances of '%25' in the uri string with '%'.I tried a few tests and they seem to work.Do let me know if that's fine.I'll organize the code and post a PR.
I'm pretty new to contributing to django so I'm sorry if I wasn't suppose to self assign this bug.
comment:3 by , 9 years ago
Also I have a doubt regarding the RFC guidelines.When RFC 3987 states that the percent encoding corresponding to '%' i.e. '%25' must not be converted to its octet , does that mean that uri_to_iri("%2525") should return %2525 only as the result?
If %2525 only must be the output I suggest I override the call to the inbuilt unquote() method with a custom written method.Does that sound good?
comment:4 by , 9 years ago
Created a trial Pull Request
https://github.com/django/django/pull/6085
Please review
comment:5 by , 8 years ago
@timgraham I created A PR against 1.8.x: https://github.com/django/django/pull/6426
This now correctly implements the algorithm from the RFC, except for step 4: The implementation is supposed to re-encode some unicode characters “that are not appropriate” for IRIs. I added a note about that to the docstring.
comment:6 by , 8 years ago
New PR against master: https://github.com/django/django/pull/6428. I also didn't foresee what that this fix would affect the testing infrastructure so I didn't run the whole test suite last time. Now everything should work.
comment:7 by , 8 years ago
Has patch: | set |
---|
follow-up: 9 comment:8 by , 8 years ago
Patch needs improvement: | set |
---|
Setting the 'patch_needs_improvement' bit as there are failing checks and open questions on the PR.
comment:9 by , 8 years ago
Replying to phildini:
Setting the 'patch_needs_improvement' bit as there are failing checks and open questions on the PR.
What are the open questions?
comment:10 by , 8 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
If you could provide a patch soon, we might backport it to 1.8 given this function was introduced there.