Opened 9 years ago

Closed 8 years ago

#26005 closed Bug (fixed)

uri_to_iri() perfoms percent decoding incorrectly

Reported by: Chronial Owned by: varun naganathan
Component: Utilities Version: 1.8
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The current implementation of uri_to_iri is incorrect.

In step two of the algorithm, it should:

Convert all percent-encodings ("%" followed by two hexadecimal digits) to the corresponding octets, except those corresponding to "%", characters in "reserved", and characters in US-ASCII not allowed in URIs.

But instead it just runs an unquote (source)

This also makes this statement from the docs a lie:

Both iri_to_uri() and uri_to_iri() functions are idempotent, which means the following is always true:
uri_to_iri(uri_to_iri(some_string)) == uri_to_iri(some_string)

But at the moment

uri_to_iri(uri_to_iri("%2525")) == "%" != "%25" == uri_to_iri("%2525")

Change History (11)

comment:1 by Tim Graham, 9 years ago

Component: Core (URLs)Utilities
Summary: uri_to_iri() brokenuri_to_iri() perfoms percent decoding incorrectly
Triage Stage: UnreviewedAccepted
Type: UncategorizedBug
Version: 1.91.8

If you could provide a patch soon, we might backport it to 1.8 given this function was introduced there.

comment:2 by varun naganathan, 9 years ago

Owner: changed from nobody to varun naganathan
Status: newassigned

Hi,i wanted to work on this.Added a small tweak to avoid converting the percent encoding for '%' i.e. '%25'.basically I'm replacing all instances of '%25' in the uri string with '%'.I tried a few tests and they seem to work.Do let me know if that's fine.I'll organize the code and post a PR.
I'm pretty new to contributing to django so I'm sorry if I wasn't suppose to self assign this bug.

comment:3 by varun naganathan, 9 years ago

Also I have a doubt regarding the RFC guidelines.When RFC 3987 states that the percent encoding corresponding to '%' i.e. '%25' must not be converted to its octet , does that mean that uri_to_iri("%2525") should return %2525 only as the result?

If %2525 only must be the output I suggest I override the call to the inbuilt unquote() method with a custom written method.Does that sound good?

Last edited 9 years ago by varun naganathan (previous) (diff)

comment:4 by varun naganathan, 9 years ago

Created a trial Pull Request
https://github.com/django/django/pull/6085
Please review

comment:5 by Chronial, 8 years ago

@timgraham I created A PR against 1.8.x: https://github.com/django/django/pull/6426

This now correctly implements the algorithm from the RFC, except for step 4: The implementation is supposed to re-encode some unicode characters “that are not appropriate” for IRIs. I added a note about that to the docstring.

comment:6 by Chronial, 8 years ago

New PR against master: https://github.com/django/django/pull/6428. I also didn't foresee what that this fix would affect the testing infrastructure so I didn't run the whole test suite last time. Now everything should work.

comment:7 by Tim Graham, 8 years ago

Has patch: set

comment:8 by Philip James, 8 years ago

Patch needs improvement: set

Setting the 'patch_needs_improvement' bit as there are failing checks and open questions on the PR.

in reply to:  8 comment:9 by Chronial, 8 years ago

Replying to phildini:

Setting the 'patch_needs_improvement' bit as there are failing checks and open questions on the PR.

What are the open questions?

comment:10 by Tim Graham, 8 years ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:11 by Tim Graham <timograham@…>, 8 years ago

Resolution: fixed
Status: assignedclosed

In 03281d8:

Fixed #26005 -- Fixed some percent decoding cases in uri_to_iri().

Note: See TracTickets for help on using tickets.
Back to Top