Opened 6 years ago
Closed 5 years ago
#30043 closed Bug (needsinfo)
AdminURLFieldWidget incorrectly unquotes URLs e.g. containing %2F
Reported by: | Brenton Partridge | Owned by: | Brady |
---|---|---|---|
Component: | contrib.admin | Version: | dev |
Severity: | Normal | Keywords: | admin, urlfield, smart_urlquote, url, quote |
Cc: | Florian Apolloner, Brady | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | yes | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
Intended behavior: If a URLField is used in the Django admin, and the currently saved URL in the database contains the escape %2F, the href of the rendered link should contain that escape verbatim (e.g. %2F, not a / in that location).
Current behavior (master branch, Python 3.7.0): The href of the rendered link contains a / in that location, creating an extra portion of the path and, when clicked, sending the admin user to a completely different and unintended URL.
Root cause: https://github.com/django/django/blob/315357ad25a6590e7f4564ec2e56a22132b09001/django/contrib/admin/widgets.py#L340 uses smart_urlquote
to set the href (subsequently read by https://github.com/django/django/blob/master/django/contrib/admin/templates/admin/widgets/url.html ).
smart_urlquote (as implemented here: https://github.com/django/django/blob/315357ad25a6590e7f4564ec2e56a22132b09001/django/utils/html.py#L203 ) has caused problems before e.g. https://code.djangoproject.com/ticket/28123 . That issue, though, refers to a backport, whereas this behavior is broken on the master branch in Python 3.
One fix would be to not use smart_urlquote, and just ensure that the href is set to the exact same value as initially shown in the input field and rendered as the text of the <a> tag. Is there a reason why smart_urlquote was used in the first place, though?
I'm happy to submit a patch and test case if this is a desired fix.
Change History (10)
comment:1 by , 6 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 6 years ago
Cc: | added |
---|
The smart_urlquote()
dates to the original implementation in #17549 merged by Florian. I don't have much expertise to say whether or not it's useful.
I explored the first test URL http://example.com/<sometag>some text</sometag>
and found that clicking that URL stills works after removing smart_urlquote()
as long as the space in the test URL is encoded (bit test URL changed from some text
to some%20text
. That seems fine because URLValidator
won't accept URLs that unencoded spaces in them.
Tread carefully since there was also a security issue in this code, cbe6d5568f4f5053ed7228ca3c3d0cce77cf9560.
comment:3 by , 6 years ago
Dropping smart_urlquote
would result in the wrong URL returned for /äöü
. The main issue here is that we do not know if the URL supplied is already sufficiently quoted or not. Hence we try to unquote and quote again to prevent attacks which contain half quoted URLs. To me unquoting seems a valid choice to do since most users will properly copy URLs from browser location bars which would return /%C3%A4%C3%B6%C3%BC
for the previous example.
But please try to convince me otherwise, I might miss something important here (this code is tricky).
comment:4 by , 5 years ago
Cc: | added |
---|
I've opened up a PR that is my first attempt at fixing this by adding a special case to check for %2F
and pass the segment that's being quoted through iri_to_uri
instead, which has the %
sign set as "safe"
I also added a unit test that checks for this special case in its simplest form as shown above smart_urlquote('%2F')
The PR can be found here:
https://github.com/django/django/pull/11837
comment:5 by , 5 years ago
Has patch: | set |
---|
comment:6 by , 5 years ago
Needs tests: | set |
---|---|
Owner: | changed from | to
Patch needs improvement: | set |
Status: | new → assigned |
comment:7 by , 5 years ago
Hi Brady.
Florian on the PR:
I do not feel good about the special casing, also the tests seem rather minimal. How long can a segment be? what happens if the segment contains other special characters etc?
Taking the special casing first. At the DjangoCon sprints we were close to thinking this might be intractable... because of the way quote()
will always double-escape... (memory slipping now). Can you re-construct the analysis here that led to the idea to just handle the special case?
It may be that we have to say we can't solve it, but at least then we'll see *why* your suggestion is as it is.
From there we can think about whether more test cases would be needed.
Thanks for the effort!
comment:8 by , 5 years ago
Hey Carlton,
So I got close to thinking this was intractable essentially due to not being able to identify if the string had already been encoded or not. Specifically around the path that had been specified (I think this is where the original ticket came from, because during testing the %2F only became a problem when it was part of the path. Identified by the use case you showed above smart_urlquote('%2F')
) - Although I would think this is expected behavior in the sense that if you have %2F in the path, I would think you'd want that converted to a /
anyways.
I tried a few ways of handling this case without a special case, and could not come up with a viable solution, but once I added a check to see if the segment already contained %2F
was I able to pass the tests. To be honest though, the more I think about it, the more I might have been coding to pass the tests as opposed to actually solving the problem. (I'm not actually sure what the real problem is, because the ticket author didn't provide a use case)
I can continue to write some more tests, and try to figure out a way to make this less of a special case, but would love to hear both your thoughts on how much of a problem this actually is.
comment:9 by , 5 years ago
Right, yes, that was it. The issue is that there's no (easy) way to know if a string has already been quoted, which re-reading is exactly what Florian said above:
The main issue here is that we do not know if the URL supplied is already sufficiently quoted or not.
I think the use-case will be where you have a URL, with a quoted-URL as a query parameter (say). As understand it, you do want %2F
s in there, so you'd use quote_plus
, as used by default by urlencode
:
>>> from urllib.parse import quote, quote_plus, urlencode >>> quote('https://djangoproject.com/') 'https%3A//djangoproject.com/' >>> quote_plus('https://djangoproject.com/testing/') 'https%3A%2F%2Fdjangoproject.com%2Ftesting%2F' >>> urlencode({'q':'https://djangoproject.com/testing/'}) 'q=https%3A%2F%2Fdjangoproject.com%2Ftesting%2F'
But, if it's part of the path, and not the query parameters, you don't want quote_plus
. Taking Florian's example:
>>> quote('/äöü') '/%C3%A4%C3%B6%C3%BC' >>> quote_plus('/äöü') '%2F%C3%A4%C3%B6%C3%BC'
The 2nd, as part of the path, would be wrong.
The good thing about iri_to_uri()
is that it won't double-percent-escape, whereas quote()
keeps on changing the input (re-quoting) each time, so if a URL does look quoted you can pass it through and get the right results. But edge cases...???
smart_urlquote()
, perhaps why it has the name, **already** handles the the query string parameters separately, using `urlencode`. (And looking at the issue linked in the comment there, that was the exact discussion from #22267...) — SO short of concrete test cases (i.e. full example URLs where the behaviour is wrong) I can't see that we can progress.
On that basis, I'm going to close as needsinfo
. Happy to re-open if full test cases come up.
Thank you for your effort and time looking into this Brady! Welcome aboard! 🎉
comment:10 by , 5 years ago
Resolution: | → needsinfo |
---|---|
Status: | assigned → closed |
Closing per previous comment
Hi Brenton.
Thanks for the report.
I'm going to Accept this for now. It looks right...
...but I'd quite like to see an example of the actual type of URL you're hitting problems with.
Can you begin by adding a test case (here or in a PR) showing the failing example?
Thanks.