#13260 closed Bug (fixed)
urlresolvers.reverse() generates invalid URLs when an argument contains % character
Reported by: | Ilya Semenov | Owned by: | Aymeric Augustin |
---|---|---|---|
Component: | Core (URLs) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | hv@…, ben@…, erikrose | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
If I call django.core.urlresolvers.reverse() where one of (string) args contains a percent character ("%"), it generates an invalid URL: the percent symbols are not replaced with %25. Interestingly, space characters are properly replaced with %20.
Consider the following example:
    # urls.py
    urlpatterns = patterns('',
        (r'^download/(.*)$', 'myapp.views.download'),
    )

    # views.py
    def download(request, filename):
        pass

    # somewhere else
    filename = '100% completed.png'

    print reverse('myapp.views.download', args=[filename])
    # /download/100%%20completed.png - malformed URL, Apache replies with 400 Bad Request error code

    print reverse('myapp.views.download', args=[urlquote(filename)])
    # /download/100%25%20completed.png - correct URL
reverse() should call urlquote() itself - because there's never any point NOT to do that. Moreover, the current implementation forces a user to apply urlquote() for each and every string parameter, otherwise there's always a chance that an invalid URL would be generated.
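The expected quoting can be reproduced outside Django. The following is a minimal sketch that uses Python 3's standard `urllib.parse.quote` as a stand-in for Django's `urlquote()`; that substitution is an assumption made for illustration, since the ticket's own code targets an older Django and Python.

```python
from urllib.parse import quote

# The filename from the report above.
filename = "100% completed.png"

# quote() percent-encodes both the space and the literal percent sign.
quoted = quote(filename)
print("/download/" + quoted)  # /download/100%25%20completed.png
```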
Attachments (3)
Change History (25)
comment:1 by , 15 years ago
Component: | Uncategorized → Core framework |
---|
by , 15 years ago
Attachment: | t13260-test.diff added |
---|
comment:2 by , 15 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:3 by , 15 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
by , 15 years ago
Attachment: | t13260-test2.diff added |
---|
Additional tests to also cover % in keyword arguments
comment:4 by , 15 years ago
Has patch: | set |
---|
Added a copy of russellm's regression test to also test kwargs.
Have resolved by applying urlquote to the args and kwargs values. Special characters "+$*/" are marked as safe to comply with existing regression tests.
This is my first Django patch, so any feedback would be appreciated.
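A rough sketch of the approach described in this comment, assuming Python 3's `urllib.parse.quote` in place of Django's `urlquote()`. The helper name `quote_arg` is hypothetical and is not taken from the attached patch.

```python
from urllib.parse import quote

# Characters the comment says were marked as safe (left unescaped).
SAFE_CHARS = "+$*/"

def quote_arg(value):
    """Percent-encode one reverse() argument, leaving SAFE_CHARS untouched."""
    return quote(str(value), safe=SAFE_CHARS)

print(quote_arg("a+b$c*d/e f%"))  # a+b$c*d/e%20f%25
```

Note that keeping "/" in the safe list means an argument containing a slash passes through unchanged, which is one reason the follow-up comments question whether any safe characters should be supplied.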
follow-up: 6 comment:5 by , 15 years ago
I believe this patch is not complete. The existing behavior replaces ' ' with '%20'. As you added an explicit call to urlquote(), that logic supposedly got redundant and had to be removed, but I don't see that in the patch.
comment:6 by , 15 years ago
Replying to semenov:
I believe this patch is not complete. The existing behavior replaces ' ' with '%20'. As you added an explicit call to urlquote(), that logic supposedly got redundant and had to be removed, but I don't see that in the patch.
The call to iri_to_uri() does replace some characters that are prohibited in URIs. You're correct that there's some duplicate checking for these characters, but I don't think it can be removed.
Looking at this ticket again though, I don't think that it's right to supply any safe chars to my call to urlquote. What this would mean though is that either some of the tests are wrong or that it's the user's responsibility to only put valid characters in reverse's args and kwargs (meaning this isn't a bug). Will raise on django-dev.
comment:7 by , 15 years ago
Cc: | added |
---|
comment:8 by , 15 years ago
Here is the mail in django-dev:
http://www.mail-archive.com/django-developers@googlegroups.com/msg25678.html
by , 15 years ago
Attachment: | t13260-patch.diff added |
---|
Additional tests including russellm's above plus one additional, code + test change assuming urlquote should be applied to args and kwargs.
comment:9 by , 15 years ago
Triage Stage: | Accepted → Design decision needed |
---|---|
Version: | 1.1 → SVN |
As to whether args and kwargs should be quoted (i.e. whether this is a bug or expected behaviour), there was one response each way to the post mentioned above.
The patch I've added assumes args and kwargs should be quoted (i.e. this is a bug). It modifies an existing test to fit this assumption. If you don't think this assumption is correct, please ignore the patch and close this ticket.
comment:10 by , 15 years ago
Cc: | added |
---|
comment:11 by , 14 years ago
Cc: | added |
---|
comment:12 by , 14 years ago
Type: | → Bug |
---|
comment:13 by , 14 years ago
Severity: | → Normal |
---|
comment:16 by , 12 years ago
Component: | Core (Other) → Core (URLs) |
---|
comment:17 by , 12 years ago
Owner: | changed from | to
---|---|
Triage Stage: | Design decision needed → Accepted |
Judging by the reference implementation, WSGI urlunquotes the path before putting it in environ['PATH_INFO'] where Django reads it.
That's where things get complicated :)
1) To round-trip properly through reverse / resolve, arguments must not be urlquoted by reverse — there's nothing to urlunquote them in this scenario.
2) To round-trip properly through reverse / render in template / click in browser / resolve, arguments must be urlquoted by reverse.
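The two scenarios can be sketched with plain functions. Everything below is a simplified model under stated assumptions: `reverse_sketch`, `resolve_direct` and `resolve_via_browser` are hypothetical stand-ins (not Django APIs), the URL pattern is fixed to `/download/<arg>`, and `urllib.parse` replaces Django's quoting helpers.

```python
from urllib.parse import quote, unquote

PREFIX = "/download/"

def reverse_sketch(arg, quote_args):
    """Hypothetical stand-in for reverse(); optionally quotes the argument."""
    return PREFIX + (quote(arg, safe="") if quote_args else arg)

def resolve_direct(url):
    """Scenario 1: the URL goes straight to resolve(); nothing unquotes it."""
    return url[len(PREFIX):]

def resolve_via_browser(url):
    """Scenario 2: the server unquotes the path once before matching."""
    return unquote(url)[len(PREFIX):]

arg = "100%41.png"  # contains something that looks like an escape sequence

# Scenario 1 only round-trips when reverse does NOT quote.
direct_unquoted = resolve_direct(reverse_sketch(arg, quote_args=False))
direct_quoted = resolve_direct(reverse_sketch(arg, quote_args=True))

# Scenario 2 only round-trips when reverse DOES quote.
browser_unquoted = resolve_via_browser(reverse_sketch(arg, quote_args=False))
browser_quoted = resolve_via_browser(reverse_sketch(arg, quote_args=True))
```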
The docs for reverse say that "the string returned by reverse() is already urlquoted", which obviously isn't true for at least some special characters.
This refers to the fact that reverse() calls iri_to_uri(), but that function is primarily concerned with escaping non-ASCII characters. It's also idempotent, which means it won't escape any character that's legal in a URL, including the "%" sign itself.
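Why an idempotent escaping function cannot fix a stray "%" can be seen in a small sketch. The function below is only loosely modelled on the behavior described here; it is not Django's `iri_to_uri()`, and its safe-character list is a deliberately shortened assumption.

```python
from urllib.parse import quote

def escape_idempotent(path):
    """Escape unsafe characters but treat '%' as safe, so that input which
    is already escaped is returned unchanged (hypothetical helper)."""
    return quote(path, safe="/%")

once = escape_idempotent("/download/100% completed.png")
twice = escape_idempotent(once)

print(once)                           # /download/100%%20completed.png
print(escape_idempotent("/café"))     # /caf%C3%A9
```

Applying the function twice gives the same result as applying it once, and the price of that property is that the literal "%" in the filename is never turned into "%25".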
I recommend changing reverse() to urlquote its arguments, for the following reasons:
- (2) above is the common case
- The current behavior doesn't match the docs
- The current behavior may result in invalid URLs (that still work in practice -- URL handling code is notoriously robust to malformed inputs!)
- Contributors who looked at this ticket until now were mostly in favor of considering this a bug
It will be worth a note in the "backwards incompatible changes" because it seems very likely that some developers are working around the current, buggy behavior by urlquoting arguments to reverse, and they would get double-quoting.
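The double-quoting risk can be illustrated as follows. This is a sketch under assumptions: `reverse_new` is a hypothetical model of a reverse() that quotes its argument, and `urllib.parse.quote` stands in for `urlquote()`.

```python
from urllib.parse import quote, unquote

def reverse_new(arg):
    """Hypothetical reverse() that quotes its argument (the proposed behavior)."""
    return "/download/" + quote(arg, safe="")

filename = "100% completed.png"

# Existing workaround: the caller quotes before calling reverse().
workaround_url = reverse_new(quote(filename))

# The server unquotes the path once, so the view sees a still-quoted name.
seen_by_view = unquote(workaround_url)[len("/download/"):]

print(workaround_url)  # /download/100%2525%2520completed.png
print(seen_by_view)    # 100%25%20completed.png
```

Code that relied on the workaround would need to drop its own quoting call once reverse() quotes arguments itself.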
comment:18 by , 12 years ago
I've created a little experiment to showcase the bug:
    # Put this in a file called experiment13260.py and run it with:
    # django-admin.py runserver --settings=experiment13260

    from django.conf.urls import patterns, url
    from django.http import HttpResponse
    from django.template import Context, Template

    DEBUG = True
    ROOT_URLCONF = 'experiment13260'
    SECRET_KEY = 'whatever'

    TEMPLATE = Template("""<html>
    <head>
    <title>Experiment with Django ticket #13260</title>
    </head>
    <body>
    <p>Put something with special characters in the URL!</p>
    <p>Argument passed to the view: <b>{{ arg }}</b></p>
    <p>Reversed URL: <b><a href="{% url 'view' arg %}">{% url 'view' arg %}</a></b></p>
    </body>
    </html>""")

    urlpatterns = patterns('',
        url(r'^(.*)/?$', lambda req, arg: HttpResponse(TEMPLATE.render(Context({'arg': arg}))), name='view'),
    )
If you go to http://localhost:8000/%252525/, every time you click the "Reversed URL" link, you lose a 25 in the URL.
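The "lose a 25 per click" effect can be simulated without a server. The sketch below is a simplified model, not the experiment itself: `click` is a hypothetical function that assumes the server unquotes the path once and that the reversed URL embeds the argument without quoting it.

```python
from urllib.parse import unquote

def click(url_path):
    """Simulate one request: unquote the path (as WSGI does), then build
    the reversed link from the captured argument without quoting it."""
    arg = unquote(url_path).strip("/")
    return "/" + arg + "/"

history = ["/%252525/"]
for _ in range(3):
    history.append(click(history[-1]))

print(history)  # ['/%252525/', '/%2525/', '/%25/', '/%/']
```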
comment:20 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Test case demonstrating problem