Opened 14 years ago

Closed 11 years ago

Last modified 10 years ago

#13260 closed Bug (fixed)

urlresolvers.reverse() generates invalid URLs when an argument contains % character

Reported by: Ilya Semenov Owned by: Aymeric Augustin
Component: Core (URLs) Version: dev
Severity: Normal Keywords:
Cc: hv@…, ben@…, erikrose Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

If I call django.core.urlresolvers.reverse() where one of (string) args contains a percent character ("%"), it generates an invalid URL: the percent symbols are not replaced with %25. Interestingly, space characters are properly replaced with %20.

Consider the following example:

# urls.py

urlpatterns = patterns('',
    (r'^download/(.*)$', 'myapp.views.download'),
)

# views.py

def download(request, filename):
    pass

# somewhere else

filename = '100% completed.png'
print reverse('myapp.views.download', args=[filename]) # /download/100%%20completed.png - malformed URL, Apache replies with 400 Bad Request error code
print reverse('myapp.views.download', args=[urlquote(filename)]) # /download/100%25%20completed.png - correct URL

reverse() should call urlquote() itself - because there's never any point NOT to do that. Moreover, the current implementation forces a user to apply urlquote() for each and every string parameter, otherwise there's always a chance that an invalid URL would be generated.

Attachments (3)

t13260-test.diff (839 bytes ) - added by Russell Keith-Magee 14 years ago.
Test case demonstrating problem
t13260-test2.diff (1.6 KB ) - added by stumbles 14 years ago.
Additional tests to also cover % in keyword arguments
t13260-patch.diff (3.3 KB ) - added by stumbles 14 years ago.
Additional tests including russellm's above plus one additional, code + test change assuming urlquote should be applied to args and kwargs.

Download all attachments as: .zip

Change History (25)

comment:1 by Ilya Semenov, 14 years ago

Component: UncategorizedCore framework

by Russell Keith-Magee, 14 years ago

Attachment: t13260-test.diff added

Test case demonstrating problem

comment:2 by Russell Keith-Magee, 14 years ago

Triage Stage: UnreviewedAccepted

comment:3 by stumbles, 14 years ago

Owner: changed from nobody to stumbles
Status: newassigned

by stumbles, 14 years ago

Attachment: t13260-test2.diff added

Additional tests to also cover % in keyword arguments

comment:4 by stumbles, 14 years ago

Has patch: set

Added a copy of russellm's regression test to also test kwargs.

Have resolved by applying urlquote to the args and kwargs values. Special characters "+$*/" are marked as safe to comply with existing regression tests.

This is my first Django patch, so any feedback would be appreciated.

comment:5 by Ilya Semenov, 14 years ago

I believe this patch is not complete. The existing behavior replaces ' ' with '%20'. As you added an explicit call to urlquote(), that logic supposedly got redundant and had to be removed, but I don't see that in the patch.

in reply to:  5 comment:6 by stumbles, 14 years ago

Replying to semenov:

I believe this patch is not complete. The existing behavior replaces ' ' with '%20'. As you added an explicit call to urlquote(), that logic supposedly got redundant and had to be removed, but I don't see that in the patch.

The call to iri_to_uri() does replace some characters that are prohibited in URIs. You're correct, that there's some duplicate checking for these characters, but I don't think it can be removed.

Looking at this ticket again though, I don't think that it's right to supply any safe chars to my call to urlquote. What this would mean though is that either some of the tests are wrong or that it's the user's responsibility to only put valid characters in reverse's args and kwargs (meaning this isn't a bug). Will raise on django-dev.

comment:7 by Thomas Güttler, 14 years ago

Cc: hv@… added

by stumbles, 14 years ago

Attachment: t13260-patch.diff added

Additional tests including russellm's above plus one additional, code + test change assuming urlquote should be applied to args and kwargs.

comment:9 by stumbles, 14 years ago

Triage Stage: AcceptedDesign decision needed
Version: 1.1SVN

As to whether args and kwargs should be quoted (ie. whether this is a bug or is expected behaviour), there was one response each way to the post mentioned above.

The patch I've added assumes args and kwargs should be quoted (ie. bug). It modifies an existing test to fit this assumption. If you don't think this assumption is correct, please ignore the patch and close this ticket.

comment:10 by stumbles, 14 years ago

Cc: ben@… added

comment:11 by erikrose, 13 years ago

Cc: erikrose added

comment:12 by Luke Plant, 13 years ago

Type: Bug

comment:13 by Luke Plant, 13 years ago

Severity: Normal

comment:14 by Aymeric Augustin, 12 years ago

UI/UX: unset

Change UI/UX from NULL to False.

comment:15 by Aymeric Augustin, 12 years ago

Easy pickings: unset

Change Easy pickings from NULL to False.

comment:16 by Aymeric Augustin, 11 years ago

Component: Core (Other)Core (URLs)

comment:17 by Aymeric Augustin, 11 years ago

Owner: changed from stumbles to Aymeric Augustin
Triage Stage: Design decision neededAccepted

Judging by the reference implementation, WSGI urlunquotes the path before putting it in environ['PATH_INFO'] where Django reads it.

That's where things get complicated :)

1) To round-trip properly through reverse / resolve, arguments must not be urlquoted by reverse — there's nothing to urlunquote them in this scenario.

2) To round-trip properly through reverse / render in template / click in browser / resolve, arguments must be urlquoted by reverse.


The docs for reverse say that "the string returned by reverse() is already urlquoted", which obviously isn't true for at least some special characters.

This refers to the fact that reverse() calls iri_to_uri(), but that function is primarily concerned with escaping non-ASCII characters. It's also idempotent, which means it won't escape any character that's legal in an URL.


I recommmend to change reverse() to urlquote its arguments, for the following reasons:

  • (2) above is the common case
  • The current behavior doesn't match the docs
  • The current behavior may result in invalid URLs (that still work in practic -- URL handling code is notoriously robust to malformed inputs!)
  • Contributors who looked at this ticket until now were mostly in favor of considering this a bug

It will be worth a note in the "backwards incompatible changes" because it seems very likely that some developers are working around the current, buggy behavior by urlquoting arguments to reverse, and they would get double-quoting.

comment:18 by Aymeric Augustin, 11 years ago

I've created a little experiment to showcase the bug:

# Put this in a file called experiment13260.py and run it with:
# django-admin.py runserver --settings=experiment13260
                           
from django.conf.urls import patterns, url
from django.http import HttpResponse
from django.template import Context, Template

DEBUG = True
ROOT_URLCONF = 'experiment13260'
SECRET_KEY = 'whatever'

TEMPLATE = Template("""<html>
<head>
    <title>Experiment with Django ticket #13260</title>
</head>
<body>
    <p>Put something with special characters in the URL!</p>
    <p>Argument passed to the view: <b>{{ arg }}</b></p>
    <p>Reversed URL: <b><a href="{% url 'view' arg %}">{% url 'view' arg %}</a></b></p>
</body>
</html>""")

urlpatterns = patterns('',
    url(r'^(.*)/?$', lambda req, arg: HttpResponse(TEMPLATE.render(Context({'arg': arg}))), name='view'),
)

If you go to http://localhost:8000/%252525/, every time you click the "Reversed URL" link, you lose a 25 in the URL.

comment:20 by Aymeric Augustin <aymeric.augustin@…>, 11 years ago

Resolution: fixed
Status: assignedclosed

In 31b5275235bac150a54059db0288a19b9e0516c7:

Fixed #13260 -- Quoted arguments interpolated in URLs in reverse.

comment:21 by Tim Graham <timograham@…>, 10 years ago

In dfc092622e5e55081f9a76fddea752494c4505ba:

Fixed #21529 -- Noted that {% url %} encodes its output (refs #13260).

comment:22 by Tim Graham <timograham@…>, 10 years ago

In a4c32d70c2bcf9731b6d6ff3370d2260ab4812af:

[1.6.x] Fixed #21529 -- Noted that {% url %} encodes its output (refs #13260).

Backport of dfc092622e from master

Note: See TracTickets for help on using tickets.
Back to Top