#13260 closed Bug (fixed)

urlresolvers.reverse() generates invalid URLs when an argument contains % character

If I call django.core.urlresolvers.reverse() where one of (string) args contains a percent character ("%"), it generates an invalid URL: the percent symbols are not replaced with %25. Interestingly, space characters are properly replaced with %20.

Consider the following example:


urlpatterns = patterns('',
    (r'^download/(.*)$', ''),
)

def download(request, filename):
    pass  # view body not shown in the report

# somewhere else

filename = '100% completed.png'
print reverse('', args=[filename]) # /download/100%%20completed.png - malformed URL, Apache replies with 400 Bad Request error code
print reverse('', args=[urlquote(filename)]) # /download/100%25%20completed.png - correct URL

reverse() should call urlquote() itself - because there's never any point NOT to do that. Moreover, the current implementation forces a user to apply urlquote() for each and every string parameter, otherwise there's always a chance that an invalid URL would be generated.
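The difference between the two URLs above can be reproduced with the standard library alone. This is a minimal sketch that assumes Python 3's urllib.parse.quote() treats this input the same way urlquote() does; it does not call reverse() itself.

```python
from urllib.parse import quote

filename = "100% completed.png"

# Percent-encode every character that is not safe inside a path segment,
# including the literal "%" itself.
quoted = quote(filename)

print("/download/" + quoted)  # /download/100%25%20completed.png
```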

Change History (25)

Test case demonstrating problem

Additional tests to also cover % in keyword arguments

Added a copy of russellm's regression test to also test kwargs.

Have resolved by applying urlquote to the args and kwargs values. Special characters "+$*/" are marked as safe to comply with existing regression tests.
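The approach described in this comment could look roughly like the sketch below. It is not the attached patch: quote_url_args is a hypothetical helper, and the standard library's quote() stands in for urlquote().

```python
from urllib.parse import quote

# Characters left unescaped, as listed in the comment above.
SAFE_CHARS = "+$*/"


def quote_url_args(args=(), kwargs=None, safe=SAFE_CHARS):
    """Return percent-encoded copies of positional and keyword values."""
    kwargs = kwargs or {}
    quoted_args = [quote(str(value), safe=safe) for value in args]
    quoted_kwargs = {
        key: quote(str(value), safe=safe) for key, value in kwargs.items()
    }
    return quoted_args, quoted_kwargs


args, kwargs = quote_url_args(["100% done"], {"path": "a/b+c"})
print(args)    # ['100%25%20done']
print(kwargs)  # {'path': 'a/b+c'}
```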

This is my first Django patch, so any feedback would be appreciated.

I believe this patch is not complete. The existing behavior replaces ' ' with '%20'. As you added an explicit call to urlquote(), that logic supposedly got redundant and had to be removed, but I don't see that in the patch.

The call to iri_to_uri() does replace some characters that are prohibited in URIs. You're correct, that there's some duplicate checking for these characters, but I don't think it can be removed.

Looking at this ticket again though, I don't think that it's right to supply any safe chars to my call to urlquote. What this would mean though is that either some of the tests are wrong or that it's the user's responsibility to only put valid characters in reverse's args and kwargs (meaning this isn't a bug). Will raise on django-dev.

Additional tests including russellm's above plus one additional, code + test change assuming urlquote should be applied to args and kwargs.

As to whether args and kwargs should be quoted (ie. whether this is a bug or is expected behaviour), there was one response each way to the post mentioned above.

The patch I've added assumes args and kwargs should be quoted (ie. bug). It modifies an existing test to fit this assumption. If you don't think this assumption is correct, please ignore the patch and close this ticket.

Judging by the reference implementation, WSGI urlunquotes the path before putting it in environ['PATH_INFO'] where Django reads it.

That's where things get complicated :)

1) To round-trip properly through reverse / resolve, arguments must not be urlquoted by reverse — there's nothing to urlunquote them in this scenario.

2) To round-trip properly through reverse / render in template / click in browser / resolve, arguments must be urlquoted by reverse.
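Scenario (2) can be illustrated without Django. In this sketch, server_sees is a hypothetical stand-in for the WSGI layer unquoting the path, and the argument value is chosen so that the difference is visible.

```python
from urllib.parse import quote, unquote


def server_sees(path):
    """Mimic the WSGI layer, which unquotes the path before resolving."""
    return unquote(path)


arg = "50%41"

raw_path = "/%s/" % arg            # argument interpolated as-is
quoted_path = "/%s/" % quote(arg)  # argument quoted before interpolation

print(server_sees(raw_path))     # /50A/    (the argument has changed)
print(server_sees(quoted_path))  # /50%41/  (the argument is preserved)
```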

The docs for reverse say that "the string returned by reverse() is already urlquoted", which obviously isn't true for at least some special characters.

This refers to the fact that reverse() calls iri_to_uri(), but that function is primarily concerned with escaping non-ASCII characters. It's also idempotent, which means it won't escape any character that's legal in a URL.
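The effect of such an idempotent escape can be approximated with the standard library. The safe set below is an assumption made for illustration (reserved characters plus "%"), and escape_like_iri_to_uri is a hypothetical name, not Django's function.

```python
from urllib.parse import quote

# Assumed safe set for illustration: reserved characters plus "%".
SAFE = "/#%[]=:;$&()+,!?*@'~"


def escape_like_iri_to_uri(value):
    """Escape spaces and non-ASCII, but leave "%" and reserved chars alone."""
    return quote(value, safe=SAFE)


once = escape_like_iri_to_uri("/download/100% completed.png")
twice = escape_like_iri_to_uri(once)

print(once)           # /download/100%%20completed.png
print(once == twice)  # True
```

Because "%" has to stay untouched for the function to be idempotent, a literal percent sign in an argument passes through unescaped.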

I recommend changing reverse() to urlquote its arguments, for the following reasons:

  • (2) above is the common case
  • The current behavior doesn't match the docs
  • The current behavior may result in invalid URLs (that still work in practice -- URL handling code is notoriously robust to malformed inputs!)
  • Contributors who looked at this ticket until now were mostly in favor of considering this a bug

It will be worth a note in the "backwards incompatible changes" because it seems very likely that some developers are working around the current, buggy behavior by urlquoting arguments to reverse, and they would get double-quoting.
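The double-quoting risk is easy to see in isolation. This sketch uses the standard library's quote() in place of urlquote() and simply applies it twice, as would happen if a caller's workaround met a reverse() that also quotes.

```python
from urllib.parse import quote

filename = "100% completed.png"

workaround = quote(filename)       # caller quotes before calling reverse()
double_quoted = quote(workaround)  # reverse() would then quote again

print(workaround)     # 100%25%20completed.png
print(double_quoted)  # 100%2525%2520completed.png
```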

I've created a little experiment to showcase the bug:

# Put this in a file called and run it with:
# runserver --settings=experiment13260
from django.conf.urls import patterns, url
from django.http import HttpResponse
from django.template import Context, Template

DEBUG = True
ROOT_URLCONF = 'experiment13260'
SECRET_KEY = 'whatever'

TEMPLATE = Template("""<html>
    <title>Experiment with Django ticket #13260</title>
    <p>Put something with special characters in the URL!</p>
    <p>Argument passed to the view: <b>{{ arg }}</b></p>
    <p>Reversed URL: <b><a href="{% url 'view' arg %}">{% url 'view' arg %}</a></b></p>
</html>""")

urlpatterns = patterns('',
    url(r'^(.*)/?$', lambda req, arg: HttpResponse(TEMPLATE.render(Context({'arg': arg}))), name='view'),
)

If you go to http://localhost:8000/%252525/, every time you click the "Reversed URL" link, you lose a 25 in the URL.
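The shrinking URL can be simulated without running a server. In this sketch, click and follow are hypothetical helpers: each click unquotes the path once (as the server would) and then rebuilds the link either with or without quoting the argument.

```python
from urllib.parse import quote, unquote


def click(path, quote_args):
    """Return the link rendered after requesting `path`."""
    arg = unquote(path.strip("/"))  # what the view receives
    if quote_args:
        arg = quote(arg, safe="")   # proposed behaviour: quote the argument
    return "/%s/" % arg


def follow(path, quote_args, clicks=3):
    """Collect the sequence of paths seen over several clicks."""
    seen = [path]
    for _ in range(clicks):
        path = click(path, quote_args)
        seen.append(path)
    return seen


print(follow("/%252525/", quote_args=False))
# ['/%252525/', '/%2525/', '/%25/', '/%/']
print(follow("/%252525/", quote_args=True))
# ['/%252525/', '/%252525/', '/%252525/', '/%252525/']
```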

Fixed #13260 -- Quoted arguments interpolated in URLs in reverse.

In dfc092622e5e55081f9a76fddea752494c4505ba:

Fixed #21529 -- Noted that {% url %} encodes its output (refs #13260).

In a4c32d70c2bcf9731b6d6ff3370d2260ab4812af:

[1.6.x] Fixed #21529 -- Noted that {% url %} encodes its output (refs #13260).

Backport of dfc092622e from master

