Opened 5 years ago

Closed 2 years ago

Last modified 16 months ago

#13260 closed Bug (fixed)

urlresolvers.reverse() generates invalid URLs when an argument contains % character

Reported by: semenov Owned by: aaugustin
Component: Core (URLs) Version: master
Severity: Normal Keywords:
Cc: hv@…, ben@…, erikrose Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

If I call django.core.urlresolvers.reverse() where one of (string) args contains a percent character ("%"), it generates an invalid URL: the percent symbols are not replaced with %25. Interestingly, space characters are properly replaced with %20.

Consider the following example:

# urls.py

urlpatterns = patterns('',
    (r'^download/(.*)$', 'myapp.views.download'),
)

# views.py

def download(request, filename):
    pass

# somewhere else

from django.core.urlresolvers import reverse
from django.utils.http import urlquote

filename = '100% completed.png'

# /download/100%%20completed.png - malformed URL, Apache replies with 400 Bad Request error code
print reverse('myapp.views.download', args=[filename])

# /download/100%25%20completed.png - correct URL
print reverse('myapp.views.download', args=[urlquote(filename)])

reverse() should call urlquote() itself - because there's never any point NOT to do that. Moreover, the current implementation forces a user to apply urlquote() for each and every string parameter, otherwise there's always a chance that an invalid URL would be generated.

Attachments (3)

t13260-test.diff (839 bytes) - added by russellm 5 years ago.
Test case demonstrating problem
t13260-test2.diff (1.6 KB) - added by stumbles 5 years ago.
Additional tests to also cover % in keyword arguments
t13260-patch.diff (3.3 KB) - added by stumbles 5 years ago.
Additional tests including russellm's above plus one additional, code + test change assuming urlquote should be applied to args and kwargs.


Change History (25)

comment:1 Changed 5 years ago by semenov

  • Component changed from Uncategorized to Core framework
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Changed 5 years ago by russellm

Test case demonstrating problem

comment:2 Changed 5 years ago by russellm

  • Triage Stage changed from Unreviewed to Accepted

comment:3 Changed 5 years ago by stumbles

  • Owner changed from nobody to stumbles
  • Status changed from new to assigned

Changed 5 years ago by stumbles

Additional tests to also cover % in keyword arguments

comment:4 Changed 5 years ago by stumbles

  • Has patch set

Added a copy of russellm's regression test to also test kwargs.

Have resolved by applying urlquote to the args and kwargs values. Special characters "+$*/" are marked as safe to comply with existing regression tests.
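For readers who want to see what marking "+$*/" as safe does, here is a minimal sketch. It uses the standard library's urllib.parse.quote (Python 3) as a stand-in for Django's urlquote, and quote_arg is a hypothetical helper, not code from the attached patch.

```python
from urllib.parse import quote

# Assumption: the safe set is the one named in the comment above ("+$*/").
SAFE_CHARS = "+$*/"


def quote_arg(value, safe=SAFE_CHARS):
    """Percent-encode one URL argument, leaving `safe` characters untouched."""
    return quote(str(value), safe=safe)


sample = "a+b$c*d/e f%"
with_safe = quote_arg(sample)              # "+$*/" pass through unchanged
without_safe = quote_arg(sample, safe="")  # every reserved character is encoded

print(with_safe)
print(without_safe)
```

With the safe set, only the space and the percent sign are encoded; with an empty safe set, "+", "$", "*" and "/" are encoded as well, which is the behaviour the existing regression tests would not accept.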

This is my first Django patch, so any feedback would be appreciated.

comment:5 follow-up: Changed 5 years ago by semenov

I believe this patch is not complete. The existing behavior replaces ' ' with '%20'. As you added an explicit call to urlquote(), that logic supposedly got redundant and had to be removed, but I don't see that in the patch.

comment:6 in reply to: ↑ 5 Changed 5 years ago by stumbles

Replying to semenov:

I believe this patch is not complete. The existing behavior replaces ' ' with '%20'. As you added an explicit call to urlquote(), that logic supposedly got redundant and had to be removed, but I don't see that in the patch.

The call to iri_to_uri() does replace some characters that are prohibited in URIs. You're correct that some of these characters end up being checked twice, but I don't think that check can be removed.

Looking at this ticket again though, I don't think that it's right to supply any safe chars to my call to urlquote. What this would mean though is that either some of the tests are wrong or that it's the user's responsibility to only put valid characters in reverse's args and kwargs (meaning this isn't a bug). Will raise on django-dev.

comment:7 Changed 5 years ago by guettli

  • Cc hv@… added

Changed 5 years ago by stumbles

Additional tests including russellm's above plus one additional, code + test change assuming urlquote should be applied to args and kwargs.

comment:9 Changed 5 years ago by stumbles

  • Triage Stage changed from Accepted to Design decision needed
  • Version changed from 1.1 to SVN

As to whether args and kwargs should be quoted (ie. whether this is a bug or is expected behaviour), there was one response each way to the post mentioned above.

The patch I've added assumes args and kwargs should be quoted (ie. bug). It modifies an existing test to fit this assumption. If you don't think this assumption is correct, please ignore the patch and close this ticket.

comment:10 Changed 5 years ago by stumbles

  • Cc ben@… added

comment:11 Changed 4 years ago by erikrose

  • Cc erikrose added

comment:12 Changed 4 years ago by lukeplant

  • Type set to Bug

comment:13 Changed 4 years ago by lukeplant

  • Severity set to Normal

comment:14 Changed 3 years ago by aaugustin

  • UI/UX unset

Change UI/UX from NULL to False.

comment:15 Changed 3 years ago by aaugustin

  • Easy pickings unset

Change Easy pickings from NULL to False.

comment:16 Changed 2 years ago by aaugustin

  • Component changed from Core (Other) to Core (URLs)

comment:17 Changed 2 years ago by aaugustin

  • Owner changed from stumbles to aaugustin
  • Triage Stage changed from Design decision needed to Accepted

Judging by the reference implementation, WSGI urlunquotes the path before putting it in environ['PATH_INFO'] where Django reads it.

That's where things get complicated :)

1) To round-trip properly through reverse / resolve, arguments must not be urlquoted by reverse — there's nothing to urlunquote them in this scenario.

2) To round-trip properly through reverse / render in template / click in browser / resolve, arguments must be urlquoted by reverse.
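The tension between (1) and (2) can be sketched with the standard library alone. Everything below is a simplified model and not Django code: reverse_plain, reverse_quoted, resolve and via_browser are hypothetical helpers, and via_browser only models the unquoting that happens before the path reaches PATH_INFO.

```python
from urllib.parse import quote, unquote

PREFIX = "/download/"


def reverse_plain(arg):
    """Model of a reverse() that interpolates the argument as-is."""
    return PREFIX + arg


def reverse_quoted(arg):
    """Model of a reverse() that percent-encodes the argument first."""
    return PREFIX + quote(arg, safe="")


def resolve(path):
    """Model of resolve(): extract the captured argument from the path."""
    return path[len(PREFIX):]


def via_browser(url):
    """Model of a request: the path is unquoted before URL resolution."""
    return unquote(url)


arg = "a%20b"  # a literal percent sign followed by "20", not an encoded space

# Scenario 1: reverse / resolve, with nothing in between to unquote.
direct_plain = resolve(reverse_plain(arg))
direct_quoted = resolve(reverse_quoted(arg))

# Scenario 2: reverse / click in browser / resolve.
browser_plain = resolve(via_browser(reverse_plain(arg)))
browser_quoted = resolve(via_browser(reverse_quoted(arg)))

print(direct_plain == arg, direct_quoted == arg)
print(browser_plain == arg, browser_quoted == arg)
```

In this model, each strategy survives exactly one of the two round trips, which is why a choice between them has to be made.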


The docs for reverse say that "the string returned by reverse() is already urlquoted", which obviously isn't true for at least some special characters.

This refers to the fact that reverse() calls iri_to_uri(), but that function is primarily concerned with escaping non-ASCII characters. It's also idempotent, which means it won't escape any character that's legal in a URL.
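A rough illustration of why an idempotent escaping function cannot encode "%": the sketch below approximates that behaviour with urllib.parse.quote. The function name approx_iri_to_uri and the exact safe set are assumptions made for this example; this is not Django's implementation.

```python
from urllib.parse import quote

# Assumption: characters that are legal somewhere in a URI are left alone,
# including "%" itself, so that already-encoded input is not encoded again.
URI_SAFE = "/#%[]=:;$&()+,!?*@'~"


def approx_iri_to_uri(iri):
    """Escape non-ASCII and clearly illegal characters; keep URI-legal ones."""
    return quote(iri, safe=URI_SAFE)


once = approx_iri_to_uri("100% completed.png")
twice = approx_iri_to_uri(once)
non_ascii = approx_iri_to_uri("café")

print(once)       # the space is escaped, the percent sign is not
print(twice)      # unchanged on the second pass
print(non_ascii)  # non-ASCII characters are encoded as UTF-8 bytes
```

The first result reproduces the malformed URL from the ticket description: the space becomes %20, while the bare percent sign is passed through.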


I recommend changing reverse() to urlquote its arguments, for the following reasons:

  • (2) above is the common case
  • The current behavior doesn't match the docs
  • The current behavior may result in invalid URLs (that still work in practice; URL handling code is notoriously robust to malformed inputs!)
  • Contributors who looked at this ticket until now were mostly in favor of considering this a bug

It will be worth a note in the "backwards incompatible changes" because it seems very likely that some developers are working around the current, buggy behavior by urlquoting arguments to reverse, and they would get double-quoting.
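The double-quoting risk can be shown with a short sketch. It uses urllib.parse.quote as a stand-in for urlquote, and quote_arg is a hypothetical helper representing the quoting step, whether done by the caller or by reverse().

```python
from urllib.parse import quote, unquote


def quote_arg(value):
    """Percent-encode a single URL argument."""
    return quote(value, safe="")


filename = "100% completed.png"

# What callers do today as a workaround: quote before calling reverse().
quoted_by_caller = quote_arg(filename)

# If reverse() also quotes, the already-quoted value is quoted a second time.
quoted_twice = quote_arg(quoted_by_caller)

# A single unquoting pass on the server no longer recovers the filename.
received = unquote(quoted_twice)

print(quoted_by_caller)
print(quoted_twice)
print(received == filename)
```

After the change, code that applies the workaround would need to drop its own quoting call to get the original argument back in the view.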

comment:18 Changed 2 years ago by aaugustin

I've created a little experiment to showcase the bug:

# Put this in a file called experiment13260.py and run it with:
# django-admin.py runserver --settings=experiment13260
                           
from django.conf.urls import patterns, url
from django.http import HttpResponse
from django.template import Context, Template

DEBUG = True
ROOT_URLCONF = 'experiment13260'
SECRET_KEY = 'whatever'

TEMPLATE = Template("""<html>
<head>
    <title>Experiment with Django ticket #13260</title>
</head>
<body>
    <p>Put something with special characters in the URL!</p>
    <p>Argument passed to the view: <b>{{ arg }}</b></p>
    <p>Reversed URL: <b><a href="{% url 'view' arg %}">{% url 'view' arg %}</a></b></p>
</body>
</html>""")

urlpatterns = patterns('',
    url(r'^(.*)/?$', lambda req, arg: HttpResponse(TEMPLATE.render(Context({'arg': arg}))), name='view'),
)

If you go to http://localhost:8000/%252525/, every time you click the "Reversed URL" link, you lose a 25 in the URL.
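The same effect can be reproduced without running a server. This is a simplified model of the experiment, not the experiment itself: the helper names are hypothetical, trailing-slash handling is reduced to a strip, and the unquoting step stands in for what happens before Django sees the path.

```python
from urllib.parse import quote, unquote


def argument_seen_by_view(url_path):
    """Model the request: unquote the path, then capture the argument."""
    return unquote(url_path).strip("/")


def reverse_without_quoting(arg):
    return "/" + arg + "/"


def reverse_with_quoting(arg):
    return "/" + quote(arg, safe="") + "/"


def click_through(start_path, reverse_func, clicks):
    """Follow the 'Reversed URL' link `clicks` times and record each path."""
    history = [start_path]
    for _ in range(clicks):
        arg = argument_seen_by_view(history[-1])
        history.append(reverse_func(arg))
    return history


unquoted_history = click_through("/%252525/", reverse_without_quoting, 3)
quoted_history = click_through("/%252525/", reverse_with_quoting, 3)

print(unquoted_history)
print(quoted_history)
```

In this model the unquoted variant drops one "25" per click, while the quoted variant keeps returning the starting path.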

comment:20 Changed 2 years ago by Aymeric Augustin <aymeric.augustin@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In 31b5275235bac150a54059db0288a19b9e0516c7:

Fixed #13260 -- Quoted arguments interpolated in URLs in reverse.

comment:21 Changed 16 months ago by Tim Graham <timograham@…>

In dfc092622e5e55081f9a76fddea752494c4505ba:

Fixed #21529 -- Noted that {% url %} encodes its output (refs #13260).

comment:22 Changed 16 months ago by Tim Graham <timograham@…>

In a4c32d70c2bcf9731b6d6ff3370d2260ab4812af:

[1.6.x] Fixed #21529 -- Noted that {% url %} encodes its output (refs #13260).

Backport of dfc092622e from master
