Opened 7 years ago

Closed 4 years ago

Last modified 4 years ago

#13260 closed Bug (fixed)

urlresolvers.reverse() generates invalid URLs when an argument contains % character

Reported by: Ilya Semenov Owned by: Aymeric Augustin
Component: Core (URLs) Version: master
Severity: Normal Keywords:
Cc: hv@…, ben@…, erikrose Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

If I call django.core.urlresolvers.reverse() where one of (string) args contains a percent character ("%"), it generates an invalid URL: the percent symbols are not replaced with %25. Interestingly, space characters are properly replaced with %20.

Consider the following example:

# urls.py

urlpatterns = patterns('',
    (r'^download/(.*)$', 'myapp.views.download'),
)

# views.py

def download(request, filename):
    pass

# somewhere else

filename = '100% completed.png'
print reverse('myapp.views.download', args=[filename]) # /download/100%%20completed.png - malformed URL, Apache replies with 400 Bad Request error code
print reverse('myapp.views.download', args=[urlquote(filename)]) # /download/100%25%20completed.png - correct URL

reverse() should call urlquote() itself - because there's never any point NOT to do that. Moreover, the current implementation forces a user to apply urlquote() for each and every string parameter, otherwise there's always a chance that an invalid URL would be generated.

Attachments (3)

t13260-test.diff (839 bytes) - added by Russell Keith-Magee 7 years ago.
Test case demonstrating problem
t13260-test2.diff (1.6 KB) - added by stumbles 7 years ago.
Additional tests to also cover % in keyword arguments
t13260-patch.diff (3.3 KB) - added by stumbles 7 years ago.
Additional tests including russellm's above plus one additional, code + test change assuming urlquote should be applied to args and kwargs.

Download all attachments as: .zip

Change History (25)

comment:1 Changed 7 years ago by Ilya Semenov

Component: UncategorizedCore framework

Changed 7 years ago by Russell Keith-Magee

Attachment: t13260-test.diff added

Test case demonstrating problem

comment:2 Changed 7 years ago by Russell Keith-Magee

Triage Stage: UnreviewedAccepted

comment:3 Changed 7 years ago by stumbles

Owner: changed from nobody to stumbles
Status: newassigned

Changed 7 years ago by stumbles

Attachment: t13260-test2.diff added

Additional tests to also cover % in keyword arguments

comment:4 Changed 7 years ago by stumbles

Has patch: set

Added a copy of russellm's regression test to also test kwargs.

Have resolved by applying urlquote to the args and kwargs values. Special characters "+$*/" are marked as safe to comply with existing regression tests.

This is my first Django patch, so any feedback would be appreciated.

comment:5 Changed 7 years ago by Ilya Semenov

I believe this patch is not complete. The existing behavior replaces ' ' with '%20'. As you added an explicit call to urlquote(), that logic supposedly got redundant and had to be removed, but I don't see that in the patch.

comment:6 in reply to:  5 Changed 7 years ago by stumbles

Replying to semenov:

I believe this patch is not complete. The existing behavior replaces ' ' with '%20'. As you added an explicit call to urlquote(), that logic supposedly got redundant and had to be removed, but I don't see that in the patch.

The call to iri_to_uri() does replace some characters that are prohibited in URIs. You're correct, that there's some duplicate checking for these characters, but I don't think it can be removed.

Looking at this ticket again though, I don't think that it's right to supply any safe chars to my call to urlquote. What this would mean though is that either some of the tests are wrong or that it's the user's responsibility to only put valid characters in reverse's args and kwargs (meaning this isn't a bug). Will raise on django-dev.

comment:7 Changed 7 years ago by Thomas Güttler

Cc: hv@… added

Changed 7 years ago by stumbles

Attachment: t13260-patch.diff added

Additional tests including russellm's above plus one additional, code + test change assuming urlquote should be applied to args and kwargs.

comment:9 Changed 7 years ago by stumbles

Triage Stage: AcceptedDesign decision needed
Version: 1.1SVN

As to whether args and kwargs should be quoted (ie. whether this is a bug or is expected behaviour), there was one response each way to the post mentioned above.

The patch I've added assumes args and kwargs should be quoted (ie. bug). It modifies an existing test to fit this assumption. If you don't think this assumption is correct, please ignore the patch and close this ticket.

comment:10 Changed 7 years ago by stumbles

Cc: ben@… added

comment:11 Changed 6 years ago by erikrose

Cc: erikrose added

comment:12 Changed 6 years ago by Luke Plant

Type: Bug

comment:13 Changed 6 years ago by Luke Plant

Severity: Normal

comment:14 Changed 5 years ago by Aymeric Augustin

UI/UX: unset

Change UI/UX from NULL to False.

comment:15 Changed 5 years ago by Aymeric Augustin

Easy pickings: unset

Change Easy pickings from NULL to False.

comment:16 Changed 4 years ago by Aymeric Augustin

Component: Core (Other)Core (URLs)

comment:17 Changed 4 years ago by Aymeric Augustin

Owner: changed from stumbles to Aymeric Augustin
Triage Stage: Design decision neededAccepted

Judging by the reference implementation, WSGI urlunquotes the path before putting it in environ['PATH_INFO'] where Django reads it.

That's where things get complicated :)

1) To round-trip properly through reverse / resolve, arguments must not be urlquoted by reverse — there's nothing to urlunquote them in this scenario.

2) To round-trip properly through reverse / render in template / click in browser / resolve, arguments must be urlquoted by reverse.


The docs for reverse say that "the string returned by reverse() is already urlquoted", which obviously isn't true for at least some special characters.

This refers to the fact that reverse() calls iri_to_uri(), but that function is primarily concerned with escaping non-ASCII characters. It's also idempotent, which means it won't escape any character that's legal in an URL.


I recommmend to change reverse() to urlquote its arguments, for the following reasons:

  • (2) above is the common case
  • The current behavior doesn't match the docs
  • The current behavior may result in invalid URLs (that still work in practic -- URL handling code is notoriously robust to malformed inputs!)
  • Contributors who looked at this ticket until now were mostly in favor of considering this a bug

It will be worth a note in the "backwards incompatible changes" because it seems very likely that some developers are working around the current, buggy behavior by urlquoting arguments to reverse, and they would get double-quoting.

comment:18 Changed 4 years ago by Aymeric Augustin

I've created a little experiment to showcase the bug:

# Put this in a file called experiment13260.py and run it with:
# django-admin.py runserver --settings=experiment13260
                           
from django.conf.urls import patterns, url
from django.http import HttpResponse
from django.template import Context, Template

DEBUG = True
ROOT_URLCONF = 'experiment13260'
SECRET_KEY = 'whatever'

TEMPLATE = Template("""<html>
<head>
    <title>Experiment with Django ticket #13260</title>
</head>
<body>
    <p>Put something with special characters in the URL!</p>
    <p>Argument passed to the view: <b>{{ arg }}</b></p>
    <p>Reversed URL: <b><a href="{% url 'view' arg %}">{% url 'view' arg %}</a></b></p>
</body>
</html>""")

urlpatterns = patterns('',
    url(r'^(.*)/?$', lambda req, arg: HttpResponse(TEMPLATE.render(Context({'arg': arg}))), name='view'),
)

If you go to http://localhost:8000/%252525/, every time you click the "Reversed URL" link, you lose a 25 in the URL.

comment:19 Changed 4 years ago by Aymeric Augustin

comment:20 Changed 4 years ago by Aymeric Augustin <aymeric.augustin@…>

Resolution: fixed
Status: assignedclosed

In 31b5275235bac150a54059db0288a19b9e0516c7:

Fixed #13260 -- Quoted arguments interpolated in URLs in reverse.

comment:21 Changed 4 years ago by Tim Graham <timograham@…>

In dfc092622e5e55081f9a76fddea752494c4505ba:

Fixed #21529 -- Noted that {% url %} encodes its output (refs #13260).

comment:22 Changed 4 years ago by Tim Graham <timograham@…>

In a4c32d70c2bcf9731b6d6ff3370d2260ab4812af:

[1.6.x] Fixed #21529 -- Noted that {% url %} encodes its output (refs #13260).

Backport of dfc092622e from master

Note: See TracTickets for help on using tickets.
Back to Top