Opened 4 years ago

Closed 4 years ago

#32050 closed Bug (duplicate)

reverse() and request.get_full_path() escape URL paths differently

Reported by: Jack Cushman Owned by: nobody
Component: Core (URLs) Version: dev
Severity: Normal Keywords: reverse, get_full_path, escape
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

reverse() and request.get_full_path() apply different escaping rules to the path section of URLs. reverse() applies iri_to_uri and escapes everything except /#%[]=:;$&()+,!?*@'~, while get_full_path applies escape_uri_path and escapes everything except /:@&+$,-_.!~*'(). (In other words, get_full_path() escapes some characters like semicolon that reverse() does not.) I'm not sure which rules are correct, but it seems like the two should escape URL paths the same way. In particular, maybe reverse() ought to apply escape_uri_path to its path section?

Here's a one-file app that demonstrates the mismatch. Visiting localhost:8000/1/wrongtitle will successfully redirect to the correct title, but visiting localhost:8000/2/wrongtitle will start an infinite loop printing /2/Some%20semicolon;.pdf does not match /2/Some%20semicolon%3B.pdf! because the title contains a semicolon, which is escaped by get_full_path() but not by reverse().

(I realize there are other ways to implement this demo program, but it highlights the underlying issue that the paths produced by the two functions should be escaped the same way, whichever way is correct.)

import sys

from django.conf import settings
from django.core.management import execute_from_command_line
from django.core.wsgi import get_wsgi_application
from django.http import HttpResponse, HttpResponseRedirect
from django.urls import path, reverse

settings.configure(ALLOWED_HOSTS=["*"], ROOT_URLCONF=__name__, SECRET_KEY="foo", DEBUG=True)

files = {1: 'Some title.pdf', 2: 'Some semicolon;.pdf'}

def deliver_file(request, file_id, title):
    correct_title = files[file_id]
    correct_url = reverse('deliver_file', args=[file_id, correct_title])
    if correct_url != request.get_full_path():
        print(f"{correct_url} does not match {request.get_full_path()}!")
        return HttpResponseRedirect(correct_url)
    return HttpResponse(correct_title)


urlpatterns = [
    path("<int:file_id>/<str:title>", deliver_file, name='deliver_file'),
]

app = get_wsgi_application()
execute_from_command_line(sys.argv)

Change History (1)

comment:1 by Carlton Gibson, 4 years ago

Resolution: duplicate
Status: newclosed

Thanks for the report.

This is delicate, at least. Adjusting the behaviour here leads to various test failures, so to that extent, current behaviour is at least expected.

Immediately, you can escape your URL parameters in the call to reverse() to avoid the issue:

In your script:

    correct_url = reverse('deliver_file', args=[file_id, escape_uri_path(correct_title)])

I think that this is going to be the required way to go.

See particularly the discussion on #18456 — the difference in behaviour between get_full_path() and uri_to_iri() is known, and it's not clear that get_full_path() can be implemented to handle all cases.

See also #11522, #13260, and #22223, which touch on this.

I'm going to close this as a duplicate of #18456. If you want to investigate fully and approach the DevelopersMailingList, if in conclusion you think there's a way to improve the behaviour all things considered then, that would be welcome if you're able to summarize the difficulties.

Note: See TracTickets for help on using tickets.
Back to Top