Django

Code

Ticket #6621 (new)

Opened 7 months ago

Last modified 5 months ago

build_absolute_uri in HttpRequest breaks with colon in URL path

Reported by: Greg Thornton <xdissent@gmail.com> Assigned to: nobody
Milestone: Component: HTTP handling
Version: SVN Keywords: http url colon
Cc: hv@tbz-pariv.de Triage Stage: Accepted
Has patch: 1 Needs documentation: 0
Needs tests: 0 Patch needs improvement: 0

Description

It appears that HttpRequest from django.http has a bug that allows it to return a relative URL when the path component of the current URL (or the location argument) contains a colon. The show_url view below demonstrates the issue:

from django.http import HttpResponse
from urlparse import urljoin
import re

http_re = re.compile('^https?:')

def get_absolute_uri(request):
    location = request.get_full_path()
    if not http_re.match(location): 
        current_uri = '%s://%s%s' % (request.is_secure() and 'https' or 'http', request.get_host(), request.path)
        location = urljoin(current_uri, location)
    return location

def show_url(request):
    response = """
        Absolute URI: %s<br />
        Expected: %s<br />
    """ % (request.build_absolute_uri(), get_absolute_uri(request))
    return HttpResponse(response)

Output correct without colon (requested http://xdissent.com/show_url/testing123):

Absolute URI: http://xdissent.com/show_url/testing123
Expected: http://xdissent.com/show_url/testing123

but broken when a colon is found in the URL path (requested http://xdissent.com/show_url/testing:123):

Absolute URI: /show_url/testing:123
Expected: http://xdissent.com/show_url/testing:123

My function get_absolute_url is essentially copied from HttpRequest's build_absolute_uri, but uses a small regex to determine whether or not a path is relative whereas build_absolute_uri simply checks for the presence of a colon. Upon first glance, it appears that this is correct behaviour according to the URL spec because really any colons in the path portion of a URL should be URL encoded as %3A so build_absolute_uri wouldn't find a colon when used properly. However, Django automatically decodes encoded URL characters so even when using %3A for colons, build_absolute_uri still thinks the path is absolute when it is not. For example:

Still broken even when using encoded colons in the path (requested http://xdissent.com/show_url/testing%3A123):

Absolute URI: /show_url/testing:123
Expected: http://xdissent.com/show_url/testing:123

Attachments

uri_fix.diff (0.9 kB) - added by Greg Thornton <xdissent@gmail.com> on 02/18/08 11:37:45.
patch for build_absolute_uri in HttpRequest to handle colons in URL paths
6621.diff (0.9 kB) - added by programmerq on 07/05/08 16:57:30.
Now applies cleanly to trunk

Change History

02/18/08 11:37:45 changed by Greg Thornton <xdissent@gmail.com>

  • attachment uri_fix.diff added.

patch for build_absolute_uri in HttpRequest to handle colons in URL paths

02/19/08 10:04:42 changed by guettli

  • needs_better_patch changed.
  • stage changed from Unreviewed to Accepted.
  • needs_tests changed.
  • needs_docs changed.

According to the documentation get_full_path() does not contain the protocol (see below).

That's way I think the check for ':' (as regex or not) in build_absolute_uri is wrong.

The url http://foo/view_url/http%3A... is valid, too.

But this means, the feature, that location is an absolute url must be removed. I think this is good, since build_abslute_uri() does nothing, if it guesses that location is absolute.

Code should refuse to guess.

Using a regex guesses better, but only in 99.999%, since location='http://foo' can be relative, too: http://foo/view_url/http%3A...

Documentation:

get_full_path():
Returns the path, plus an appended query string, if applicable.
Example: "/music/bands/the_beatles/?print=true"
build_absolute_uri(location):
...
If the location is already an absolute URI, it will not be altered.
Otherwise the absolute URI is built using the server variables available in this request.

02/25/08 17:00:50 changed by Andrew McNabb <amcnabb@mcnabbs.org>

I've run into the same bug. I think this patch looks good, and I would love to see it merged.

Thanks.

04/22/08 10:34:10 changed by anonymous

  • cc set to hv@tbz-pariv.de.

07/05/08 16:57:30 changed by programmerq

  • attachment 6621.diff added.

Now applies cleanly to trunk


Add/Change #6621 (build_absolute_uri in HttpRequest breaks with colon in URL path)




Change Properties
Action