Opened 15 years ago

Closed 13 years ago

Last modified 11 years ago

#11877 closed Uncategorized (fixed)

Document that request.get_host() fails when behind multiple reverse proxies

Reported by: tomevans222 Owned by: nobody
Component: Documentation Version: 1.0
Severity: Normal Keywords: proxy forwarded get_host
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

We run django behind several layers of proxies (please excuse ASCII art):

    Client
       |
       |
Apache (handling SSL connections, mod_proxy)
       |
       |
Apache (handling static content, load balancing, mod_proxy, mod_proxy_balancer)
    |                           | 
    |                           |
Apache (mod_fastcgi)       Apache (mod_fastcgi)
    |                           |
    |                           |
django instance             django instance

Each apache instance that proxies a request appends the ServerName of the proxy server into the X-Forwarded-For and X-Forwarded-Host headers. In our infrastructure, all of the apache instances respond to the same host name (which is preserved through the proxying using ProxyPreserveHost directive), so it ends up looking like this in request.META:

'HTTP_X_FORWARDED_FOR: '10.0.0.1, 10.0.0.2'
'HTTP_X_FORWARDED_HOST': 'portal.example.com, portal.example.com'
'HTTP_X_FORWARDED_SERVER': 'portal.example.com, portal.example.com'

This then breaks request.get_host(), which does the following:

   def get_host(self):
        """Returns the HTTP host using the environment or request headers."""
        # We try three options, in order of decreasing preference.
        if 'HTTP_X_FORWARDED_HOST' in self.META:
            host = self.META['HTTP_X_FORWARDED_HOST']
        elif 'HTTP_HOST' in self.META:
            host = self.META['HTTP_HOST']
        else:
            # Reconstruct the host using the algorithm from PEP 333.
            host = self.META['SERVER_NAME']
            server_port = str(self.META['SERVER_PORT'])
            if server_port != (self.is_secure() and '443' or '80'):
                host = '%s:%s' % (host, server_port)
        return host

IE, it completely ignores the possibility of more than one host in X-Forwarded-Host.

Breaking request.get_host() breaks request.build_absolute_uri(), which in turn breaks HttpResponseRedirect*, leading to redirects with location headers like:

  Location: http://portal.example.com, portal.example.com/foo/

I've not attached a patch, because I can think of a number of ways this could be fixed:

1) Change request parsing to notice multi value headers and correctly parse into arrays.
2) Change request.get_host() to look for multi value versions of X-Forwarded-Host and pull out the correct value.
3) Add separate middleware to rewrite request.META with just the 'closest' version of these headers.
... any number of other solutions.

This obviously is not a 'normal' deployment of django... We will probably work around with custom middleware for now, it is the least intrusive.

This is with 1.0, but the code for request.get_host() is the same in trunk.

Attachments (2)

proxy.py (1.3 KB ) - added by tomevans222 15 years ago.
Middleware to work around the issue
patch_11877.diff (1.2 KB ) - added by arnav 14 years ago.
patch with changes to documentation including example

Download all attachments as: .zip

Change History (8)

by tomevans222, 15 years ago

Attachment: proxy.py added

Middleware to work around the issue

comment:1 by Jacob, 14 years ago

Component: Core frameworkDocumentation
Summary: request.get_host() fails when behind multiple reverse proxiesDocument that request.get_host() fails when behind multiple reverse proxies
Triage Stage: UnreviewedAccepted

As you say, this is not a "normal" way of deploying Django, and I don't think there'll be a general solution here. See the back and forth about handling X-Forwarded-For for some background: the basic problem is that Django can't magically know which of a list of values is "correct" here. I think that your solution of handling this problem in middleware is as good as it's gonna get.

It would, however, be good to have this documented, so I'm hijacking this ticket and turning it into a doc request. We could add a note to the docs for request.get_host() and include your little bit of middleware as an example. Feel free to work up a doc patch if you wanna grease the wheels a bit.

comment:2 by tomevans222, 14 years ago

That sounds perfectly reasonable Jacob. I'll try to get a doc patch up as soon as $JOB allows :)

by arnav, 14 years ago

Attachment: patch_11877.diff added

patch with changes to documentation including example

comment:3 by arnav, 14 years ago

Has patch: set

documentation patch added

comment:4 by Gabriel Hurley, 13 years ago

Resolution: fixed
Status: newclosed

(In [14493]) Fixed #11877 -- Documented that HttpRequest.get_host() fails behind multiple reverse proxies, and added an example middleware solution. Thanks to Tom Evans for the report, and arnav for the patch.

comment:5 by Gabriel Hurley, 13 years ago

(In [14494]) [1.2.X] Fixed #11877 -- Documented that HttpRequest.get_host() fails behind multiple reverse proxies, and added an example middleware solution. Thanks to Tom Evans for the report, and arnav for the patch.

Backport of [14493] from trunk.

comment:6 by John Borwick <john_borwick@…>, 11 years ago

Easy pickings: unset
Severity: Normal
Type: Uncategorized
UI/UX: unset

Note for other readers: when using the included MultipleProxyMiddleware, consider whether you want line 30 to pull the *last* element [-1] or the *first* element [0].

The right-most/last is supposed to be the most recent proxy (per <http://www.squid-cache.org/Doc/config/follow_x_forwarded_for/>). The left-most is the furthest upstream proxy.

Note: See TracTickets for help on using tickets.
Back to Top