Opened 7 years ago

Last modified 3 years ago

#27575 new Cleanup/optimization

Make host validation run on all requests

Reported by: JorisBenschop Owned by:
Component: HTTP handling Version: 1.10
Severity: Normal Keywords:
Cc: desecho@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by JorisBenschop)

please ignore all this if its stupid, but I was trying to create a 1.10 middleware and learn from existing code. Looking at
django.middleware.common.CommonMiddleware I saw this in process_request

host = request.get_host()
must_prepend = settings.PREPEND_WWW and host and not host.startswith('www.')
redirect_url = ('%s://www.%s' % (request.scheme, host)) if must_prepend else ''

If I understand correctly, this code is ran for each request. Would we not get more performance by setting

redirect_url = ''
if settings.PREPEND_WWW:
    host = request.get_host()
    if host and not host.startswith('www.'):
        redirect_url = ('%s://www.%s' % (request.scheme, host)) 

as this evaluates most of the code only when settings.PREPEND_WWW is true.

Again, maybe I completely misunderstand how python code optimization works. But I hope it helps

Change History (15)

comment:1 by JorisBenschop, 7 years ago

Description: modified (diff)

comment:2 by Simon Charette, 7 years ago

Hi Joris,

I'd say it's worth investigating but one thing that you'll need to make sure is that get_host() is called inconditionally as this method has the side effect of performing ALLOWED_HOSTS validation.

comment:3 by Simon Charette, 7 years ago

Triage Stage: UnreviewedAccepted

comment:4 by JorisBenschop, 7 years ago

Hi Simon,
Thanks for this clarification.
Looking at the code, to my eye the get_host() only returns it output to a variable that is only used once, but now i understand there is some implicit 'magic' ran behind the screens as well. Indeed that would make the patch a lot less useful, but then would it maybe be worth annotating this hidden behaviour in a comment within the def instead?

Thanks for your gelp
Joris

comment:5 by Tim Graham, 7 years ago

Has patch: unset

comment:6 by Anton Samarchyan, 7 years ago

Cc: desecho@… added

We could add a comment in process_request method

host = request.get_host()  # Also perform host validation

And probably in get_host method as well

"""Return the HTTP host using the environment or request headers. Perform host validation."""

comment:7 by JorisBenschop, 7 years ago

Wouldnt refactoring this function (validate_host?) make sense if it does a lot more than retrieving a hostname?

comment:8 by Anton Samarchyan, 7 years ago

A function get_host could be renamed to get_host_and_validate

comment:9 by Ketan Bhatt, 7 years ago

Owner: changed from nobody to Ketan Bhatt
Status: newassigned

comment:10 by Ketan Bhatt, 7 years ago

@Simon Charette

So the final changes that need to be made are:

  1. Rename get_host() to get_host_and_validate()
  2. Handle occurrences of get_host everywhere in the docs and change it to the new name
  3. Mention the change in the changelist (will need help here regarding where to put this change exactly)

Please confirm once and I shall make a PR

comment:11 by Tim Graham, 7 years ago

I don't think there's value in renaming the function at this point. In fact, it might be better to move that validation elsewhere so it always runs during a request, regardless of if get_host() is called. There was some discussion about this at the time host header validation was added but I don't have time to look into this more right now.

comment:12 by Tim Graham, 7 years ago

Component: Core (Other)HTTP handling
Easy pickings: unset
Summary: potential commonmiddleware optimizationMake host validation run on all requests

comment:13 by Chris Jerdonek, 3 years ago

In fact, it might be better to move that validation elsewhere so it always runs during a request, regardless of if get_host() is called.

Doing this might be complicated by the fact that, currently, host validation has to take place after custom middleware in certain situations, per this note in the HttpRequest.get_host() docs:
https://docs.djangoproject.com/en/3.1/ref/request-response/#django.http.HttpRequest.get_host

An easier option might be to cache get_host()'s return value. Then middleware that might need the return value could call get_host() early on to warm the cache (or fail the request) without too much penalty (since get_host() can be getting called elsewhere, anyways).

comment:14 by Chris Jerdonek, 3 years ago

Owner: Ketan Bhatt removed
Status: assignednew

Tim, what were you envisioning for this ticket? In particular, were you envisioning a backwards incompatible change?

Also, something weaker than requiring it to be run for all Django installs for all requests would be to ensure that it runs for all requests only when any of certain middleware is enabled. That initial list of middleware could include e.g. any middleware that currently calls HttpRequest.get_host() under certain conditions (e.g. CsrfViewMiddleware). That approach would at least let middleware call it without needing to handle DisallowedHost everywhere, because those middleware could each be updated to call it at the beginning if it hasn't been called yet (and then caching the value for later middleware).

comment:15 by Tim Graham, 3 years ago

I didn't have anything specific in mind. It seems that running host validation before middleware isn't feasible due to the use case you pointed out. The other option would be to run it afterward (just in case no middleware already did, which I'm not sure is a likely possibility, but that was a concern with the optimization originally proposed here in CommonMiddleware).

Note: See TracTickets for help on using tickets.
Back to Top