Opened 8 years ago
Last modified 4 years ago
#27575 new Cleanup/optimization
Make host validation run on all requests
Reported by: | JorisBenschop | Owned by: | |
---|---|---|---|
Component: | HTTP handling | Version: | 1.10 |
Severity: | Normal | Keywords: | |
Cc: | desecho@… | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
please ignore all this if its stupid, but I was trying to create a 1.10 middleware and learn from existing code. Looking at
django.middleware.common.CommonMiddleware I saw this in process_request
host = request.get_host() must_prepend = settings.PREPEND_WWW and host and not host.startswith('www.') redirect_url = ('%s://www.%s' % (request.scheme, host)) if must_prepend else ''
If I understand correctly, this code is ran for each request. Would we not get more performance by setting
redirect_url = '' if settings.PREPEND_WWW: host = request.get_host() if host and not host.startswith('www.'): redirect_url = ('%s://www.%s' % (request.scheme, host))
as this evaluates most of the code only when settings.PREPEND_WWW is true.
Again, maybe I completely misunderstand how python code optimization works. But I hope it helps
Change History (15)
comment:1 by , 8 years ago
Description: | modified (diff) |
---|
comment:2 by , 8 years ago
comment:3 by , 8 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:4 by , 8 years ago
Hi Simon,
Thanks for this clarification.
Looking at the code, to my eye the get_host() only returns it output to a variable that is only used once, but now i understand there is some implicit 'magic' ran behind the screens as well. Indeed that would make the patch a lot less useful, but then would it maybe be worth annotating this hidden behaviour in a comment within the def instead?
Thanks for your gelp
Joris
comment:5 by , 8 years ago
Has patch: | unset |
---|
comment:6 by , 8 years ago
Cc: | added |
---|
We could add a comment in process_request method
host = request.get_host() # Also perform host validation
And probably in get_host method as well
"""Return the HTTP host using the environment or request headers. Perform host validation."""
comment:7 by , 8 years ago
Wouldnt refactoring this function (validate_host?) make sense if it does a lot more than retrieving a hostname?
comment:9 by , 8 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:10 by , 8 years ago
@Simon Charette
So the final changes that need to be made are:
- Rename
get_host()
toget_host_and_validate()
- Handle occurrences of
get_host
everywhere in the docs and change it to the new name - Mention the change in the changelist (will need help here regarding where to put this change exactly)
Please confirm once and I shall make a PR
comment:11 by , 8 years ago
I don't think there's value in renaming the function at this point. In fact, it might be better to move that validation elsewhere so it always runs during a request, regardless of if get_host()
is called. There was some discussion about this at the time host header validation was added but I don't have time to look into this more right now.
comment:12 by , 8 years ago
Component: | Core (Other) → HTTP handling |
---|---|
Easy pickings: | unset |
Summary: | potential commonmiddleware optimization → Make host validation run on all requests |
comment:13 by , 4 years ago
In fact, it might be better to move that validation elsewhere so it always runs during a request, regardless of if get_host() is called.
Doing this might be complicated by the fact that, currently, host validation has to take place after custom middleware in certain situations, per this note in the HttpRequest.get_host()
docs:
https://docs.djangoproject.com/en/3.1/ref/request-response/#django.http.HttpRequest.get_host
An easier option might be to cache get_host()
's return value. Then middleware that might need the return value could call get_host()
early on to warm the cache (or fail the request) without too much penalty (since get_host()
can be getting called elsewhere, anyways).
comment:14 by , 4 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
Tim, what were you envisioning for this ticket? In particular, were you envisioning a backwards incompatible change?
Also, something weaker than requiring it to be run for all Django installs for all requests would be to ensure that it runs for all requests only when any of certain middleware is enabled. That initial list of middleware could include e.g. any middleware that currently calls HttpRequest.get_host()
under certain conditions (e.g. CsrfViewMiddleware
). That approach would at least let middleware call it without needing to handle DisallowedHost
everywhere, because those middleware could each be updated to call it at the beginning if it hasn't been called yet (and then caching the value for later middleware).
comment:15 by , 4 years ago
I didn't have anything specific in mind. It seems that running host validation before middleware isn't feasible due to the use case you pointed out. The other option would be to run it afterward (just in case no middleware already did, which I'm not sure is a likely possibility, but that was a concern with the optimization originally proposed here in CommonMiddleware
).
Hi Joris,
I'd say it's worth investigating but one thing that you'll need to make sure is that
get_host()
is called inconditionally as this method has the side effect of performingALLOWED_HOSTS
validation.