Opened 7 years ago
Closed 7 years ago
#28828 closed Cleanup/optimization (fixed)
Performance improvements for HttpRequest.build_absolute_uri()
Reported by: | gcbirzan | Owned by: | nobody |
---|---|---|---|
Component: | HTTP handling | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Keryn Knight | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
In cases where a lot of URLs are generated, the performance of HttpRequest.build_absolute_uri() is suboptimal.
- It calls self.get_host() several times. That function does a lot of work that isn't needed more than once.
- It calls urljoin() even on the trivial case of absolute paths (which is the most common case). urljoin is quite expensive as well.
My patch fixes these by using a cached_property for the scheme://host part and by simply concatenating the current scheme/host with the location when the location is an absolute path, doesn't change the host, and doesn't contain '.' or '..' segments. Handling '.' and '..' in the fast path is possible, but I think that would just complicate the code for what is, again, a not very common case.
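A minimal sketch of the two ideas above, written against the standard library only so it runs without Django. The class `RequestSketch`, its `current_scheme_host` property, and the `host_lookups` counter are hypothetical names for illustration, not the code in the PR; `functools.cached_property` (Python 3.8+) stands in for Django's own `cached_property`, and the handling of a leading `//` with an empty host is an assumption about what "doesn't change the host" covers.

```python
from functools import cached_property
from urllib.parse import urljoin, urlsplit


class RequestSketch:
    """Hypothetical stand-in for a request; not Django's HttpRequest."""

    def __init__(self, scheme, host, path):
        self.scheme = scheme
        self.host = host
        self.path = path
        self.host_lookups = 0  # counts how often the "expensive" lookup runs

    def get_host(self):
        # Placeholder for the real host validation work.
        self.host_lookups += 1
        return self.host

    @cached_property
    def current_scheme_host(self):
        # Evaluated on first access, then stored on the instance.
        return f"{self.scheme}://{self.get_host()}"

    def build_absolute_uri(self, location):
        bits = urlsplit(location)
        if bits.scheme and bits.netloc:
            # Already a full URL: nothing to add.
            return location
        has_dot_segment = any(seg in (".", "..") for seg in bits.path.split("/"))
        if (bits.path.startswith("/") and not bits.scheme and not bits.netloc
                and not has_dot_segment):
            # Fast path: plain concatenation, no urljoin().
            if location.startswith("//"):
                # '//' followed by an empty host: drop the two slashes.
                location = location[2:]
            return self.current_scheme_host + location
        # Slow path: let urljoin() resolve relative paths and dot segments.
        return urljoin(self.current_scheme_host + self.path, location)
```

With a request created as `RequestSketch("https", "example.com", "/a/b/")`, a location such as `/foo/bar/` takes the concatenation branch, while `/foo/./bar/` or a relative `c/` falls through to `urljoin()`. The host lookup runs once per instance regardless of how many URLs are built.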
All the tests pass, but they did even when I wasn't checking for '.' and '..', so I added a test for that too.
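To illustrate why the '.' and '..' check matters, here is a small standard-library comparison. It is not the test added in the PR; the base URL and location are made-up values. `urljoin()` resolves dot segments, whereas plain concatenation passes them through unchanged, so skipping `urljoin()` unconditionally would change the output for such paths.

```python
from urllib.parse import urljoin

# Illustrative values only; not taken from Django's test suite.
base = "http://testserver"
location = "/foo/./bar/../baz/"

# urljoin() removes '.' and resolves '..' while joining.
joined = urljoin(base + "/", location)

# Plain concatenation leaves the dot segments in place.
concatenated = base + location

print(joined)        # http://testserver/foo/baz/
print(concatenated)  # http://testserver/foo/./bar/../baz/
```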
While the improvements might be minor for some use cases, an example of this being slow can be found in this DRF ticket.
I've made a PR with these changes.
Change History (6)
comment:1 by , 7 years ago
Description: | modified (diff) |
---|
comment:2 by , 7 years ago
Component: | Core (URLs) → HTTP handling |
---|---|
Summary: | Performance improvements for build_absolute_uri → Performance improvements for HttpRequest.build_absolute_uri() |
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Cleanup/optimization |
comment:3 by , 7 years ago
Cc: | added |
---|
comment:5 by , 7 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
Anecdata verifying the improvement, using the following code against Python 3.5.1 on OSX:
from django.test.client import RequestFactory
import timeit

request = RequestFactory().get('/')
timeit.timeit("request.build_absolute_uri(location='///foo/bar/')", number=1000000, globals={'request': request})

request = RequestFactory().get('/')
timeit.timeit("request.build_absolute_uri(location='/foo/./bar/')", number=1000000, globals={'request': request})
Master as of d60e8b856b49922deb85a168e48e56f16facd5df consistently yields 50.x-51.x seconds for each timeit usage.
PR yields 9 seconds for the ///foo/bar/ case, and 33 seconds for the less common /foo/./bar/ case.
I've posted these in the PR comments, but here are some benchmarks.
I ran this benchmark:
timeit.timeit("request.build_absolute_uri(location='///foo/bar/')", number=1000000, globals={'request': request})
The results were:
- Original version: 17.600296460092068
There is a regression. Running without the cached property, to simulate calling it in different requests, on a path with '.':
timeit.timeit("request.build_absolute_uri(location='/foo/./bar/')", number=1000000, globals={'request': request})
The results:
So, the extra checks do add some overhead, but only for an uncommon case. I don't have any evidence, obviously, but I'm fairly confident that the overwhelming majority of uses of build_absolute_uri() are with an absolute path that doesn't have '.' or '..' in it, and that the path comes from reverse().
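For anyone who wants to probe the trade-off without a Django checkout, a rough standard-library micro-benchmark might look like the sketch below. The helper names (`via_urljoin`, `via_concat`, `compare`) and the base URL are hypothetical, and the code only measures `urljoin()` against string concatenation in isolation, not build_absolute_uri() itself. Timings will vary by machine and Python version, so no figures are given here.

```python
import timeit
from urllib.parse import urljoin

BASE = "http://testserver"  # illustrative scheme://host value


def via_urljoin(location):
    # General approach: always go through urljoin().
    return urljoin(BASE + "/", location)


def via_concat(location):
    # Shortcut: only valid for absolute paths without '.' or '..' segments.
    return BASE + location


def compare(location="/foo/bar/", number=100_000):
    # Returns elapsed seconds for each approach on the same input.
    return {
        "urljoin": timeit.timeit(lambda: via_urljoin(location), number=number),
        "concat": timeit.timeit(lambda: via_concat(location), number=number),
    }


if __name__ == "__main__":
    print(compare())
```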