Opened 6 years ago

Closed 6 years ago

#28828 closed Cleanup/optimization (fixed)

Performance improvements for HttpRequest.build_absolute_uri()

Reported by: gcbirzan Owned by: nobody
Component: HTTP handling Version: dev
Severity: Normal Keywords:
Cc: Keryn Knight Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by gcbirzan)

When a lot of URLs are generated, the performance of HttpRequest.build_absolute_uri() is suboptimal.

  • It calls self.get_host() several times. That function does a lot of work that isn't needed more than once.
  • It calls urljoin() even on the trivial case of absolute paths (which is the most common case). urljoin is quite expensive as well.

My patch fixes these by using a cached_property for the scheme://host part and by simply concatenating the current scheme and host with the location when the location is an absolute path, doesn't change the host, and doesn't contain '.' or '..' segments. Handling '.' and '..' in the shortcut is possible, but I think that would just complicate the code for what is, again, not a very common case.
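
A minimal sketch of the two ideas described above, not the code from the PR. RequestSketch, get_host_calls, and the example host are hypothetical names used for illustration only, and functools.cached_property (Python 3.8+) stands in for Django's own cached_property helper:

```python
from functools import cached_property  # stand-in for Django's cached_property
from urllib.parse import urljoin, urlsplit


class RequestSketch:
    """Hypothetical stand-in for HttpRequest; not Django's actual class."""

    def __init__(self, scheme, host, path):
        self.scheme = scheme
        self._host = host
        self.path = path
        self.get_host_calls = 0  # counts how often the host lookup runs

    def get_host(self):
        # The real get_host() does more work; this one only counts calls.
        self.get_host_calls += 1
        return self._host

    @cached_property
    def _current_scheme_host(self):
        # Computed once per request object, then reused on later calls.
        return "{}://{}".format(self.scheme, self.get_host())

    def build_absolute_uri(self, location):
        bits = urlsplit(location)
        if bits.scheme and bits.netloc:
            return location  # already a full URL, nothing to build
        has_dot_segment = any(seg in (".", "..") for seg in bits.path.split("/"))
        if (
            bits.path.startswith("/")
            and not bits.scheme
            and not bits.netloc
            and not has_dot_segment
        ):
            # Shortcut: absolute path on the same host, plain concatenation.
            return self._current_scheme_host + location
        # Fallback: urljoin() resolves relative paths and dot segments.
        return urljoin(self._current_scheme_host + self.path, location)
```

With a request created as RequestSketch("http", "example.com", "/current/"), repeated calls to build_absolute_uri() trigger get_host() only once, and only locations that are relative, change the host, or contain dot segments reach urljoin().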

All the tests pass, but they did even when I wasn't checking for '.' and '..', so I added a test for that too.
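
To illustrate why the '.' and '..' check matters, here is a small standard-library sketch (the compare helper and the example base URL are assumptions made for illustration, not part of the patch or its test). It contrasts urljoin(), which resolves dot segments, with plain concatenation, which leaves them in place:

```python
from urllib.parse import urljoin


def compare(base, location):
    """Return (urljoin result, plain concatenation) for an absolute path."""
    joined = urljoin(base + "/", location)
    concatenated = base + location
    return joined, concatenated


if __name__ == "__main__":
    for location in ("/foo/bar/", "/foo/./bar/", "/foo/../bar/"):
        joined, concatenated = compare("http://example.com", location)
        # The two results agree only when the path has no dot segments.
        print(location, joined, concatenated, joined == concatenated)
```

Only the first location gives identical results from both approaches, which is why a shortcut based on concatenation has to skip paths containing '.' or '..'.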

While the improvements might be minor for some use cases, an example of this being slow can be found in this DRF ticket.

I've made a PR with these changes.

Change History (6)

comment:1 by gcbirzan, 6 years ago

Description: modified

comment:2 by Tim Graham, 6 years ago

Component: Core (URLs) → HTTP handling
Summary: Performance improvements for build_absolute_uri → Performance improvements for HttpRequest.build_absolute_uri()
Triage Stage: Unreviewed → Accepted
Type: Uncategorized → Cleanup/optimization

comment:3 by Keryn Knight, 6 years ago

Cc: Keryn Knight added

comment:4 by gcbirzan, 6 years ago

I've already posted these in the PR comments, but here are some benchmarks.

I ran this:

timeit.timeit("request.build_absolute_uri(location='///foo/bar/')", number=1000000, globals={'request': request})

The results were:

  • With my fix: 4.474267777055502
  • With my fix but with bits.path[0] == '/': 4.34382488578558
  • My shortcut but no cached property: 9.473239112645388 (this is to simulate running this on different requests)
  • No shortcut but with cached property: 12.506602805107832
  • Original version: 17.600296460092068

There is a regression, though. Running without the cached property (to simulate calling it on different requests) on a path with '.':

timeit.timeit("request.build_absolute_uri(location='/foo/./bar/')", number=1000000, globals={'request': request})

The results:

  • My version, no cached property: 19.713809736073017
  • Original version: 18.129451751708984

So, the extra checks do add some overhead, but only in an uncommon case. I don't have any evidence, obviously, but I'm fairly confident that the overwhelming majority of build_absolute_uri() calls use an absolute path that doesn't have '.' or '..' in it, and that the path comes from reverse.
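
For a rough feel of the urljoin() cost in isolation, outside Django, the following standard-library sketch times both approaches on the same input. BASE, LOCATION, and the helper names are hypothetical, and the numbers it produces are not comparable to the figures above:

```python
import timeit
from urllib.parse import urljoin

BASE = "http://testserver"  # hypothetical scheme://host value
LOCATION = "/foo/bar/"      # absolute path with no dot segments


def with_urljoin():
    # General approach: let urljoin() parse and merge both parts.
    return urljoin(BASE + "/", LOCATION)


def with_concat():
    # Shortcut: valid here because LOCATION is a clean absolute path.
    return BASE + LOCATION


def measure(number=100_000):
    """Time both approaches and return the elapsed seconds for each."""
    return {
        "urljoin": timeit.timeit(with_urljoin, number=number),
        "concat": timeit.timeit(with_concat, number=number),
    }
```

Calling measure() returns a dict of elapsed seconds for each approach; absolute values depend on the machine and Python version.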

Last edited 6 years ago by gcbirzan

comment:5 by Keryn Knight, 6 years ago

Triage Stage: Accepted → Ready for checkin

Anecdata verifying the improvement, using the following code against Python 3.5.1 on OSX:

from django.test.client import RequestFactory
import timeit
request = RequestFactory().get('/')
timeit.timeit("request.build_absolute_uri(location='///foo/bar/')", number=1000000, globals={'request': request})
request = RequestFactory().get('/')
timeit.timeit("request.build_absolute_uri(location='/foo/./bar/')", number=1000000, globals={'request': request})

Master as of d60e8b856b49922deb85a168e48e56f16facd5df consistently yields 50.x-51.x seconds for each timeit usage.
PR yields 9 seconds for the ///foo/bar/ case, and 33 seconds for the less common /foo/./bar/ case.

comment:6 by Tim Graham <timograham@…>, 6 years ago

Resolution: fixed
Status: new → closed

In 5bf62825:

Fixed #28828 -- Improved performance of HttpRequest.build_absolute_uri().
