Opened 3 weeks ago

Last modified 3 weeks ago

#28828 new Cleanup/optimization

Performance improvements for HttpRequest.build_absolute_uri()

Reported by: gcbirzan
Owned by: nobody
Component: HTTP handling
Version: master
Severity: Normal
Keywords:
Cc: Keryn Knight
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description (last modified by gcbirzan)

When a lot of URLs are generated, the performance of HttpRequest.build_absolute_uri() is suboptimal:

  • It calls self.get_host() several times. That function does a lot of work that isn't needed more than once.
  • It calls urljoin() even on the trivial case of absolute paths (which is the most common case). urljoin is quite expensive as well.

My patch fixes these by using a cached_property for the scheme://host part and by simply concatenating the current scheme and host with the location when the location is an absolute path, doesn't change the host, and doesn't contain '.' or '..' segments. The '.' and '..' cases could be handled in the shortcut as well, but I think that would just complicate the code for what is, again, not a very common case.
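A minimal sketch of the two changes described above, for illustration only. It is not the code from the PR: `SketchRequest`, its constructor arguments, the `_current_scheme_host` name, and the `host_calls` counter are all hypothetical stand-ins, and the real `get_host()` does considerably more work than the one shown here.

```python
from functools import cached_property
from urllib.parse import urljoin, urlsplit


class SketchRequest:
    """Hypothetical stand-in for HttpRequest, used only to show the idea."""

    def __init__(self, scheme, host, path="/"):
        self.scheme = scheme
        self._host = host
        self.path = path
        self.host_calls = 0  # illustration only: counts get_host() calls

    def get_host(self):
        # Assumption: stands in for the real, more expensive host lookup.
        self.host_calls += 1
        return self._host

    @cached_property
    def _current_scheme_host(self):
        # Computed on first access, then reused for this request object.
        return "{}://{}".format(self.scheme, self.get_host())

    def build_absolute_uri(self, location):
        bits = urlsplit(location)
        if bits.scheme and bits.netloc:
            # Already a full URL; nothing to build.
            return location
        has_dot_segment = any(seg in (".", "..") for seg in bits.path.split("/"))
        if (
            bits.path.startswith("/")
            and not bits.scheme
            and not bits.netloc
            and not has_dot_segment
        ):
            # Shortcut: absolute path on the current host, so plain
            # concatenation is enough and urljoin() can be skipped.
            if location.startswith("//"):
                # '//' followed by an empty host: drop the two slashes.
                location = location[2:]
            return self._current_scheme_host + location
        # Everything else (relative paths, '.'/'..', other hosts) still
        # goes through urljoin().
        return urljoin(self._current_scheme_host + self.path, location)
```

With this sketch, `SketchRequest("http", "example.com", "/a/b/").build_absolute_uri("/foo/bar/")` takes the shortcut, while `"/foo/./bar/"` falls back to `urljoin()`.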

All the tests pass, but they did even when I wasn't checking for '.' and '..', so I added a test for that too.

While the improvements might be minor for some use cases, an example of this being slow can be found in this DRF ticket.

I've made a PR with these changes.

Change History (4)

comment:1 Changed 3 weeks ago by gcbirzan

Description: modified (diff)

comment:2 Changed 3 weeks ago by Tim Graham

Component: Core (URLs) → HTTP handling
Summary: Performance improvements for build_absolute_uri → Performance improvements for HttpRequest.build_absolute_uri()
Triage Stage: Unreviewed → Accepted
Type: Uncategorized → Cleanup/optimization

comment:3 Changed 3 weeks ago by Keryn Knight

Cc: Keryn Knight added

comment:4 Changed 3 weeks ago by gcbirzan

I've posted these in the PR comments, but, some benchmarks:

I ran this benchmark:

timeit.timeit("request.build_absolute_uri(location='///foo/bar/')", number=1000000, globals={'request': request})

The results were:

  • With my fix: 4.474267777055502
  • With my fix but with bits.path[0] == '/': 4.34382488578558
  • My shortcut but no cached property: 9.473239112645388 (this is to simulate running this on different requests)
  • No shortcut but with cached property: 12.506602805107832
  • Original version: 17.600296460092068

There is a regression. Running without the cached property, to simulate calling it in different requests, on a path with '.':

timeit.timeit("request.build_absolute_uri(location='/foo/./bar/')", number=1000000, globals={'request': request})

The results:

  • My version, no cached property: 19.713809736073017
  • Original version: 18.129451751708984

So, the extra checks do add some overhead, but only for an uncommon case. I don't have any evidence, obviously, but I'm fairly confident that the overwhelming majority of build_absolute_uri() calls pass an absolute path that doesn't have '.' or '..' in it, and that the path comes from reverse().
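For anyone who wants to get a rough feel for the urljoin() cost without a Django request object, the comparison can be approximated with the standard library alone. This is only a sketch under assumptions: `BASE`, `CURRENT_PATH`, and the helper names are hypothetical, it does not exercise get_host() or the cached property, and the timings it prints will not match the figures above.

```python
import timeit
from urllib.parse import urljoin

# Hypothetical values standing in for the request's scheme://host and path.
BASE = "http://example.com"
CURRENT_PATH = "/some/page/"


def with_urljoin(location):
    # Roughly what the slow path does: join against the current URL.
    return urljoin(BASE + CURRENT_PATH, location)


def with_concat(location):
    # Roughly what the shortcut does for a plain absolute path.
    return BASE + location


def compare(location, number=100_000):
    """Return the total seconds taken by each approach for `number` calls."""
    return {
        "urljoin": timeit.timeit(lambda: with_urljoin(location), number=number),
        "concat": timeit.timeit(lambda: with_concat(location), number=number),
    }


if __name__ == "__main__":
    for name, seconds in compare("/foo/bar/").items():
        print("{}: {:.3f}s".format(name, seconds))
```

Both helpers return the same string for a plain absolute path such as '/foo/bar/', which is the property the shortcut relies on.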

Last edited 3 weeks ago by gcbirzan