Opened 2 years ago

Closed 2 years ago

#33546 closed Cleanup/optimization (fixed)

Optimise creation of HttpResponseBase and ResponseHeaders objects

Reported by: Keryn Knight Owned by: Keryn Knight
Component: HTTP handling Version: dev
Severity: Normal Keywords:
Cc: Nick Pope Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This is a series of small changes across HttpResponseBase and ResponseHeaders to decrease the time spent creating an HttpResponse based on common calling conventions. As it's a few little tweaks, I'm frontloading the overall times and will explain the individual pieces thereafter.

main patched
ResponseHeaders({}) 745 ns 313 ns
ResponseHeaders({'TEST': "1"}) 2.17 µs 1.8 µs
ResponseHeaders({'TEST': "1", "TEST2": "2"}) 3.51 µs 2.77 µs
ResponseHeaders({"Content-Type": "application/json"}) 2.19 µs 1.93 µs
HttpResponseBase() 7.89 µs 4.18 µs
HttpResponseBase(headers={"TEST": "1"}) 9.75 µs 6.02 µs
HttpResponseBase(headers={"TEST": "1", "TEST2": "2"}) 11 µs 7.09 µs
HttpResponseBase(headers={"Content-Type": "application/json"}) 3.71 µs 3.26 µs
HttpResponseBase(content_type="test") 4.48 µs 3.03 µs
HttpResponse() 9.06 µs 5.46 µs
HttpResponse(headers={"TEST": "1"}) 11.4 µs 7.58 µs
HttpResponse(headers={"TEST": "1", "TEST2": "2"}) 13 µs 8.83 µs
HttpResponse(headers={"Content-Type": "application/json"}) 5.22 µs 4.82 µs
HttpResponse(content_type="test") 6.2 µs 4.38 µs

So by my estimation of things, given most responses don't contain that many headers (if any), these are all happy-path positives.

The changes themselves, then.

The first is to only call the ResponseHeaders._unpack_items if data is given to it, and to update the HttpResponseBase to pass the headers value straight through (which is None by default):

-        for header, value in self._unpack_items(data):
-            self[header] = value
+        if data:
+            for header, value in self._unpack_items(data):
+                self[header] = value

&

-        self.headers = ResponseHeaders(headers or {})
+        self.headers = ResponseHeaders(headers)

That saves a few hundred nanoseconds when none are given, but adds about 50 when some are given.
So the next step is to try and claw back those 50 nanoseconds elsewhere. I won't paste the full diff here because it's more moving around than the above, but it basically involves optimizing/re-working the _convert_to_charset method to do fewer isinstance/type-checking calls and deferring the BadHeaderError check until we've got an actual string, in the happy path. Nick (ngnpope on GitHub) independently came up with a similar/the same(?) idea when we were discussing #33532 (see this comment) in a slightly different manner, so would be listed in the commit as Co-Author.

The changes to _convert_to_charset claw back the lost time well enough; let x be pre-declared as ResponseHeaders({}):

main patched
x['test'] = 'test' 1.34 µs 897 ns
x._convert_to_charset('test', 'ascii') 500 ns 291 ns
x._convert_to_charset('test', 'latin-1', mime_encode=True) 565 ns 359 ns

The next change to stack up is in the HttpResponseBase.charset property, which is called when neither a content_type parameter nor a Content-Type header key is passed in. That method actually asks for the Content-Type Header, falling back to a blank string if not found (which it won't be, because it's currently being set) - but in asking for the header, doesn't test whether it was truthy, so always runs the regex search on an empty string. A bit of reworking and a walrus operator or two looks like:

-        content_type = self.get("Content-Type", "")
-        matched = _charset_from_content_type_re.search(content_type)
-        if matched:
+        if content_type := self.headers.get("Content-Type", ""):
+            if matched := _charset_from_content_type_re.search(content_type):

Finally, rejig the HttpResponseBase.__init__ to do one fewer __contains__/__getitem__ on the ResponseHeaders in case it needs to raise a ValueError about content_type and Content-Type clashing, at least if I've thought through the boolean logic conditions correctly:

-        if content_type and "Content-Type" in self.headers:
-            ... raise error
-        if "Content-Type" not in self.headers:
-            ... set content type
+        if "Content-Type" not in self.headers:
+            ... set content type
+        elif content_type:
+            ... raise error

And that's your lot. Technically you can go further with _convert_to_charset in the unhappy path where it produces a UnicodeError and then attempts to encode it with email.Header, and defer the newlines/BadHeaderError check until after that encoding ... But it would change the exception in certain circumstances from a BadHeaderError from UnicodeError into just a UnicodeError which may not be desirable, and it also means knowing how email.Header handles newlines to grok the safety of the whole change, so I've not gone ahead with it.

Pull request for discussion of the various changes will be forthcoming once I get a ticket number and rebase.

Tests all pass for me locally (SQLite, Python 3.10, OSX)

Change History (9)

comment:1 by Keryn Knight, 2 years ago

Has patch: set

PR is https://github.com/django/django/pull/15466 ... all tests passed. Remarkable, not even a linter to trip me!

comment:2 by Mariusz Felisiak, 2 years ago

Triage Stage: UnreviewedAccepted

comment:3 by Mariusz Felisiak, 2 years ago

Cc: Nick Pope added

comment:4 by Mariusz Felisiak, 2 years ago

Triage Stage: AcceptedReady for checkin

comment:5 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

In e0b197c:

Refs #33546 -- Avoided unpacking data in ResponseHeaders when not necessary.

comment:6 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

In 51f896fe:

Refs #33546 -- Optimized ResponseHeaders._convert_to_charset() by reducing the type-checking duplication.

In the common case, where keys and values are be encoded into
ascii/latin-1, defer the checking for newlines until it's been
successfully coerced to a string.

Co-authored-by: Nick Pope <nick@…>

comment:7 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

In 4b2f6ace:

Refs #33546 -- Optimized HttpResponseBase.charset a bit.

This avoids scanning the Content-Type if it's empty, allowing the
Content-Type header itself to have a charset assigned without using
the re module.

comment:8 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

In 95b7d01d:

Refs #33546 -- Optimized handling content types in HttpResponseBase.init().

This removes an extraneous conditional causing "Content-Type" to be
checked within the ResponseHeaders twice, if a content_type parameter
is provided.

comment:9 by Mariusz Felisiak, 2 years ago

Resolution: fixed
Status: assignedclosed
Note: See TracTickets for help on using tickets.
Back to Top