Opened 2 years ago

Closed 2 years ago

#33532 closed Cleanup/optimization (fixed)

Micro-optimisation: Special-case dictionaries for CaseInsensitiveMapping.

Reported by: Keryn Knight Owned by: Keryn Knight
Component: HTTP handling Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Currently, all data given to HttpHeaders and ResponseHeaders go through CaseInsensitiveMapping._unpack_items to unwind in case it's not a dictionary.

In the case of HttpHeaders though, it's always a dictionary (and I've got a plan for ResponseHeaders in general, as a separate patch, should this one go OK as a pre-requisite), so for the common cases of CaseInsensitiveMapping being used, we can optimise for the expected use-case by introducing an additional isinstance check to avoid going through _unpack_items, __instancecheck__ and _abc._abc_instancecheck

The change itself (I'll add a PR after I get a ticket number) is thus:

-        self._store = {k.lower(): (k, v) for k, v in self._unpack_items(data)}
+        data = data.items() if isinstance(data, dict) else self._unpack_items(data)
+        self._store = {k.lower(): (k, v) for k, v in data}

The isinstance check will add roughly 70ns (for me) to non-dict usages of CaseInsensitiveMapping (and it's subclasses), but will save
roughly 500ns (for me) when using dictionaries.

Baseline, main as of b626c5a9798b045b655d085d59efdd60b5d7a0e3:

In [1]: from django.http.request import HttpHeaders

In [2]: from django.utils.datastructures import CaseInsensitiveMapping

In [3]: %timeit CaseInsensitiveMapping({'Name': 'Jane'})
1.4 µs ± 7.94 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [4]: %timeit HttpHeaders({"HTTP_EXAMPLE": 1})
2.79 µs ± 20.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [5]: %timeit HttpHeaders({"CONTENT_LENGTH": 1})
2.56 µs ± 40 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [6]: %prun for _ in range(100000): CaseInsensitiveMapping({"Name": "Jane"})
   900009 function calls (900007 primitive calls) in 0.356 seconds
   Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100000    0.073    0.000    0.231    0.000 datastructures.py:306(<dictcomp>)
   200000    0.066    0.000    0.147    0.000 datastructures.py:328(_unpack_items)
        1    0.063    0.063    0.356    0.356 <string>:1(<module>)
   100000    0.063    0.000    0.293    0.000 datastructures.py:305(__init__)
   100000    0.030    0.000    0.072    0.000 {built-in method builtins.isinstance}
   100000    0.022    0.000    0.041    0.000 abc.py:117(__instancecheck__)
   100000    0.019    0.000    0.020    0.000 {built-in method _abc._abc_instancecheck}
   100000    0.011    0.000    0.011    0.000 {method 'lower' of 'str' objects}
   100000    0.010    0.000    0.010    0.000 {method 'items' of 'dict' objects}
        1    0.000    0.000    0.356    0.356 {built-in method builtins.exec}
      2/1    0.000    0.000    0.000    0.000 {built-in method _abc._abc_subclasscheck}
      2/1    0.000    0.000    0.000    0.000 abc.py:121(__subclasscheck__)
        2    0.000    0.000    0.000    0.000 _collections_abc.py:409(__subclasshook__)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

After adding the patch suggested above:

In [1]: from django.http.request import HttpHeaders

In [2]: from django.utils.datastructures import CaseInsensitiveMapping

In [3]: %timeit CaseInsensitiveMapping({'Name': 'Jane'})
862 ns ± 9.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [4]: %timeit HttpHeaders({"HTTP_EXAMPLE": 1})
2.05 µs ± 31 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [5]: %timeit HttpHeaders({"CONTENT_LENGTH": 1})
1.98 µs ± 18.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [6]: %prun for _ in range(100000): CaseInsensitiveMapping({"Name": "Jane"})
   500003 function calls in 0.204 seconds
   Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100000    0.084    0.000    0.148    0.000 datastructures.py:305(__init__)
        1    0.056    0.056    0.204    0.204 <string>:1(<module>)
   100000    0.038    0.000    0.047    0.000 datastructures.py:307(<dictcomp>)
   100000    0.009    0.000    0.009    0.000 {method 'lower' of 'str' objects}
   100000    0.009    0.000    0.009    0.000 {method 'items' of 'dict' objects}
   100000    0.008    0.000    0.008    0.000 {built-in method builtins.isinstance}
        1    0.000    0.000    0.204    0.204 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

As you can see from the cProfile timing + number of function calls, we've saved a not insubstantial amount of work from being done, and that looks consistent over both classes and both likely paths through an HTTP Header.

Tests all pass for me locally...

Change History (3)

comment:1 by Carlton Gibson, 2 years ago

Triage Stage: UnreviewedAccepted

OK, I'll accept pending discussion on the PR. Thanks.

comment:2 by Mariusz Felisiak, 2 years ago

Has patch: set
Triage Stage: AcceptedReady for checkin

comment:3 by GitHub <noreply@…>, 2 years ago

Resolution: fixed
Status: assignedclosed

In 3de787a7:

Fixed #33532 -- Optimized CaseInsensitiveMapping instantiation for dicts.

Internal usages of this class (e.g. HttpHeaders) provide it with a dict,
so testing for that type first avoids the cost of going through the
potential instancecheck + _abc_instancecheck to establish it's
a Mapping.

Co-authored-by: Nick Pope <nick@…>

Note: See TracTickets for help on using tickets.
Back to Top