Opened 2 months ago

Closed 4 days ago

#36520 closed Bug (fixed)

Performance Regression in parse_header_params

Reported by: David Smith Owned by: Natalia Bidart
Component: HTTP handling Version: dev
Severity: Release blocker Keywords:
Cc: Carlton Gibson, Jake Howard, Mike Edmunds Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

​https://github.com/django/django/pull/18424 / #35440 simplified the parse_header_params() function.

This introduced a performance regression than can be seen on django-asv. ​link shows 5 performance regressions on or arround 28 March 2025, all of which relate to this commit.

A minimal reproduction shows a 5x performance regression.

Before:

In [2]: %timeit parse_header_parameters("text/plain")
500 ns Β± 1.42 ns per loop (mean Β± std. dev. of 7 runs, 1,000,000 loops each)

After

In [2]: %timeit parse_header_parameters("text/plain")
2.65 ΞΌs Β± 44.7 ns per loop (mean Β± std. dev. of 7 runs, 100,000 loops each)

Running a ​line-profile shows:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   320                                           @line_profiler.profile
   321                                           def parse_header_parameters(line, max_length=MAX_HEADER_LENGTH):
   322                                               """
   323                                               Parse a Content-type like header.
   324                                               Return the main content-type and a dictionary of options.
   325
   326                                               If `line` is longer than `max_length`, `ValueError` is raised.
   327                                               """
   328         1         11.0     11.0      2.4      if max_length is not None and line and len(line) > max_length:
   329                                                   raise ValueError("Unable to parse header parameters (value too long).")
   330
   331         1         47.0     47.0     10.3      m = Message()
   332         1         51.0     51.0     11.2      m["content-type"] = line
   333         1        332.0    332.0     73.0      params = m.get_params()
   334
   335         1          2.0      2.0      0.4      pdict = {}
   336         1          7.0      7.0      1.5      key = params.pop(0)[0].lower()
   337         1          2.0      2.0      0.4      for name, value in params:
   338                                                   if not name:
   339                                                       continue
   340                                                   if isinstance(value, tuple):
   341                                                       value = collapse_rfc2231_value(value)
   342                                                   pdict[name] = value
   343         1          3.0      3.0      0.7      return key, pdict

As I think this function is called with every request, I thought it worth raising.

Change History (31)

comment:1 by Natalia Bidart, 2 months ago

Severity: Normal β†’ Release blocker
Triage Stage: Unreviewed β†’ Accepted
Type: Uncategorized β†’ Bug
Version: 5.2 β†’ dev

Thank you David for raising this. IIUC, the slowest code path comes from the Python implementation when calling get_params()?

Also the 5x regression comes from your test, not asv, right? I see better values in asv if I'm reading that correctly.

comment:2 by David Smith, 2 months ago

Yes, get_params() seems to be where most of the time is being spent.

The 5x claim I made above is just looking at this function. Other things are happening in the ASV benchmarks, so the impact shown there is much smaller. Maybe that's a good reason to suggest that even though this function is much slower, it is small in the overall request/response cycle. However, I think that as this is called with every request even small gains are worth having.

I have done a bit more work on this. We could optimise for content types that do not include parameters and avoid a lot of the work in this case. Something like this:

I'm rather unsure how common parameters are. Some evidence would be useful to avoid optimising for the wrong thing.

diff --git a/django/utils/http.py b/django/utils/http.py
index 1c7aec7141..111740cd02 100644
--- a/django/utils/http.py
+++ b/django/utils/http.py
@@ -326,6 +326,10 @@ def parse_header_parameters(line, max_length=MAX_HEADER_LENGTH):
     if max_length is not None and line and len(line) > max_length:
         raise ValueError("Unable to parse header parameters (value too long).")

+    if line and ";" not in line:
+        # No parameters, just return the content type.
+        return line.strip().lower(), {}
+
     m = Message()
     m["content-type"] = line
     params = m.get_params()

comment:3 by Natalia Bidart, 2 months ago

Thank you David for following up on this ticket!

I was thinking that maybe it may be worth reporting this issue to Python upstream? I feel like the proposed shortcut could benefit the broader ecosystem and thus be part of the Message implementation. Would you be available to pursue that?

Thanks again!

comment:4 by Natalia Bidart, 2 months ago

Update: I built a script to benchmark this and I think I can report the issue myself to Python.

comment:5 by Natalia Bidart, 2 months ago

comment:6 by Carlton Gibson, 2 months ago

Since the regression is only on main so far, should we treat this as a release blocker on 6.0? Regardless of a resolution in Python (which will probably be slow if coming at all) David's patch looks preferable to the regression here. WDYT?

I'm rather unsure how common parameters are.

I don't have data but I'd punt that the vast majority of folks aren't using them.

Version 1, edited 2 months ago by Carlton Gibson (previous) (next) (diff)

comment:7 by Carlton Gibson, 2 months ago

Cc: Carlton Gibson added

comment:8 by Jake Howard, 2 months ago

I'm rather unsure how common parameters are.

Intentionally, probably not especially common. But browses seem eager to add charset=utf-8 when they default generate the Content-Type. Also, parameters are used to set the boundary for form submissions, so could be sent with the majority of POST requests.

Because any form submission will probably hit the slow path, as could many other clients, short-circuiting probably isn't the way to go. Given reverting #35440 ought not to be too complex, we can probably afford to hold off and see what the response from upstream is.

comment:9 by Natalia Bidart, 7 weeks ago

So! I followed the suggestion from the Python Discourse post, where Barry Scott suggested to benchmark *bytes* header parsing instead of unicode since:

When I worked on code to parse HTTP headers we worked in bytes to save the cost of decoding to unicode.

And I was surprised to see that the slowdown is actually a speed gain for complex headers. See these comparison tables:

Python 3.11 (real cgi, email.message gets bytes) cgi email.message ratio
text/plain 0.316 1.618 ~5.1x
text/html; charset=UTF-8; boundary=something 1.362 4.364 ~3.2x
application/x-stuff; title*=...; foo=bar; foo2*=... 1.822 2.405 ~1.3x
Python 3.13 (cgi shim, email.message gets bytes) cgi email.message ratio
text/plain 0.312 1.676 ~5.4x
text/html; charset=UTF-8; boundary=something 1.188 3.887 ~3.3x
application/x-stuff; title*=...; foo=bar; foo2*=... 4.358 2.175 ~0.5x

So maybe we could indeed shortcircuit the parsing if no ; is found in the header line (to avoid paying the 5x slowdown), but then is unclear to me how we could resort back to the header bytes when what we get in the WSGI environ is str.

Carlton, Jake, David, any ideas?

comment:10 by Natalia Bidart, 7 weeks ago

Cc: Jake Howard added

comment:11 by Jake Howard, 7 weeks ago

Encoding a string into bytes is incredibly fast. Perhaps doing the short-circuit, converting the data to bytes and then parsing it with email.message is fast enough? It's still going to be a performance regression, but hopefully not by quite as much.

in reply to:  11 comment:12 by Natalia Bidart, 7 weeks ago

Replying to Jake Howard:

Encoding a string into bytes is incredibly fast. Perhaps doing the short-circuit, converting the data to bytes and then parsing it with email.message is fast enough? It's still going to be a performance regression, but hopefully not by quite as much.

Below are benchmark results using my testing code, where line = line.encode("utf-8") is called inside the benchmarked function, just before invoking Message.get_params(). This approach could work: we see a potential 2x performance improvement for very large headers (which are often the source of security reports). However, the average case (having a charset and potentially a boundary) appears to suffer a 3x slowdown, and I don't think we could easily avoid that penalty.

Python 3.11 cgi get_params(str) get_params(bytes) ratio get_params(bytes) is
text/plain 0.337 1.635 1.672 0.20 5.0x slower
text/html; charset=UTF-8; boundary=something 1.362 4.348 4.514 0.30 3.3x slower
application/x-stuff; ... 1.955 8.675 2.494 1.27 1.3x slower
Python 3.12 cgi get_params(str) get_params(bytes) ratio get_params(bytes) is
text/plain 0.356 1.657 1.725 0.21 4.8x slower
text/html; charset=UTF-8; boundary=something 1.407 4.582 4.697 0.30 3.3x slower
application/x-stuff; ... 2.017 9.609 2.645 1.31 1.3x slower
Python 3.13 cgi get_params(str) get_params(bytes) ratio get_params(bytes) is
text/plain 0.325 1.613 1.717 0.19 5.3x slower
text/html; charset=UTF-8; boundary=something 1.167 3.862 3.943 0.30 3.4x slower
application/x-stuff; ... 4.263 9.445 2.252 1.89 1.9x faster
Python 3.14 cgi get_params(str) get_params(bytes) ratio get_params(bytes) is
text/plain 0.258 1.601 1.725 0.16 6.7x slower
text/html; charset=UTF-8; boundary=something 1.037 3.773 3.870 0.27 3.7x slower
application/x-stuff; ... 3.978 8.789 2.132 1.87 1.9x faster

comment:13 by Carlton Gibson, 7 weeks ago

I wonder if the best option isn't to revert 9aabe7eae3eeb3e64c5a0f3687118cd806158550, reopen #35440, and assess again in more leisurely time, keeping this performance issue in mind. πŸ€”

comment:14 by Jake Howard, 7 weeks ago

That sounds sensible to me.

As an aside, the parsing implementations from ​requests and ​werkzeug (ie flask) are both hand-rolled, and differ a fair bit from Django's. It's possibly using those implementations may be faster (or at least be used to accelerate what Django already has).

I'd be interested to see how those benchmark against cgi and email.message.

comment:15 by Natalia Bidart, 6 weeks ago

Following up on your interest, Jake, here are the comparisons using the linked implementations. All ratios are calculated against the base cgi-copied implementation currently in Django 5.2.x (and real cgi in 3.11 and 3.12). The next step, which I'd like to know if you're interested in, is to verify whether the comparisons with Requests and Werkzeug are fair and whether these implementations can be considered both robust and accurate.

Python 3.14 cgi requests werkzeug get_params(str) get_params(bytes)
text/plain 0.257 0.114 πŸš€ -2.3x 0.717 🐒 2.8x 1.664 🐒 6.5x 1.690 🐒 6.6x
text/html; charset=UTF-8; b... 1.033 0.494 πŸš€ -2.1x 1.538 🐒 1.5x 3.693 🐒 3.6x 3.907 🐒 3.8x
application/x-stuff; title*... 3.866 0.695 πŸš€ -5.6x 1.982 πŸš€ -2.0x 8.487 🐒 2.2x 2.152 πŸš€ -1.8x
Python 3.13 cgi requests werkzeug get_params(str) get_params(bytes)
text/plain 0.308 0.134 πŸš€ -2.3x 0.764 🐒 2.5x 1.661 🐒 5.4x 1.689 🐒 5.5x
text/html; charset=UTF-8; b... 1.135 0.527 πŸš€ -2.2x 1.617 🐒 1.4x 3.769 🐒 3.3x 3.980 🐒 3.5x
application/x-stuff; title*... 4.224 0.721 πŸš€ -5.9x 2.069 πŸš€ -2.0x 9.363 🐒 2.2x 2.164 πŸš€ -2.0x
Python 3.12 cgi requests werkzeug get_params(str) get_params(bytes)
text/plain 0.329 0.136 πŸš€ -2.4x 0.756 🐒 2.3x 1.683 🐒 5.1x 1.816 🐒 5.5x
text/html; charset=UTF-8; b... 1.441 0.623 πŸš€ -2.3x 1.718 🐒 1.2x 4.599 🐒 3.2x 4.709 🐒 3.3x
application/x-stuff; title*... 1.997 0.827 πŸš€ -2.4x 2.199 🐒 1.1x 9.565 🐒 4.8x 2.521 🐒 1.3x
Python 3.11 cgi requests werkzeug get_params(str) get_params(bytes)
text/plain 0.322 0.123 πŸš€ -2.6x 0.696 🐒 2.2x 1.671 🐒 5.2x 1.667 🐒 5.2x
text/html; charset=UTF-8; b... 1.360 0.573 πŸš€ -2.4x 1.604 🐒 1.2x 4.337 🐒 3.2x 4.498 🐒 3.3x
application/x-stuff; title*... 1.868 0.757 πŸš€ -2.5x 2.106 🐒 1.1x 8.624 🐒 4.6x 2.472 🐒 1.3x

comment:16 by Natalia Bidart, 6 weeks ago

The benchmark script can be found ​here.

Since werkzeug performs nearly as slowly as get_params(bytes), and although I’m hesitant (but not opposed) to reverting the fix, would it be acceptable to address this by encoding to bytes the header line while continuing to use Python's stdlib Message().get_params()?

comment:17 by Jake Howard, 4 weeks ago

The requests implementation was the most interesting to me, being so much faster. After plugging it in to Django however, it seems it's not quite as robust - many of the test cases fail and produce very unexpected results. werkzeug's implementation appears to be more robust to the weird edge cases we need to handle, but still isn't perfect.

To revert or not to revert feels difficult. On the one hand, the cgi implementation is faster almost across the board. However, it's a fair bit of work to maintain, and parser code like that tends to be where security issues get introduced. Message().get_params is slower (at least on the easy cases), but using bytes makes it slightly less slower than it currently is, is the officially recommended path forward, and could get improvements as time goes on (removing the need to create a new Message instance each time improves performance quite a bit). Personally, I'd vote for adding the short-circuit and bytes changes over reverting.

As an aside, to integrate requests and werkzeug's implementations into Django, I had to modify how multi-part form parsing worked, since it passed the entire header to parse_header_parameters, rather than just the value. I doubt it'll be a performance improvement, but probably something worth cleaning up regardless, especially if we do swap out the implementation. I can push up a PR if it's useful?

in reply to:  17 comment:18 by Natalia Bidart, 4 weeks ago

Replying to Jake Howard:

The requests implementation was the most interesting to me, being so much faster. After plugging it in to Django however, it seems it's not quite as robust - many of the test cases fail and produce very unexpected results. werkzeug's implementation appears to be more robust to the weird edge cases we need to handle, but still isn't perfect.

Thanks for checking!

To revert or not to revert feels difficult. On the one hand, the cgi implementation is faster almost across the board. However, it's a fair bit of work to maintain, and parser code like that tends to be where security issues get introduced. Message().get_params is slower (at least on the easy cases), but using bytes makes it slightly less slower than it currently is, is the officially recommended path forward, and could get improvements as time goes on (removing the need to create a new Message instance each time improves performance quite a bit). Personally, I'd vote for adding the short-circuit and bytes changes over reverting.

I fully agree with this rationale.

As an aside, to integrate requests and werkzeug's implementations into Django, I had to modify how multi-part form parsing worked, since it passed the entire header to parse_header_parameters, rather than just the value. I doubt it'll be a performance improvement, but probably something worth cleaning up regardless, especially if we do swap out the implementation. I can push up a PR if it's useful?

Yes please, a PR with these fixes would be much appreciated. Let me know when it's up so I can prioritize it for reviews. Thank you for investigating further!

comment:19 by nessita <124304+nessita@…>, 3 weeks ago

In 41ff30f:

Refs #36520 -- Ensured only the header value is passed to parse_header_parameters for multipart requests.

Header parsing should apply only to the header value. The previous
implementation happened to work but relied on unintended behavior.

comment:20 by Natalia Bidart, 2 weeks ago

Owner: set to Natalia Bidart
Status: new β†’ assigned

comment:21 by Natalia Bidart, 2 weeks ago

I started looking into this AGAIN to see how the branch using bytes would look like. In my head, "passing bytes" to Message.get_params would just be:

  • django/utils/http.py

    diff --git a/django/utils/http.py b/django/utils/http.py
    index 504f28c678..54dc85aead 100644
    a b def parse_header_parameters(line, max_length=MAX_HEADER_LENGTH):  
    327327        raise ValueError("Unable to parse header parameters (value too long).")
    328328
    329329    m = Message()
    330     m["content-type"] = line
     330    m["content-type"] = line.encode("utf-8")
    331331    params = m.get_params()
    332332
    333333    pdict = {}

But this change generates all sort of test failures: some are "expected" because we'd need to decode the results into what's actually expected as a str result, but other show jusy plain wrong results. For example, for a rfc2231 like this:

Content-Type: application/x-stuff; title*=us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A

Passing str to Message.get_params results in the correct:

[('Content-Type: application/x-stuff', ''),  ('title', ('us-ascii', 'en-us', 'This is ***fun***'))]

But passing bytes yields:

[('b"content-type: application/x-stuff; title*',  'us-ascii\'en-us\'This%20is%20%2A%2A%2Afun%2A%2A%2A"')]

I further debugged and ended up in a rabbit hole in the Python code, to realize that get_params is not, at least to my exhausted eyes, expecting bytes in any way! I posted an update in the Python Discourse, but I would truly appreciate your opinion on this since at this point my head is very confused.

I dug into Message.get_params(), and from the call chain it seems clear that everything is built around str headers. For example, _parseparam() (the call chain is get_params -> _get_params_preserve -> _parseparam) starts with these two lines (​https://github.com/python/cpython/blob/main/Lib/email/message.py#L73):

def _parseparam(s):
    # RDM This might be a Header, so for now stringify it.
    s = ';' + str(s)

It then string-splits, string-strips, and string-compares all the way down. If I assign bytes to the header, the parsing doesn’t work as expected: you basically just get the repr() of the bytes object. So while I understand the idea of avoiding decoding overhead, I don't see a way to feed bytes into Message and get equivalent results. The API really assumes str throughout.

Side note

There is ALSO the question of whether the header name should or should not be passed to parse_header_parameters. On a second look in our code, I think that line is expected to have the header name in it. Tests are super confusing in this regard, see how test_basic does not pass it but the rest of test cases do! ​https://github.com/django/django/blob/main/tests/utils_tests/test_http.py

comment:22 by Jacob Walls, 2 weeks ago

Going with David's short-circuit suggestion in comment:2 seems wise to me. I don't think it's worth reverting and waiting to hear from upstream, since if I were CPython I don't think I'd take any action:

  • the performance impact although measurable is extremely minor
  • it's inside a ​legacy method for the "legacy" Message class
  • the burden of optimizing for the "right thing" in David's words is probably on us (Django)

Re: "legacy", porting to EmailMessage performs 10x worse. For posterity I had something like this, but with test failures:

  • django/utils/http.py

    diff --git a/django/utils/http.py b/django/utils/http.py
    index 504f28c678..8a05780160 100644
    a b import re  
    33import unicodedata
    44from binascii import Error as BinasciiError
    55from datetime import UTC, datetime
    6 from email.message import Message
     6from email.message import EmailMessage
    77from email.utils import collapse_rfc2231_value, formatdate
    88from urllib.parse import quote
    99from urllib.parse import urlencode as original_urlencode
    def parse_header_parameters(line, max_length=MAX_HEADER_LENGTH):  
    326326    if max_length is not None and line and len(line) > max_length:
    327327        raise ValueError("Unable to parse header parameters (value too long).")
    328328
    329     m = Message()
     329    m = EmailMessage()
    330330    m["content-type"] = line
    331     params = m.get_params()
     331    params = m["content-type"].params
    332332
    333333    pdict = {}
    334     key = params.pop(0)[0].lower()
    335     for name, value in params:
     334    key = m["content-type"].split(";")[0]
     335    for name, value in params.items():
    336336        if not name:
    337337            continue
    338338        if isinstance(value, tuple):

I checked DRF, and DRF calls Django's parse_header_parameters in ​a nested loop, about 17 times for the view I was testing. The values it tested were:

application/json
text/html; q=1.0
*/*
application/json
text/html; q=1.0
text/html
text/html; q=1.0
text/html; q=1.0
text/html
application/json
application/json
application/json
application/json
text/html; q=1.0
text/html; q=1.0
application/json
multipart/form-data

Simplifying the bench (to remove the red herring bytes stuff and add in David's suggestion from comment:2):

Python 3.13 cgi requests werkzeug get_params(str) get_params#comment:2
application/json 0.358 0.145 πŸš€ -2.5x 0.806 🐒 2.3x 1.758 🐒 4.9x 0.089 πŸš€ -4.0x
text/html; q=1.0 0.887 0.375 πŸš€ -2.4x 1.210 🐒 1.4x 2.954 🐒 3.3x 3.497 🐒 3.9x

So, with comment:2, all but a handful of these calls will be faster. I think we could call this a wash at that point.

comment:23 by David Smith, 2 weeks ago

Looking at the original ​PEP 333 under envion variables it says...

Also note that CGI-defined variables must be strings, if they are present at all. It is a violation of this specification for a CGI variable’s value to be of any type other than str.

And the modern ​PEP3333 it says:

β€œNative” strings (which are always implemented using the type named str) that are used for request/response headers and metadata.

So the comments above about get_params working with strings and WSGI environ being str seems to be in line with the spec.

Given how close to the feature freeze we are, I support Carlton's comment about reverting the change and coming back to it at a more leisurely pace.

comment:24 by Natalia Bidart, 2 weeks ago

Cc: Mike Edmunds added

Mike (added as cc), would you have an opinion? I consider your an expert in everything relating to Python email's API (legacy and modern message classes).

comment:25 by Mike Edmunds, 2 weeks ago

I don't have any deep insights here (and my expertise in Python's email APIs is, at best, at the "enough knowledge to be dangerous" level). But from what I've seen:

  • I was… surprised to see Python recommending the legacy email APIs as a replacement for the cgi package. I've been under the impression the legacy APIs are being gently edged toward deprecation.
  • Python's email package (both APIs) suffers from a severe shortage of maintainer bandwidth. Even security fixes can take months or years to land. Having two differentβ€”but intertwinedβ€”email APIs doesn't help. (To be fair, the cgi package had exactly zero maintainers, which is strictly worse.)
  • I don't think performance has been a primary consideration in either email API's design. (The modern API prioritizes simplified DX around complying with newer RFCs. It has several extra layers compared to the legacy API, so it's probably going to be slower. But maybe more accurate in some corner cases.)
  • Although both email APIs are specced to parse HTTP headers (MIME is MIME), I'm guessing neither has had as much real-world use for that purpose as the cgi package did. It wouldn't be surprising to see a bunch of HTTP-specific email package issues emerge over the next few years. (Call this the first one.)

Given all that, I suspect Django would be better off not using Python's email package to parse HTTP headers. The technical and maintenance requirements aren't well aligned. Jacob's analysis in comment:22 seems spot on. James Webber's ​reply to Natalia's Python forum post also puts it pretty well.

Post-6.0, perhaps there could be a collaboration with Flask's maintainers on a high-performance, secure, maintained HTTP header parser purpose-built for web framework needs? (Or maybe there's already one in rust just waiting for some Python bindings?)

in reply to:  25 ; comment:26 by Natalia Bidart, 2 weeks ago

Replying to Mike Edmunds:

Given all that, I suspect Django would be better off not using Python's email package to parse HTTP headers. The technical and maintenance requirements aren't well aligned. Jacob's analysis in comment:22 seems spot on. James Webber's ​reply to Natalia's Python forum post also puts it pretty well.

Thanks so much, Mike! This is incredibly useful. Jacob and I will chat more in person next week at DjangoCon US, but it looks like we may need to go with the revert for now. I really appreciate your perspective, it helps a lot in framing the next steps.

Post-6.0, perhaps there could be a collaboration with Flask's maintainers on a high-performance, secure, maintained HTTP header parser purpose-built for web framework needs? (Or maybe there's already one in rust just waiting for some Python bindings?)

Current Flask parsing is slower than our backported cgi version. Were you envisioning a collaboration to optimize their implementation for performance (perhaps reusing some of cgi bits) or to find a way to use ours across projects under a shared agreement?

in reply to:  26 comment:27 by Mike Edmunds, 2 weeks ago

Replying to Natalia Bidart:

Replying to Mike Edmunds:

Post-6.0, perhaps there could be a collaboration with Flask's maintainers on a high-performance, secure, maintained HTTP header parser purpose-built for web framework needs? (Or maybe there's already one in rust just waiting for some Python bindings?)

Current Flask parsing is slower than our backported cgi version. Were you envisioning a collaboration to optimize their implementation for performance (perhaps reusing some of cgi bits) or to find a way to use ours across projects under a shared agreement?

I was thinking about your comment in the Python forum that "relying on the Python stdlib whenever possible improves maintainability, reduces potential security risks, and benefits from broader community scrutiny." Although generally true, it seems like that isn't the case for this particular use. But that maybe there's an opportunity for web frameworks with that shared need (should've included Tornado, probably others) to create a shared, de facto standard package outside of Python stdlib.

(That said, I don't see anything wrong with Django maintaining its own backported bits of cgi.)

comment:28 by Natalia Bidart, 4 days ago

Has patch: set

comment:29 by Jacob Walls, 4 days ago

Quoting ​Brett Cannon on the python forum:

You can open an issue, but I would advise you keep your backport of cgi.parse_header() if it gives you the performance you’re after.

Agree with comment:27 that there's nothing wrong with keeping a backport, but I do start to doubt whether we should "be after" anything here at all. 2100 nanoseconds for clearer & stdlib Python seems worth paying for, even in every request. DRF is calling this function oodles of times without a care. We could probably shave the 2100 nanoseconds down by using a module-level Message constant, given that we only get and set one header type.

But I agree with Natalia and comment:13 that a revert makes sense at present given the time pressure.


Because any form submission will probably hit the slow path, as could many other clients, short-circuiting probably isn't the way to go.

Fair, but if it helps DRF at least perform no worse (given some calls improve and others degrade), I think it's still viable.

comment:30 by Jacob Walls, 4 days ago

Triage Stage: Accepted β†’ Ready for checkin

comment:31 by nessita <124304+nessita@…>, 4 days ago

Resolution: β†’ fixed
Status: assigned β†’ closed

In 424e0d86:

Fixed #36520 -- Reverted "Fixed #35440 -- Simplified parse_header_parameters by leveraging stdlid's Message."

This partially reverts commit 9aabe7eae3eeb3e64c5a0f3687118cd806158550.

The simplification of parse_header_parameters using stdlib's Message
is reverted due to a performance regression. The check for the header
maximum length remains in place, per Security Team guidance.

Thanks to David Smith for reporting the regression, and Jacob Walls for
the review.

Note: See TracTickets for help on using tickets.
Back to Top