Opened 9 months ago
Last modified 2 months ago
#35440 assigned Cleanup/optimization
Update parse_header_parameters to leverage the parsing logic from (stdlib) email Message.
Reported by: | Natalia Bidart | Owned by: | Khudyakov Artem |
---|---|---|---|
Component: | HTTP handling | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Carlton Gibson, Adam Johnson | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Following a security report, which was concluded not to be such, the Security Team agreed to apply a few improvements to the parse_header_parameters
function from django/utils/http.py
.
This function was historically using Python's (now deprecated) cgi
module, but in 34e2148fc725e7200050f74130d7523e3cd8507a the code was changed by porting the cgi
implementations into the file to avoid the deprecation (this was in the context of #33173). Later on the header parsing logic was cleaned up in d4d5427571b4bf3a21c902276c2a00215c2a37cc to unify the logic with the one used in the multipartparser.py
module (see #33697).
At the time of the cgi
deprecation, the PEP including the deprecation mentions:
Replacements for the various parts of cgi which are not directly related to executing code are: [ ... ] parse_header with email.message.Message (see example below)
Providing an explicit example of how close parse_header
and email.message.Message
are:
>>> from cgi import parse_header >>> from email.message import Message >>> parse_header(h) ('application/json', {'charset': 'utf8'}) >>> m = Message() >>> m['content-type'] = h >>> m.get_params() [('application/json', ''), ('charset', 'utf8')] >>> m.get_param('charset')
The goal of this ticket is to track the improvement of the current parse_header_parameters
implementation by leveraging the logic from email.message.Message
.
The Security Team also agreed that it's worth adding some early checks in the parse_header_parameters
function to limit the amount of provided semicolons. This would require some investigation as to what would be a good threshold, considering that it's likely that more than one semicolon may not be necessary in valid HTTP headers.
Change History (7)
comment:1 by , 9 months ago
Description: | modified (diff) |
---|
comment:2 by , 9 months ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:3 by , 6 months ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:6 by , 6 months ago
Has patch: | set |
---|
comment:7 by , 6 months ago
Patch needs improvement: | set |
---|
Setting as patch needs improvements following my comments in the PR (the goal of this ticket is, ideally, to remove/replace our custom parsing logic with Python stdlib's Message
methods).
comment:8 by , 6 months ago
Patch needs improvement: | unset |
---|
I've proposed a fix in the PR, but have clarifying questions. In case this task is not on the review radar, I change the status, but am ready to make further modifications.
comment:9 by , 2 months ago
Patch needs improvement: | set |
---|
PR