Opened 11 days ago

Last modified 11 days ago

#35440 new Cleanup/optimization

Update parse_header_parameters to leverage the parsing logic from (stdlib) email Message.

Reported by: Natalia Bidart Owned by: nobody
Component: HTTP handling Version: dev
Severity: Normal Keywords:
Cc: Carlton Gibson, Adam Johnson Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Natalia Bidart)

Following a security report, which was concluded not to be such, the Security Team agreed to apply a few improvements to the parse_header_parameters function from django/utils/http.py.

This function was historically using Python's (now deprecated) cgi module, but in 34e2148fc725e7200050f74130d7523e3cd8507a the code was changed by porting the cgi implementations into the file to avoid the deprecation (this was in the context of #33173). Later on the header parsing logic was cleaned up in d4d5427571b4bf3a21c902276c2a00215c2a37cc to unify the logic with the one used in the multipartparser.py module (see #33697).

At the time of the cgi deprecation, the PEP including the deprecation mentions:

Replacements for the various parts of cgi which are not directly related to executing code are: [ ... ] parse_header with email.message.Message (see example below)

Providing an explicit example of how close parse_header and email.message.Message are:

>>> from cgi import parse_header
>>> from email.message import Message
>>> parse_header(h)
('application/json', {'charset': 'utf8'})
>>> m = Message()
>>> m['content-type'] = h
>>> m.get_params()
[('application/json', ''), ('charset', 'utf8')]
>>> m.get_param('charset')

The goal of this ticket is to track the improvement of the current parse_header_parameters implementation by leveraging the logic from email.message.Message.

The Security Team also agreed that it's worth adding some early checks in the parse_header_parameters function to limit the amount of provided semicolons. This would require some investigation as to what would be a good threshold, considering that it's likely that more than one semicolon may not be necessary in valid HTTP headers.

Change History (2)

comment:1 by Natalia Bidart, 11 days ago

Description: modified (diff)

comment:2 by David Smith, 11 days ago

Triage Stage: UnreviewedAccepted
Note: See TracTickets for help on using tickets.
Back to Top