Opened 8 months ago

Last modified 4 weeks ago

#35440 assigned Cleanup/optimization

Update parse_header_parameters to leverage the parsing logic from (stdlib) email Message.

Reported by: Natalia Bidart Owned by: Khudyakov Artem
Component: HTTP handling Version: dev
Severity: Normal Keywords:
Cc: Carlton Gibson, Adam Johnson Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by Natalia Bidart)

Following a security report, which was concluded not to be such, the Security Team agreed to apply a few improvements to the parse_header_parameters function from django/utils/http.py.

This function was historically using Python's (now deprecated) cgi module, but in 34e2148fc725e7200050f74130d7523e3cd8507a the code was changed by porting the cgi implementations into the file to avoid the deprecation (this was in the context of #33173). Later on the header parsing logic was cleaned up in d4d5427571b4bf3a21c902276c2a00215c2a37cc to unify the logic with the one used in the multipartparser.py module (see #33697).

At the time of the cgi deprecation, the PEP including the deprecation mentions:

Replacements for the various parts of cgi which are not directly related to executing code are: [ ... ] parse_header with email.message.Message (see example below)

Providing an explicit example of how close parse_header and email.message.Message are:

>>> from cgi import parse_header
>>> from email.message import Message
>>> parse_header(h)
('application/json', {'charset': 'utf8'})
>>> m = Message()
>>> m['content-type'] = h
>>> m.get_params()
[('application/json', ''), ('charset', 'utf8')]
>>> m.get_param('charset')

The goal of this ticket is to track the improvement of the current parse_header_parameters implementation by leveraging the logic from email.message.Message.

The Security Team also agreed that it's worth adding some early checks in the parse_header_parameters function to limit the amount of provided semicolons. This would require some investigation as to what would be a good threshold, considering that it's likely that more than one semicolon may not be necessary in valid HTTP headers.

Change History (7)

comment:1 by Natalia Bidart, 8 months ago

Description: modified (diff)

comment:2 by David Smith, 8 months ago

Triage Stage: UnreviewedAccepted

comment:3 by Khudyakov Artem, 5 months ago

Owner: changed from nobody to Khudyakov Artem
Status: newassigned

comment:6 by Khudyakov Artem, 5 months ago

Has patch: set

comment:7 by Natalia Bidart, 5 months ago

Patch needs improvement: set

Setting as patch needs improvements following my comments in the PR (the goal of this ticket is, ideally, to remove/replace our custom parsing logic with Python stdlib's Message methods).

comment:8 by Khudyakov Artem, 5 months ago

Patch needs improvement: unset

I've proposed a fix in the PR, but have clarifying questions. In case this task is not on the review radar, I change the status, but am ready to make further modifications.

comment:9 by Natalia Bidart, 4 weeks ago

Patch needs improvement: set
Note: See TracTickets for help on using tickets.
Back to Top