Opened 14 months ago

Closed 3 months ago

#35440 closed Cleanup/optimization (fixed)

Update parse_header_parameters to leverage the parsing logic from (stdlib) email Message.

Reported by: Natalia Bidart Owned by: Khudyakov Artem
Component: HTTP handling Version: dev
Severity: Normal Keywords:
Cc: Carlton Gibson, Adam Johnson, Shai Berger Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Natalia Bidart)

Following a security report, which was concluded not to be such, the Security Team agreed to apply a few improvements to the parse_header_parameters function from django/utils/http.py.

This function was historically using Python's (now deprecated) cgi module, but in 34e2148fc725e7200050f74130d7523e3cd8507a the code was changed by porting the cgi implementations into the file to avoid the deprecation (this was in the context of #33173). Later on the header parsing logic was cleaned up in d4d5427571b4bf3a21c902276c2a00215c2a37cc to unify the logic with the one used in the multipartparser.py module (see #33697).

At the time of the cgi deprecation, the PEP including the deprecation mentions:

Replacements for the various parts of cgi which are not directly related to executing code are: [ ... ] parse_header with email.message.Message (see example below)

Providing an explicit example of how close parse_header and email.message.Message are:

>>> from cgi import parse_header
>>> from email.message import Message
>>> parse_header(h)
('application/json', {'charset': 'utf8'})
>>> m = Message()
>>> m['content-type'] = h
>>> m.get_params()
[('application/json', ''), ('charset', 'utf8')]
>>> m.get_param('charset')

The goal of this ticket is to track the improvement of the current parse_header_parameters implementation by leveraging the logic from email.message.Message.

The Security Team also agreed that it's worth adding some early checks in the parse_header_parameters function to limit the amount of provided semicolons. This would require some investigation as to what would be a good threshold, considering that it's likely that more than one semicolon may not be necessary in valid HTTP headers.

Change History (11)

comment:1 by Natalia Bidart, 14 months ago

Description: modified (diff)

comment:2 by David Smith, 14 months ago

Triage Stage: UnreviewedAccepted

comment:3 by Khudyakov Artem, 11 months ago

Owner: changed from nobody to Khudyakov Artem
Status: newassigned

comment:6 by Khudyakov Artem, 11 months ago

Has patch: set

comment:7 by Natalia Bidart, 11 months ago

Patch needs improvement: set

Setting as patch needs improvements following my comments in the PR (the goal of this ticket is, ideally, to remove/replace our custom parsing logic with Python stdlib's Message methods).

comment:8 by Khudyakov Artem, 11 months ago

Patch needs improvement: unset

I've proposed a fix in the PR, but have clarifying questions. In case this task is not on the review radar, I change the status, but am ready to make further modifications.

comment:9 by Natalia Bidart, 7 months ago

Patch needs improvement: set

comment:10 by Khudyakov Artem, 5 months ago

Patch needs improvement: unset

comment:11 by Shai Berger, 4 months ago

Cc: Shai Berger added
Patch needs improvement: set

comment:12 by Natalia Bidart, 3 months ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:13 by nessita <124304+nessita@…>, 3 months ago

Resolution: fixed
Status: assignedclosed

In 9aabe7e:

Fixed #35440 -- Simplified parse_header_parameters by leveraging stdlid's Message.

The parse_header_parameters function historically used Python's cgi
module (now deprecated). In 34e2148fc725e7200050f74130d7523e3cd8507a,
the logic was inlined to work around this deprecation ( #33173). Later,
in d4d5427571b4bf3a21c902276c2a00215c2a37cc, the header parsing logic
was further cleaned up to align with multipartparser.py (#33697).

This change takes it a step further by replacing the copied cgi logic with
Python's email.message.Message API for a more robust and maintainable header
parsing implementation.

Thanks to Raphael Gaschignard for testing, and to Adam Johnson and Shai
Berger for reviews.

Co-authored-by: Ben Cail <bcail@…>
Co-authored-by: Natalia <124304+nessita@…>

Note: See TracTickets for help on using tickets.
Back to Top