Opened 3 hours ago
#37198 new Bug
content_disposition_header emits invalid header for a filename with a trailing newline ("$" should be "\Z")
| Reported by: | Aman Agrawal | Owned by: | |
|---|---|---|---|
| Component: | HTTP handling | Version: | 5.2 |
| Severity: | Normal | Keywords: | |
| Cc: | Aman Agrawal | Triage Stage: | Unreviewed |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | yes | UI/UX: | no |
Description
content_disposition_header() is meant to emit the percent-encoded
filename*=utf-8 form for any filename that is not a valid RFC 9110
quoted-string, and to only use the bare quoted form for filenames that
are. Its check has a blind spot for a trailing newline:
>>> from django.utils.http import content_disposition_header
>>> content_disposition_header(True, "report.pdf\n")
'attachment; filename="report.pdf\n"'
>>> content_disposition_header(True, "\n")
'attachment; filename="\n"'
The returned value contains a raw newline, so setting it as a header
raises BadHeaderError on Django responses, and boto3/http.client raises
ValueError("Invalid header value ..."). The result is an uncaught 500
for anyone serving a user-supplied filename that ends in a newline.
(A newline *elsewhere* in the filename is handled correctly.)
Root cause
The quotable-character test in django/utils/http.py is:
quotable_characters = r"^[\t \x21-\x7e]*$"
if is_ascii and re.match(quotable_characters, filename):
In Python, "$" matches at the end of the string *or immediately before
a trailing "\n"*. So a filename of quotable characters plus one
trailing newline matches, takes the quoted-string branch, and is
emitted verbatim. This is the same class of bug as CVE-2015-5144
(validators using "$" and so accepting a trailing newline), after which
validators were switched to "\Z".
This was introduced with the control-character handling in #36023.
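The anchoring difference described above can be checked with the standard library alone, without Django installed. The sketch below reuses the character class quoted in this ticket; the constant names and the sample filenames are illustrative only.

```python
import re

# Character class copied from the ticket; only the end anchor differs.
DOLLAR = r"^[\t \x21-\x7e]*$"
BACKSLASH_Z = r"^[\t \x21-\x7e]*\Z"

# Sample filenames: clean, trailing newline, bare newline, embedded newline.
SAMPLES = ["report.pdf", "report.pdf\n", "\n", "re\nport.pdf"]

results = {}
for name in SAMPLES:
    matches_dollar = bool(re.match(DOLLAR, name))
    matches_strict = bool(re.match(BACKSLASH_Z, name))
    results[name] = (matches_dollar, matches_strict)
    print(f"{name!r:16} $: {matches_dollar!s:5}  \\Z: {matches_strict}")
```

Only the two filenames ending in a newline should differ between the two patterns; the embedded-newline case is rejected by both, which is consistent with the note that a newline elsewhere is already handled.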
Proposed fix
Anchor with "\Z" (matches only the true end of string):
- quotable_characters = r"^[\t \x21-\x7e]*$"
+ quotable_characters = r"^[\t \x21-\x7e]*\Z"
I verified that this change sends "report.pdf\n" and "\n" to the
filename*=utf-8 branch while leaving all other filenames (e.g.
"report.pdf", "my report.png") on the existing quoted-string branch.
A regression case should be added to ContentDispositionHeaderTests in
tests/utils_tests/test_http.py, e.g.:
("attachment; filename*=utf-8''report.pdf%0A", True, "report.pdf\n"),
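To see why the expected value in that regression case ends in %0A, here is a rough standalone sketch of the branch choice. sketch_header is a hypothetical stand-in written for this ticket, not the code in django/utils/http.py: it assumes the quoted branch escapes backslashes and double quotes, and that the extended branch percent-encodes with urllib.parse.quote.

```python
import re
from urllib.parse import quote

# Patterns from the ticket: current ("$") and proposed ("\Z").
CURRENT = r"^[\t \x21-\x7e]*$"
PROPOSED = r"^[\t \x21-\x7e]*\Z"


def sketch_header(as_attachment, filename, pattern):
    """Hypothetical stand-in for the branch choice, not Django's code."""
    disposition = "attachment" if as_attachment else "inline"
    if filename.isascii() and re.match(pattern, filename):
        # Quoted-string branch: escape backslashes and double quotes.
        escaped = filename.replace("\\", "\\\\").replace('"', '\\"')
        return f'{disposition}; filename="{escaped}"'
    # Extended branch: percent-encode the filename as UTF-8.
    return f"{disposition}; filename*=utf-8''{quote(filename)}"


for pattern in (CURRENT, PROPOSED):
    print(repr(sketch_header(True, "report.pdf\n", pattern)))
    print(repr(sketch_header(True, "my report.png", pattern)))
```

With the "$" pattern the sketch reproduces the raw newline from the description; with "\Z" the same input moves to the filename*=utf-8 branch, and "my report.png" is unaffected either way.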