Opened 8 years ago

Closed 8 years ago

#22971 closed Bug (fixed)

Can't receive file with non-ascii filename according to rfc2388

Reported by: homm
Owned by: nobody
Component: HTTP handling
Version: dev
Severity: Normal
Keywords:
Cc:
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no


Requests, a popular Python library, has since version 2.0 sent files with non-ASCII characters in their filenames in full compliance with RFC 2388:

The original local file name may be supplied as well, either as a
"filename" parameter either of the "content-disposition: form-data"
header or, in the case of multiple files, in a "content-disposition:
file" header of the subpart. The sending application MAY supply a
file name; if the file name of the sender's operating system is not
in US-ASCII, the file name might be approximated, or encoded using
the method of RFC 2231.

RFC 2231 defines extended attributes whose names end with a * character, and Requests uses such an attribute name to send non-ASCII file names.
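As a rough illustration of that encoding (not taken from Requests itself), the sketch below builds an extended filename* parameter with the standard library's email.utils.encode_rfc2231. The variable names are hypothetical, and Python 3 is assumed.

```python
from email.utils import encode_rfc2231

filename = "файл"

# encode_rfc2231 percent-encodes the text in the given charset and
# prefixes it with "charset'language'" (the language is left empty here).
encoded = encode_rfc2231(filename, charset="utf-8")

# The trailing * on the attribute name marks the value as RFC 2231 encoded.
header = 'Content-Disposition: form-data; name="file"; filename*=' + encoded
print(header)
```

The printed header has the same shape as the one Requests 2.0 produces in the example in this report.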

# requests 1.2.3
>>> requests.post('', files={'file': (u'файл', '123')}).request.body
Content-Disposition: form-data; name="file"; filename="файл"
Content-Type: application/octet-stream


# requests 2.0
>>> requests.post('', files={'file': (u'файл', '123')}).request.body
Content-Disposition: form-data; name="file"; filename*=utf-8''%D1%84%D0%B0%D0%B9%D0%BB


But Django doesn't recognize such files: it puts the raw file content in request.POST instead of populating request.FILES.
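For reference, a receiver could decode such a value roughly as follows. This is a minimal sketch, not Django's parser: decode_ext_value is a hypothetical helper, it assumes the value always has the charset'language'data form, and it raises ValueError otherwise.

```python
from urllib.parse import unquote


def decode_ext_value(value):
    """Decode an RFC 2231 extended value such as utf-8''%D1%84..."""
    # The value has three parts separated by single quotes:
    # charset, optional language tag, percent-encoded data.
    charset, _language, encoded = value.split("'", 2)
    # Fall back to ASCII when the charset part is empty (an assumption).
    return unquote(encoded, encoding=charset or "ascii")


raw = "utf-8''%D1%84%D0%B0%D0%B9%D0%BB"
print(decode_ext_value(raw))
```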

Attachments (1)

22971.test.diff (1.7 KB) - added by Claude Paroz 8 years ago.
Failing test case


Change History (10)

Changed 8 years ago by Claude Paroz

Attachment: 22971.test.diff added

Failing test case

comment:2 Changed 8 years ago by Claude Paroz

Triage Stage: Unreviewed → Accepted

comment:4 Changed 8 years ago by homm

Thanks, claudep, it's great!

I want to note that, according to RFC 5987, the Unicode value should be preferred over the ASCII one.

In this case, the sender provides an ASCII version of the title for
legacy recipients, but also includes an internationalized version for
recipients understanding this specification -- the latter obviously
ought to prefer the new syntax over the old one.

I still don't know whether RFC 5987 applies to multipart headers, though.

comment:5 Changed 8 years ago by Claude Paroz

In the current implementation, the last one always wins. I'm not sure it's worth complicating the implementation if we don't even know whether any user agents actually use this feature (and send the ASCII version after the encoded one).
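The difference between the two policies discussed here can be sketched as below. This is a hypothetical illustration, not the code from the patch: parse_params, last_wins and prefer_extended are made-up names, and the naive split on ";" would break on quoted values that contain semicolons.

```python
from urllib.parse import unquote


def parse_params(header_value):
    """Return (name, value, is_extended) tuples for each parameter."""
    params = []
    for part in header_value.split(";")[1:]:
        name, _, value = part.strip().partition("=")
        if name.endswith("*"):
            # Extended form: charset'language'percent-encoded-data
            charset, _lang, encoded = value.split("'", 2)
            params.append((name[:-1], unquote(encoded, encoding=charset), True))
        else:
            params.append((name, value.strip('"'), False))
    return params


def last_wins(params):
    # Later duplicates overwrite earlier ones, whatever their form.
    return {name: value for name, value, _ext in params}


def prefer_extended(params):
    # An extended value is never overwritten by a plain one.
    result, extended = {}, set()
    for name, value, is_ext in params:
        if is_ext:
            result[name] = value
            extended.add(name)
        elif name not in extended:
            result[name] = value
    return result


header = (
    'form-data; name="file"; '
    "filename*=utf-8''%D1%84%D0%B0%D0%B9%D0%BB; "
    'filename="file.txt"'
)
params = parse_params(header)
print(last_wins(params)["filename"])
print(prefer_extended(params)["filename"])
```

With the ASCII fallback placed after the encoded value, the first policy yields the fallback name and the second yields the decoded Unicode name. When the encoded value comes last, both policies agree.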

comment:6 Changed 8 years ago by Cea Stapleton

RFC 5987 isn't relevant here, as it covers the server sending files to the client, not the reverse.

comment:7 Changed 8 years ago by Cea Stapleton

Sent a pull request to add another test and update the RFC number.

comment:8 Changed 8 years ago by Tim Graham

Triage Stage: Accepted → Ready for checkin

comment:9 Changed 8 years ago by Claude Paroz <claude@…>

Resolution: fixed
Status: new → closed

In b42e5ca058178d67027bf66d37d00ade635b4c26:

Fixed #22971 -- Properly parsed RFC 2388 encoded headers

Thanks homm for the report, Cea Stapleton for patch improvements
and Ian Cordasco, Christian Schmitt and Tim Graham for the review.
