Opened 10 years ago

Closed 10 years ago

#22971 closed Bug (fixed)

Can't receive file with non-ascii filename according to rfc2388

Reported by: homm
Owned by: nobody
Component: HTTP handling
Version: dev
Severity: Normal
Keywords:
Cc:
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Requests, a popular Python library, has sent files with non-ASCII characters in the filename in full compliance with RFC 2388 since version 2.0:

The original local file name may be supplied as well, either as a
"filename" parameter either of the "content-disposition: form-data"
header or, in the case of multiple files, in a "content-disposition:
file" header of the subpart. The sending application MAY supply a
file name; if the file name of the sender's operating system is not
in US-ASCII, the file name might be approximated, or encoded using
the method of RFC 2231.

RFC 2231 defines extended parameters whose names end with a * character, and Requests uses such a parameter name (filename*) to send non-ASCII file names.

# requests 1.2.3
>>> requests.post('http://ya.ru', files={'file': (u'файл', '123')}).request.body
--cb90e5c32429403b99966534716cda56
Content-Disposition: form-data; name="file"; filename="файл"
Content-Type: application/octet-stream

123
--cb90e5c32429403b99966534716cda56--


# requests 2.0
>>> requests.post('http://ya.ru', files={'file': (u'файл', '123')}).request.body
--40f2f1873ec843598773fe150b4f783a
Content-Disposition: form-data; name="file"; filename*=utf-8''%D1%84%D0%B0%D0%B9%D0%BB

123
--40f2f1873ec843598773fe150b4f783a--

But Django doesn't recognize such files: it puts the raw file content in request.POST instead of populating request.FILES.
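For illustration, here is a minimal sketch of how a receiver could decode the filename* value shown in the Requests 2.0 output. The helper name decode_rfc2231_value is hypothetical and this is not Django's parser. It assumes the value has the RFC 2231 form charset'language'percent-encoded-text, and falling back to UTF-8 when the charset is empty is an assumption of this sketch.

```python
from urllib.parse import unquote


def decode_rfc2231_value(raw):
    """Decode an RFC 2231 extended value: charset'language'percent-encoded-text.

    Hypothetical helper for illustration only, not Django's implementation.
    """
    # Split on the first two single quotes; the language tag may be empty.
    charset, _language, encoded = raw.split("'", 2)
    # Assumption: fall back to UTF-8 when no charset is given.
    return unquote(encoded, encoding=charset or "utf-8")


print(decode_rfc2231_value("utf-8''%D1%84%D0%B0%D0%B9%D0%BB"))  # файл
```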

Attachments (1)

22971.test.diff (1.7 KB) - added by Claude Paroz 10 years ago.
Failing test case


Change History (10)

by Claude Paroz, 10 years ago

Attachment: 22971.test.diff added

Failing test case

comment:2 by Claude Paroz, 10 years ago

Triage Stage: Unreviewed → Accepted

comment:4 by homm, 10 years ago

Thanks, claudep, it's great!

I want to note that, according to RFC 5987 (http://tools.ietf.org/html/rfc5987#section-4.2), the Unicode value should be preferred over the ASCII one.

In this case, the sender provides an ASCII version of the title for
legacy recipients, but also includes an internationalized version for
recipients understanding this specification -- the latter obviously
ought to prefer the new syntax over the old one.

I still don't know whether RFC 5987 is applicable to multipart headers, though.

comment:5 by Claude Paroz, 10 years ago

In the current implementation, the last one always wins. I'm not sure it's worth complicating the implementation if we don't even know whether some user agents are indeed using this feature (and with the ASCII version appearing after the encoded one).
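To make the trade-off concrete, the sketch below contrasts a "last one wins" policy with one that prefers the encoded filename* parameter. All function names are hypothetical, and the naive split on ";" is an assumption of the sketch (it would mishandle quoted values containing semicolons). It is not the code from the patch.

```python
from urllib.parse import unquote


def parse_params(header_value):
    """Return (name, value) pairs for the parameters after the first ';'.

    Hypothetical, simplified parser: it does not handle ';' inside quotes.
    """
    pairs = []
    for segment in header_value.split(";")[1:]:
        name, _, value = segment.strip().partition("=")
        pairs.append((name.lower(), value.strip('"')))
    return pairs


def decode_extended(value):
    """Decode charset'language'percent-encoded-text (RFC 2231 form)."""
    charset, _language, encoded = value.split("'", 2)
    return unquote(encoded, encoding=charset or "utf-8")


def filename_last_wins(header_value):
    """Whichever of filename / filename* appears last overrides the other."""
    filename = None
    for name, value in parse_params(header_value):
        if name == "filename":
            filename = value
        elif name == "filename*":
            filename = decode_extended(value)
    return filename


def filename_prefer_extended(header_value):
    """Use filename* when present, regardless of parameter order."""
    plain = extended = None
    for name, value in parse_params(header_value):
        if name == "filename":
            plain = value
        elif name == "filename*":
            extended = decode_extended(value)
    return extended if extended is not None else plain


# The ASCII fallback comes after the encoded name, the case discussed above.
header = (
    "form-data; name=\"file\"; "
    "filename*=utf-8''%D1%84%D0%B0%D0%B9%D0%BB; filename=\"file.txt\""
)
print(filename_last_wins(header))        # file.txt
print(filename_prefer_extended(header))  # файл
```

The two policies differ only when a sender puts the ASCII fallback after the encoded name.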

comment:6 by Cea Stapleton, 10 years ago

RFC 5987 isn't relevant here, as it covers the server sending files to the client, not the reverse.

comment:7 by Cea Stapleton, 10 years ago

Sent a pull request to add another test and update the RFC number.

comment:8 by Tim Graham, 10 years ago

Triage Stage: Accepted → Ready for checkin

comment:9 by Claude Paroz <claude@…>, 10 years ago

Resolution: fixed
Status: new → closed

In b42e5ca058178d67027bf66d37d00ade635b4c26:

Fixed #22971 -- Properly parsed RFC 2388 encoded headers

Thanks homm for the report, Cea Stapleton for patch improvements
and Ian Cordasco, Christian Schmitt and Tim Graham for the review.
