#22971 closed Bug (fixed)

Can't receive file with non-ascii filename according to rfc2388

Reported by: homm
Owned by: nobody
Component: HTTP handling
Version: master
Severity: Normal
Keywords:
Cc:
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Requests, a popular Python library, sends files with non-ASCII characters in their filenames in full compliance with RFC 2388, starting from version 2.0:

The original local file name may be supplied as well, either as a
"filename" parameter either of the "content-disposition: form-data"
header or, in the case of multiple files, in a "content-disposition:
file" header of the subpart. The sending application MAY supply a
file name; if the file name of the sender's operating system is not
in US-ASCII, the file name might be approximated, or encoded using
the method of RFC 2231.

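RFC 2231 defines parameter names ending in a "*" character, whose values carry a charset and an optional language tag followed by percent-encoded bytes. Requests uses such a parameter name ("filename*") to send non-ASCII file names.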

# requests 1.2.3
>>> requests.post('http://ya.ru', files={'file': (u'файл', '123')}).request.body
--cb90e5c32429403b99966534716cda56
Content-Disposition: form-data; name="file"; filename="файл"
Content-Type: application/octet-stream

123
--cb90e5c32429403b99966534716cda56--


# requests 2.0
>>> requests.post('http://ya.ru', files={'file': (u'файл', '123')}).request.body
--40f2f1873ec843598773fe150b4f783a
Content-Disposition: form-data; name="file"; filename*=utf-8''%D1%84%D0%B0%D0%B9%D0%BB

123
--40f2f1873ec843598773fe150b4f783a--
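The filename* value in the requests 2.0 body above is an RFC 2231 extended parameter: the charset, an empty language tag (hence the two adjacent quotes), then the percent-encoded UTF-8 bytes of the name. The following is a minimal sketch of that encoding using only the standard library; rfc2231_param is a hypothetical helper for illustration, not Requests' actual code.

```python
from urllib.parse import quote


def rfc2231_param(name: str, value: str, charset: str = "utf-8") -> str:
    """Hypothetical helper: encode a header parameter as an RFC 2231
    extended value, i.e. name*=charset'language'percent-encoded-bytes."""
    # safe="" percent-encodes every byte outside the unreserved set
    # (letters, digits and "_.-~"), including "/".
    encoded = quote(value, safe="", encoding=charset)
    # The language field is left empty, giving the two adjacent quotes.
    return f"{name}*={charset}''{encoded}"


print(rfc2231_param("filename", "файл"))
# filename*=utf-8''%D1%84%D0%B0%D0%B9%D0%BB
```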

However, Django doesn't recognize such files: it puts the raw file content in request.POST instead of populating request.FILES.
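To illustrate what the receiving side has to do, here is a simplified sketch that splits a Content-Disposition value into parameters and decodes any "name*" extended value back to text. parse_content_disposition is a hypothetical helper, not Django's multipart parser, and it assumes no RFC 2231 continuations (name*0, name*1, ...) and no semicolons inside quoted values.

```python
from typing import Dict, Tuple
from urllib.parse import unquote


def parse_content_disposition(header: str) -> Tuple[str, Dict[str, str]]:
    """Hypothetical, simplified Content-Disposition parser (sketch only).

    Decodes RFC 2231 extended parameters (name*=charset'lang'%XX...).
    Assumes no RFC 2231 continuations and no ";" inside quoted values.
    """
    disposition, *raw_params = header.split(";")
    params: Dict[str, str] = {}
    for raw in raw_params:
        name, sep, value = raw.strip().partition("=")
        if not sep:
            continue  # ignore a parameter without "="
        name = name.strip().lower()
        value = value.strip()
        if name.endswith("*") and value.count("'") >= 2:
            # Extended value: charset, optional language, percent-encoded bytes.
            charset, _language, encoded = value.split("'", 2)
            name = name[:-1]
            value = unquote(encoded, encoding=charset or "us-ascii", errors="strict")
        elif len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # drop surrounding quotes; escapes not handled
        # A later parameter with the same name overwrites an earlier one.
        params[name] = value
    return disposition.strip().lower(), params


header = "form-data; name=\"file\"; filename*=utf-8''%D1%84%D0%B0%D0%B9%D0%BB"
print(parse_content_disposition(header))
# ('form-data', {'name': 'file', 'filename': 'файл'})
```

Keying the dictionary by the bare name means a plain filename and an extended filename* compete for the same slot, so whichever appears later in the header is kept.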

Attachments (1)

22971.test.diff (1.7 KB) - added by claudep 12 months ago.
Failing test case

Change History (10)

comment:1 Changed 12 months ago by homm

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Changed 12 months ago by claudep

Failing test case

comment:2 Changed 12 months ago by claudep

  • Triage Stage changed from Unreviewed to Accepted

comment:4 Changed 12 months ago by homm

Thanks, claudep, it's great!

I want to note that, according to RFC 5987 (http://tools.ietf.org/html/rfc5987#section-4.2), the Unicode value should be preferred over the ASCII one:

In this case, the sender provides an ASCII version of the title for
legacy recipients, but also includes an internationalized version for
recipients understanding this specification -- the latter obviously
ought to prefer the new syntax over the old one.

I still don't know whether RFC 5987 applies to multipart headers, though.

comment:5 Changed 12 months ago by claudep

In the current implementation, the last parameter always wins. I'm not sure it's worth complicating the implementation when we don't even know whether any user agents actually use this feature (with the ASCII version appearing after the encoded one).
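The two precedence policies discussed in comments 4 and 5 can be contrasted in a few lines. This is a hypothetical sketch operating on already-decoded (name, value) pairs in header order; pick_filename is an illustrative name, not the code that was committed.

```python
from typing import List, Optional, Tuple


def pick_filename(params: List[Tuple[str, str]], prefer_extended: bool = True) -> Optional[str]:
    """Hypothetical helper: choose a file name from decoded (name, value) pairs.

    prefer_extended=False: the last "filename"/"filename*" parameter wins.
    prefer_extended=True: any "filename*" value beats a plain "filename",
    regardless of order.
    """
    last: Optional[str] = None
    extended: Optional[str] = None
    for name, value in params:
        if name not in ("filename", "filename*"):
            continue
        last = value
        if name == "filename*":
            extended = value  # keep the last extended value seen
    if prefer_extended and extended is not None:
        return extended
    return last


# ASCII fallback sent after the encoded name: the two policies disagree.
params = [("name", "file"), ("filename*", "файл"), ("filename", "fajl")]
print(pick_filename(params, prefer_extended=False))  # fajl
print(pick_filename(params, prefer_extended=True))   # файл
```

The policies only diverge in the ordering comment 5 questions, an ASCII filename placed after the encoded one; when the extended parameter comes last, both return the same value.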

comment:6 Changed 12 months ago by ceaess

RFC 5987 isn't relevant here, as it covers the server sending files to the client, not the inverse.

comment:7 Changed 11 months ago by ceaess

Sent a pull request to add another test and update the RFC number.

comment:8 Changed 11 months ago by timgraham

  • Triage Stage changed from Accepted to Ready for checkin

comment:9 Changed 11 months ago by Claude Paroz <claude@…>

  • Resolution set to fixed
  • Status changed from new to closed

In b42e5ca058178d67027bf66d37d00ade635b4c26:

Fixed #22971 -- Properly parsed RFC 2388 encoded headers

Thanks homm for the report, Cea Stapleton for patch improvements
and Ian Cordasco, Christian Schmitt and Tim Graham for the review.
