Opened 15 years ago

Closed 10 years ago

#11111 closed Bug (wontfix)

Python 2.7 WSGIRequest should not make PATH_INFO and SCRIPT_NAME unicode

Reported by: mgood
Owned by: nobody
Component: HTTP handling
Version: 1.0
Severity: Normal
Keywords:
Cc: Graham.Dumpleton@…, Malcolm Tredinnick, cmawebsite@…
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: yes
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

The WSGI spec requires the standard environment values to be str, not unicode, but the WSGIRequest object updates the environ (through self.META, which is a reference to the same dict), setting PATH_INFO and SCRIPT_NAME to unicode objects. These unicode values caused issues with WebTest, which checks that the environ contains only str values.
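For context, the check WebTest trips on is a type check over the environ. Below is a minimal sketch of that kind of check, written for Python 3, where PEP 3333 requires "native strings" (str values whose characters all fit in latin-1); under Python 2 the same rule means str, not unicode. The names check_environ_strings and STRING_KEYS are hypothetical, and this is not WebTest's actual code.

```python
# Sketch of an environ type check, similar in spirit to what WebTest does.
# Python 3 / PEP 3333 rules; check_environ_strings and STRING_KEYS are
# hypothetical names chosen for this example.

# Hypothetical subset of CGI-style keys to check; a real validator checks more.
STRING_KEYS = ("REQUEST_METHOD", "SCRIPT_NAME", "PATH_INFO", "QUERY_STRING")


def check_environ_strings(environ):
    """Raise if a standard key holds something other than a native string."""
    for key in STRING_KEYS:
        if key not in environ:
            continue
        value = environ[key]
        if type(value) is not str:
            raise TypeError("%s must be str, got %s" % (key, type(value).__name__))
        try:
            value.encode("latin-1")
        except UnicodeEncodeError:
            raise ValueError("%s has characters outside latin-1" % key) from None


# What a server passes for the UTF-8 path "/€": raw bytes decoded as latin-1.
good = {"REQUEST_METHOD": "GET", "PATH_INFO": b"/\xe2\x82\xac".decode("latin-1")}
check_environ_strings(good)  # passes

# The same path after something re-decoded it as UTF-8 and wrote it back.
bad = {"REQUEST_METHOD": "GET", "PATH_INFO": "/\u20ac"}
try:
    check_environ_strings(bad)
except ValueError as exc:
    print("rejected:", exc)  # rejected: PATH_INFO has characters outside latin-1
```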

Attachments (1)

ticket_11111.diff (1.1 KB) - added by Jeff Buttars <jeffbuttars@…> 13 years ago.
Diff for core/handlers/wsgi.py for unicode problem in wsgi handling


Change History (14)

comment:1 by Graham Dumpleton, 15 years ago

Cc: Graham.Dumpleton@… added

FWIW, it looks like WSGI for Python 3.0, if the specification is ever actually updated, will have WSGI environment variables be unicode, i.e., Python 3.0 strings, and not byte strings, which would have been the proper equivalent of Python 2.X strings.

in reply to: 1 comment:2 by mgood, 15 years ago

Replying to grahamd:

FWIW, it looks like WSGI for Python 3.0, if the specification is ever actually updated, will have WSGI environment variables be unicode, i.e., Python 3.0 strings, and not byte strings, which would have been the proper equivalent of Python 2.X strings.

Yes, but only by decoding as "latin-1", which means it's still one character per byte. Django could re-decode these values as UTF-8, but it would not be valid to put the results back into the WSGI environ, since they could contain code points above U+00FF, which cannot be encoded as latin-1.
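The latin-1 round trip described in this reply can be sketched in Python 3 terms (PEP 3333 later adopted this native-string rule). The example path is made up for illustration.

```python
# Python 3 sketch of the latin-1 "native string" rule; the path is made up.
raw = "/caf\u00e9/\u20ac".encode("utf-8")  # the bytes the client actually sent

# A WSGI server decodes the raw bytes as latin-1: still one character per byte.
native = raw.decode("latin-1")
assert len(native) == len(raw) == 10

# The round trip is lossless, so the application can recover the bytes
# and decode them as UTF-8 itself.
text = native.encode("latin-1").decode("utf-8")
assert text == "/caf\u00e9/\u20ac"

# The UTF-8 result is not a valid environ value: U+20AC is above U+00FF.
try:
    text.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False
assert not fits_latin1
```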

Here's a demonstration of the reported problem:

>>> from django.core.handlers import wsgi
>>> path = '\xc3\xbc'
>>> print path.decode('utf-8')
ü
>>> environ = {'REQUEST_METHOD': 'GET', 'PATH_INFO': path}
>>> req = wsgi.WSGIRequest(environ)
>>> environ['PATH_INFO']
u'\xfc'
>>> path == environ['PATH_INFO']
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
Last edited 14 years ago by Ramiro Morales
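One way to avoid the mutation shown in the session above is for the request object to keep its decoded path on the side and leave the environ untouched. This sketch is written for Python 3; SketchRequest is a hypothetical class, not Django's WSGIRequest, and copying the environ into META is an assumed design choice for illustration.

```python
class SketchRequest:
    """Hypothetical request wrapper that never writes back into environ."""

    def __init__(self, environ):
        self.environ = environ        # left exactly as the server built it
        self.META = dict(environ)     # a shallow copy, so edits here don't leak
        self.path_info = self._to_text(environ.get("PATH_INFO", ""))
        self.script_name = self._to_text(environ.get("SCRIPT_NAME", ""))

    @staticmethod
    def _to_text(native):
        # latin-1 maps code points back to the original bytes one-to-one;
        # this sketch then decodes them as UTF-8, replacing invalid sequences.
        return native.encode("latin-1").decode("utf-8", "replace")


# Same input as the Python 2 session above, as a PEP 3333 native string.
environ = {"REQUEST_METHOD": "GET", "PATH_INFO": b"\xc3\xbc".decode("latin-1")}
req = SketchRequest(environ)
assert req.path_info == "\u00fc"                # decoded text for the app
assert environ["PATH_INFO"] == "\u00c3\u00bc"   # environ still native
```

The trade-off is that META and environ can drift apart if code later edits one of them; whether that is acceptable is a design question this sketch does not settle.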

comment:3 by anonymous, 15 years ago

Component: Uncategorized → HTTP handling

comment:4 by Chris Beaven, 15 years ago

Graham, you're more than qualified to promote this from an unreviewed ticket - at least to a Design Decision if you don't want to make the call ;)

comment:5 by Graham Dumpleton, 15 years ago

I don't use Django or even know much about Django code internals, so I wouldn't like to be saying it is okay or not. In other words, I might be able to comment on WSGI and web hosting mechanisms, but far from being qualified to comment about Django itself. :-)

comment:6 by Chris Beaven, 15 years ago

Cc: Malcolm Tredinnick added
Triage Stage: Unreviewed → Accepted

Ok, fair enough. I'll make the call that Django should be following wsgi spec regarding unicode vs string. Malcolm would probably know if there's some reason why it shouldn't, so I'm ccing him in for an opinion.

comment:7 by Julien Phalip, 13 years ago

Severity: Normal
Type: Bug

comment:8 by Jeff Buttars <jeffbuttars@…>, 13 years ago

Easy pickings: unset
UI/UX: unset

The WSGI spec says all strings should be of the str type and be encoded as ISO-8859-1 (latin-1) or MIME-encoded according to RFC 2047. For the PATH_INFO and SCRIPT_NAME environ variables, I think it's safe to say we don't want to put MIME-encoded data in them, which leaves us with the latin-1 encoding. We can always explicitly decode those variables as latin-1 to make sure we follow the spec. Doing so should also handle the case of being handed unicode data: we'll simply re-encode it, with undefined results (which is also covered by the WSGI spec). I'll attach a patch that demonstrates this.
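The coercion described in this comment can be sketched as a small helper. The sketch uses Python 3, where the spec's str is the native string type, so following the spec means decoding incoming bytes as latin-1 and rejecting text that latin-1 cannot represent. to_native is a hypothetical name, raising on out-of-range text is just one choice for the case the spec leaves undefined, and this is not the attached patch.

```python
def to_native(value):
    """Hypothetical helper: coerce bytes or text to a PEP 3333 native string."""
    if isinstance(value, bytes):
        # Raw bytes: latin-1 maps each byte to exactly one code point.
        return value.decode("latin-1")
    if isinstance(value, str):
        # Text is already native only if every character fits in latin-1.
        # For anything else the result is undefined; this sketch raises
        # UnicodeEncodeError rather than guessing.
        value.encode("latin-1")
        return value
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


assert to_native(b"/caf\xc3\xa9") == "/caf\u00c3\u00a9"
assert to_native("/") == "/"
```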

by Jeff Buttars <jeffbuttars@…>, 13 years ago

Attachment: ticket_11111.diff added

Diff for core/handlers/wsgi.py for unicode problem in wsgi handling

comment:9 by Jeff Buttars <jeffbuttars@…>, 13 years ago

Has patch: set

comment:10 by Graham Dumpleton, 13 years ago

Should be fine leaving:

path_info = u'/'

That should always be the same as:

'/'.decode('latin-1')

anyway.
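Graham's point holds because latin-1 decodes byte n to code point n, so for ASCII input like '/' the decoded value is unchanged. A quick check in Python 3 syntax, where b"/" plays the role of the Python 2 str '/':

```python
# Python 2's '/'.decode('latin-1') is spelled b"/".decode("latin-1") in Python 3.
assert b"/".decode("latin-1") == "/"

# latin-1 maps every byte value n to the code point n.
all_bytes = bytes(range(256))
assert all_bytes.decode("latin-1") == "".join(chr(n) for n in range(256))
```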

in reply to: 10 comment:11 by Jeff Buttars <jeffbuttars@…>, 13 years ago

Replying to grahamd:

Should be fine leaving:

path_info = u'/'

That should always be the same as:

'/'.decode('latin-1')

anyway.

Yes, I realized after uploading the patch that the line probably didn't need to change.

comment:12 by Claude Paroz, 12 years ago

Needs tests: set

comment:13 by Collin Anderson, 10 years ago

Cc: cmawebsite@… added
Resolution: wontfix
Status: new → closed
Summary: WSGIRequest should not make PATH_INFO and SCRIPT_NAME unicode → Python 2.7 WSGIRequest should not make PATH_INFO and SCRIPT_NAME unicode

Seems to me that if we're doing it right on Python 3, and it hasn't really been a problem, then there's no point changing the behavior on Python 2.7.
