Opened 16 years ago
Closed 10 years ago
#11111 closed Bug (wontfix)
Python 2.7 WSGIRequest should not make PATH_INFO and SCRIPT_NAME unicode
Reported by: | mgood | Owned by: | nobody |
---|---|---|---|
Component: | HTTP handling | Version: | 1.0 |
Severity: | Normal | Keywords: | |
Cc: | Graham.Dumpleton@…, Malcolm Tredinnick, cmawebsite@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | yes | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The WSGI spec requires the standard environment values to be str, not unicode types, but the WSGIRequest object updates the environ (via self.META, which is a reference), setting PATH_INFO and SCRIPT_NAME to unicode objects. These unicode values led to some issues with WebTest, which checks to ensure that the environ only contains str values.
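A minimal sketch of the kind of check described above, written for Python 3, where the native string type is str. The helper name find_non_native_strings is hypothetical and is not WebTest's actual API; it only illustrates flagging CGI-style environ values that are not native strings.

```python
def find_non_native_strings(environ):
    """Return the sorted CGI-style keys whose values are not native ``str``.

    Hypothetical helper for illustration only; WebTest's real check differs.
    Keys containing a dot (such as ``wsgi.version``) are treated here as
    server or extension variables and are skipped.
    """
    return sorted(
        key for key, value in environ.items()
        if "." not in key and type(value) is not str
    )


environ = {
    "REQUEST_METHOD": "GET",
    "SCRIPT_NAME": "",
    "PATH_INFO": b"/caf\xc3\xa9",  # wrong type on Python 3: bytes, not str
    "wsgi.version": (1, 0),         # not a CGI variable, so not checked
}
print(find_non_native_strings(environ))
```

On Python 2.7 the equivalent failure would be a unicode value where a str is expected, which is the situation this ticket reports.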
Attachments (1)
Change History (14)
follow-up: 2 comment:1 by , 16 years ago
Cc: | added |
---|
comment:2 by , 16 years ago
Replying to grahamd:
FWIW, it looks like WSGI for Python 3.0, if the specification is ever actually updated, will have WSGI environment variables be unicode, i.e., Python 3.0 strings and not byte strings, which would have been the proper equivalent to Python 2.X strings.
Yes, but only by decoding as "latin-1", which means it's still one character per byte. Django could re-decode these values as utf-8, but it would not be valid to put them into the WSGI environ as such, since they could contain code points above \u00FF, which cannot be encoded as latin-1.
Here's a demonstration of the reported problem:
>>> from django.core.handlers import wsgi
>>> path = '\xc3\xbc'
>>> print path.decode('utf-8')
ü
>>> environ = {'REQUEST_METHOD': 'GET', 'PATH_INFO': path}
>>> req = wsgi.WSGIRequest(environ)
>>> environ['PATH_INFO']
u'\xfc'
>>> path == environ['PATH_INFO']
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
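The latin-1 point in this comment can be sketched on Python 3 without Django. This is only an illustration under the assumption that the request path is UTF-8 on the wire; the variable names are made up for the example.

```python
raw = b"/\xc3\xbc"  # UTF-8 bytes for "/ü" as sent on the wire

# A latin-1 decode maps every byte to exactly one character, so the
# environ value has the same length as the raw byte string.
environ_value = raw.decode("latin-1")
assert len(environ_value) == len(raw)

# An application can recover the intended text by reversing that step
# and decoding the bytes as UTF-8 (the assumed URL encoding).
text = environ_value.encode("latin-1").decode("utf-8")
print(ascii(text))

# Text containing code points above U+00FF cannot be stored back into
# the environ as a latin-1 string.
try:
    "/\u20ac".encode("latin-1")  # EURO SIGN, U+20AC
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)
```

The round trip is lossless only because latin-1 covers all 256 byte values; the re-decoded text is what must stay out of the environ.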
comment:3 by , 16 years ago
Component: | Uncategorized → HTTP handling |
---|
comment:4 by , 16 years ago
Graham, you're more than qualified to promote this from an unreviewed ticket - at least to a Design Decision if you don't want to make the call ;)
comment:5 by , 16 years ago
I don't use Django or even know much about Django code internals, so I wouldn't like to be saying it is okay or not. In other words, I might be able to comment on WSGI and web hosting mechanisms, but far from being qualified to comment about Django itself. :-)
comment:6 by , 16 years ago
Cc: | added |
---|---|
Triage Stage: | Unreviewed → Accepted |
Ok, fair enough. I'll make the call that Django should be following the WSGI spec regarding unicode vs. str. Malcolm would probably know if there's some reason why it shouldn't, so I'm cc'ing him for an opinion.
comment:7 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → Bug |
comment:8 by , 13 years ago
Easy pickings: | unset |
---|---|
UI/UX: | unset |
The WSGI spec says all strings should be of the str type and be encoded as ISO-8859-1 (latin-1) or be MIME encoded according to RFC 2047. When dealing with the PATH_INFO and SCRIPT_NAME environ variables, I think it's safe to say we don't want to put MIME encoded data in them, which leaves us with the latin-1 encoding. We can always explicitly decode those variables as latin-1 to ensure we follow the spec. Doing so should also handle the case of being handed unicode data: we'll simply re-encode it, with undefined results (which is also in the WSGI spec). I'll attach a patch that demonstrates this.
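As a rough Python 3 analogue of the normalization described above (not the code from the attached patch, whose contents are not shown here), a hypothetical helper to_wsgi_str could coerce a value into a native string restricted to latin-1 code points. Substituting "?" for unencodable characters is just one arbitrary way to handle the case the comment calls undefined.

```python
def to_wsgi_str(value):
    """Return ``value`` as a native string limited to latin-1 code points.

    Hypothetical helper for illustration; not taken from ticket_11111.diff.
    """
    if isinstance(value, bytes):
        # Raw bytes map one-to-one onto latin-1 code points.
        return value.decode("latin-1")
    # Text outside latin-1 has no valid environ representation; "replace"
    # substitutes "?" instead of raising UnicodeEncodeError.
    return value.encode("latin-1", "replace").decode("latin-1")


print(ascii(to_wsgi_str(b"/\xc3\xbc")))  # bytes in, latin-1 text out
print(ascii(to_wsgi_str("/\u20ac")))     # euro sign is not encodable
```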
by , 13 years ago
Attachment: | ticket_11111.diff added |
---|
Diff for core/handlers/wsgi.py for unicode problem in wsgi handling
comment:9 by , 13 years ago
Has patch: | set |
---|
follow-up: 11 comment:10 by , 13 years ago
Should be fine leaving:
path_info = u'/'
That should always be the same as:
'/'.decode('latin-1')
anyway.
comment:11 by , 13 years ago
Replying to grahamd:
Should be fine leaving:
path_info = u'/'
That should always be the same as:
'/'.decode('latin-1')
anyway.
Yes, I realized that line probably didn't need to change, after I uploaded the patch.
comment:12 by , 12 years ago
Needs tests: | set |
---|
comment:13 by , 10 years ago
Cc: | added |
---|---|
Resolution: | → wontfix |
Status: | new → closed |
Summary: | WSGIRequest should not make PATH_INFO and SCRIPT_NAME unicode → Python 2.7 WSGIRequest should not make PATH_INFO and SCRIPT_NAME unicode |
Seems to me that if we're doing it right on Python 3, and it hasn't really been a problem, then there's no use changing the behavior on Python 2.7.