Opened 6 years ago

Closed 8 months ago

#11111 closed Bug (wontfix)

Python 2.7 WSGIRequest should not make PATH_INFO and SCRIPT_NAME unicode

Reported by: mgood Owned by: nobody
Component: HTTP handling Version: 1.0
Severity: Normal Keywords:
Cc: Graham.Dumpleton@…, mtredinnick, cmawebsite@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The WSGI spec requires the standard environment values to be str, not unicode types, but the WSGIRequest object updates the environ (via self.META, which is a reference to it), setting PATH_INFO and SCRIPT_NAME to unicode objects. These unicode values led to some issues with WebTest, which checks that the environ contains only str values.
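As a rough illustration of the kind of check described here, the following is a minimal sketch and not WebTest's actual code. It is written for Python 3, where the native str is the text type, so the wrong-type case is a bytes value rather than a Python 2 unicode object. The helper name non_native_keys and the sample environ are assumptions made for the example.

```python
def non_native_keys(environ):
    """Return CGI-style keys whose values are not native str objects.

    Keys containing a dot (wsgi.version, wsgi.input, ...) are skipped,
    because they may legitimately hold non-string values.
    """
    return sorted(
        key for key, value in environ.items()
        if "." not in key and type(value) is not str
    )


# Hypothetical environ for illustration only.
environ = {
    "REQUEST_METHOD": "GET",
    "PATH_INFO": b"/\xc3\xbc",  # wrong type on Python 3: bytes, not str
    "SCRIPT_NAME": "",
    "wsgi.version": (1, 0),
}

offending = non_native_keys(environ)
print(offending)  # ['PATH_INFO']
```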

Attachments (1)

ticket_11111.diff (1.1 KB) - added by Jeff Buttars <jeffbuttars@…> 3 years ago.
Diff for core/handlers/wsgi.py for unicode problem in wsgi handling


Change History (14)

comment:1 follow-up: Changed 6 years ago by grahamd

  • Cc Graham.Dumpleton@… added
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

FWIW, it looks like WSGI for Python 3.0, if the specification is ever actually updated, will have WSGI environment variables be unicode, i.e., Python 3.0 strings and not byte strings, which would have been the proper equivalent to Python 2.X strings.

comment:2 in reply to: ↑ 1 Changed 6 years ago by mgood

Replying to grahamd:

FWIW, it looks like WSGI for Python 3.0, if the specification is ever actually updated, will have WSGI environment variables be unicode, i.e., Python 3.0 strings and not byte strings, which would have been the proper equivalent to Python 2.X strings.

Yes, but only by decoding as "latin-1", which means it's still 1 character per byte. Django could re-decode these values as utf-8, but it would not be valid to put them into the WSGI environ as such, since they could contain code points above U+00FF, which cannot be encoded as latin-1.

Here's a demonstration of the reported problem:

>>> from django.core.handlers import wsgi
>>> path = '\xc3\xbc'
>>> print path.decode('utf-8')
ü
>>> environ = {'REQUEST_METHOD': 'GET', 'PATH_INFO': path}
>>> req = wsgi.WSGIRequest(environ)
>>> environ['PATH_INFO']
u'\xfc'
>>> path == environ['PATH_INFO']
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
Last edited 4 years ago by ramiro
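For comparison, the latin-1 convention discussed in this comment can be sketched on Python 3 as follows. This is an illustration under assumptions, not Django code: the sample path and the helper name fits_in_environ are made up for the example.

```python
# UTF-8 bytes for "/café", as they might arrive on the wire.
raw = b"/caf\xc3\xa9"

# The server exposes the bytes as a native str, one character per byte.
environ_value = raw.decode("latin-1")
assert len(environ_value) == len(raw)

# An application recovers the real text by reversing that step
# and decoding with the encoding it actually expects.
text = environ_value.encode("latin-1").decode("utf-8")


def fits_in_environ(value):
    """Return True if every code point in value is at most U+00FF."""
    try:
        value.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True
```

The latin-1 wrapped value passes fits_in_environ, while text containing a code point above U+00FF (for example "/\u20ac") does not, which is why the re-decoded value cannot be written back into the environ in general.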

comment:3 Changed 6 years ago by anonymous

  • Component changed from Uncategorized to HTTP handling

comment:4 Changed 6 years ago by SmileyChris

Graham, you're more than qualified to promote this from an unreviewed ticket - at least to a Design Decision if you don't want to make the call ;)

comment:5 Changed 6 years ago by grahamd

I don't use Django or even know much about Django code internals, so I wouldn't like to be saying it is okay or not. In other words, I might be able to comment on WSGI and web hosting mechanisms, but far from being qualified to comment about Django itself. :-)

comment:6 Changed 6 years ago by SmileyChris

  • Cc mtredinnick added
  • Triage Stage changed from Unreviewed to Accepted

Ok, fair enough. I'll make the call that Django should be following the WSGI spec regarding unicode vs string. Malcolm would probably know if there's some reason why it shouldn't, so I'm cc'ing him for an opinion.

comment:7 Changed 4 years ago by julien

  • Severity set to Normal
  • Type set to Bug

comment:8 Changed 3 years ago by Jeff Buttars <jeffbuttars@…>

  • Easy pickings unset
  • UI/UX unset

The WSGI spec says all strings should be of the str type and be encoded as ISO-8859-1 (latin-1) or be MIME encoded according to RFC 2047. When dealing with the PATH_INFO and SCRIPT_NAME environ variables, I think it's safe to say we don't want to put MIME encoded data in them, which leaves us with the latin-1 encoding. We can always explicitly decode those variables into latin-1 to ensure we follow the spec. Doing so should also handle the case of being handed unicode data: we'll simply re-encode it with undefined results (which is also in the WSGI spec). I'll attach a patch that demonstrates this.
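One way to picture the normalization described above, as a Python 3 sketch only: this is not the attached patch (its contents are not reproduced in this thread), and the helper name as_wsgi_native is hypothetical. Using "replace" for text that does not fit latin-1 is just one possible reading of "undefined results".

```python
def as_wsgi_native(value):
    """Coerce value to a native str whose code points are all <= U+00FF.

    bytes are exposed one character per byte via latin-1. Text that is
    not representable in latin-1 is degraded lossily with '?'.
    """
    if isinstance(value, bytes):
        return value.decode("latin-1")
    return value.encode("latin-1", "replace").decode("latin-1")


path_from_bytes = as_wsgi_native(b"/\xc3\xbc")  # two characters after "/"
path_from_text = as_wsgi_native("/\u20ac")      # euro sign does not fit
```

With this sketch, plain ASCII values such as "/" pass through unchanged, so only non-ASCII input is affected by the choice of error handling.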

Changed 3 years ago by Jeff Buttars <jeffbuttars@…>

Diff for core/handlers/wsgi.py for unicode problem in wsgi handling

comment:9 Changed 3 years ago by Jeff Buttars <jeffbuttars@…>

  • Has patch set

comment:10 follow-up: Changed 3 years ago by grahamd

Should be fine leaving:

path_info = u'/'

That should always be the same as:

'/'.decode('latin-1')

anyway.
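The equivalence holds because "/" is plain ASCII, and ASCII bytes decode to the same text under ascii, latin-1 and utf-8. A small Python 3 check (variable names are illustrative only):

```python
codecs_to_try = ("ascii", "latin-1", "utf-8")

# Decode the same single byte with each codec and compare the results.
decoded = {name: b"/".decode(name) for name in codecs_to_try}
all_equal = len(set(decoded.values())) == 1
```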

comment:11 in reply to: ↑ 10 Changed 3 years ago by Jeff Buttars <jeffbuttars@…>

Replying to grahamd:

Should be fine leaving:

path_info = u'/'

That should always be the same as:

'/'.decode('latin-1')

anyway.

Yes, I realized that line probably didn't need to change, after I uploaded the patch.

comment:12 Changed 3 years ago by claudep

  • Needs tests set

comment:13 Changed 8 months ago by CollinAnderson

  • Cc cmawebsite@… added
  • Resolution set to wontfix
  • Status changed from new to closed
  • Summary changed from WSGIRequest should not make PATH_INFO and SCRIPT_NAME unicode to Python 2.7 WSGIRequest should not make PATH_INFO and SCRIPT_NAME unicode

Seems to me that if we're doing it right on Python 3, and it hasn't really been a problem, then there's no use changing the behavior on Python 2.7.
