Query string with non-ASCII characters under Windows (Python 2.7) may be decoded improperly
|Reported by:||Owned by:||nobody|
|Has patch:||no||Needs documentation:||no|
|Needs tests:||no||Patch needs improvement:||no|
Under Python 2.7, the return type of cgi.parse_qsl depends on the type of its argument: a byte string yields byte strings, and a unicode string yields unicode strings. Take the following query string as an example:
>>> cgi.parse_qsl('q=%E4%BD%A0%E5%A5%BD')
[('q', '\xe4\xbd\xa0\xe5\xa5\xbd')]
>>> cgi.parse_qsl(u'q=%E4%BD%A0%E5%A5%BD')
[(u'q', u'\xe4\xbd\xa0\xe5\xa5\xbd')]
The URL-encoded string "%E4%BD%A0%E5%A5%BD" carries two Chinese characters encoded as UTF-8. The difference in return type is harmless on its own, but it causes a problem when the function is used together with django.http.QueryDict:
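The two results can be compared side by side with a minimal sketch. It is written for Python 3, where urllib.parse.parse_qsl takes an explicit encoding argument, so it only emulates the Python 2.7 behaviour described above: decoding the escaped bytes as Latin-1 is assumed here to stand in for what Python 2.7 does with a unicode query string, since both map each byte to one code point. The names QUERY, as_utf8 and as_latin1 are illustrative.

```python
# Python 3 sketch; urllib.parse stands in for the Python 2 cgi/urlparse helpers.
from urllib.parse import parse_qsl

QUERY = 'q=%E4%BD%A0%E5%A5%BD'

# Decoding the percent-escaped bytes as UTF-8 yields the intended two characters.
as_utf8 = parse_qsl(QUERY, keep_blank_values=True, encoding='utf-8')

# Decoding the same bytes as Latin-1 maps every byte to its own code point.
# Assumption: this mirrors the value Python 2.7 returns for a unicode input.
as_latin1 = parse_qsl(QUERY, keep_blank_values=True, encoding='latin-1')

print(len(as_utf8[0][1]))    # number of code points after UTF-8 decoding
print(len(as_latin1[0][1]))  # number of code points after Latin-1 decoding
```

The first value has two code points and the second has six, one per UTF-8 byte, which is the shape of the unicode result shown in the session above.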
def __init__(self, query_string, mutable=False, encoding=None):
    super(QueryDict, self).__init__()
    if not encoding:
        encoding = settings.DEFAULT_CHARSET
    self.encoding = encoding
    #if (isinstance(query_string, unicode)):  # These two lines were added by me.
    #    query_string = query_string.encode('utf-8')
    if six.PY3:
        for key, value in parse_qsl(query_string or '',
                                    keep_blank_values=True,
                                    encoding=encoding):
            self.appendlist(key, value)
    else:
        for key, value in parse_qsl(query_string or '',
                                    keep_blank_values=True):
            self.appendlist(force_text(key, encoding, errors='replace'),
                            force_text(value, encoding, errors='replace'))
    self._mutable = mutable
Although the keys and values are meant to be converted to Unicode, force_text never gets the chance to do so: when query_string is already a unicode object, parse_qsl returns unicode values, and force_text leaves them unchanged. The value remains u'\xe4\xbd\xa0\xe5\xa5\xbd', which is certainly not the intended text, because each UTF-8 byte has become a separate code point. I can't say whether this is silently corrected later somewhere else in Django. I could not reproduce the bug under Linux (UTF-8), but under Windows it really exists.
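To make the nature of the broken value concrete, here is a small Python 3 sketch that reverses the byte-per-code-point decoding. The helper repair_mojibake is hypothetical and is not part of Django or of the workaround commented out in the code above; it only illustrates that the original bytes are still recoverable from the value.

```python
def repair_mojibake(text):
    """Hypothetical helper: undo a decode that mapped each UTF-8 byte to one code point."""
    try:
        # Latin-1 turns code points 0-255 back into the original bytes,
        # which can then be decoded as the UTF-8 they really were.
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Text outside Latin-1, or bytes that are not valid UTF-8: leave unchanged.
        return text


broken = '\xe4\xbd\xa0\xe5\xa5\xbd'   # value reported in this ticket
fixed = repair_mojibake(broken)       # the two intended characters
untouched = repair_mojibake(fixed)    # already-correct text is returned unchanged
```

A repair of this kind is a diagnostic aid rather than a fix. Encoding a unicode query string to bytes before it reaches parse_qsl, as in the two commented-out lines, avoids producing the broken value in the first place.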
I will upload my Django application along with this ticket. It is a very small application that just displays everything the client sent to the server. You can start it with manage.py runserver and open "http://localhost:8000/?q=%E4%BD%A0%E5%A5%BD" to see whether the bug occurs.
Note: I discovered this under Windows XP SP3 (Simplified Chinese) with the system encoding set to gbk (although Django is set to utf-8).