id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
18909	QueryString with non-ASCII characters under Windows (Python 2.7) may be decoded improperly	public@…	nobody	"Under Python 2.7, the return type of cgi.parse_qsl depends on the type of its input: a byte string yields byte strings, and a unicode string yields unicode strings. Take the following query string as an example:


{{{
>>> cgi.parse_qsl('q=%E4%BD%A0%E5%A5%BD')
[('q', '\xe4\xbd\xa0\xe5\xa5\xbd')]
>>> cgi.parse_qsl(u'q=%E4%BD%A0%E5%A5%BD')
[(u'q', u'\xe4\xbd\xa0\xe5\xa5\xbd')]
}}}
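
The sketch below restates the two results in Python 3 terms. It assumes that Python 3's urllib.parse.parse_qsl, called with its encoding argument, is a fair stand-in for the Python 2.7 behaviour shown above. Decoding the escaped bytes as UTF-8 gives the intended text. Decoding them as Latin-1 maps each byte to exactly one code point, and that reproduces the unicode result from the ticket.

{{{#!python
from urllib.parse import parse_qsl

qs = 'q=%E4%BD%A0%E5%A5%BD'

# Decoding the escaped bytes as UTF-8 yields the two intended characters.
intended = parse_qsl(qs, encoding='utf-8')

# Decoding them as Latin-1 maps every byte to one code point; this matches
# the Python 2.7 result for the unicode query string above.
mangled = parse_qsl(qs, encoding='latin-1')

# The mangled value is the UTF-8 byte sequence in disguise, so a Latin-1
# round trip recovers the intended text.
repaired = mangled[0][1].encode('latin-1').decode('utf-8')

print(intended)
print(mangled)
print(repaired == intended[0][1])  # True
}}}

The round trip works only because Latin-1 maps the bytes 0x00 to 0xFF one-to-one onto code points. It is shown here to explain the symptom, not as a suggested fix.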


The URL-encoded string ""%E4%BD%A0%E5%A5%BD"" is the UTF-8 encoding of two Chinese characters (你好). On its own this causes little trouble, but when the result is consumed by django.http.QueryDict, the value ends up decoded incorrectly:

{{{
    def __init__(self, query_string, mutable=False, encoding=None):
        super(QueryDict, self).__init__()
        if not encoding:
            encoding = settings.DEFAULT_CHARSET
        self.encoding = encoding
        #if (isinstance(query_string, unicode)):     # These two lines were added by me.
        #    query_string = query_string.encode('utf-8')
        if six.PY3:
            for key, value in parse_qsl(query_string or '',
                                        keep_blank_values=True,
                                        encoding=encoding):
                self.appendlist(key, value)
        else:
            for key, value in parse_qsl(query_string or '',
                                        keep_blank_values=True):
                self.appendlist(force_text(key, encoding, errors='replace'),
                                force_text(value, encoding, errors='replace'))
        self._mutable = mutable
}}}
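
The following is a rough Python 3 emulation of the Python 2 branch of QueryDict.__init__ quoted above. It is meant only to show why force_text cannot help. The helpers py2_parse_qsl, to_text and query_pairs are hypothetical and are not Django or standard library APIs. py2_parse_qsl models Python 2.7's parse_qsl. to_text models force_text, which returns text input unchanged and decodes only byte strings.

{{{#!python
from urllib.parse import parse_qsl

def py2_parse_qsl(query_string):
    # Hypothetical emulation of Python 2.7's parse_qsl: byte strings give
    # byte strings back, while text is unquoted byte by byte as Latin-1.
    if isinstance(query_string, bytes):
        text = query_string.decode('latin-1')
        pairs = parse_qsl(text, keep_blank_values=True, encoding='latin-1')
        return [(k.encode('latin-1'), v.encode('latin-1')) for k, v in pairs]
    return parse_qsl(query_string, keep_blank_values=True, encoding='latin-1')

def to_text(s, encoding='utf-8', errors='replace'):
    # Hypothetical stand-in for force_text: text passes through untouched,
    # only bytes are decoded with the given encoding.
    if isinstance(s, str):
        return s
    return s.decode(encoding, errors)

def query_pairs(query_string, encoding='utf-8'):
    # Mirrors the Python 2 branch of QueryDict.__init__.
    return [(to_text(k, encoding), to_text(v, encoding))
            for k, v in py2_parse_qsl(query_string)]

qs = 'q=%E4%BD%A0%E5%A5%BD'

# Text input: to_text never decodes anything, so the mojibake survives.
from_text = query_pairs(qs)

# Byte input, as produced by the commented-out workaround that encodes the
# query string to UTF-8 first: to_text now decodes it correctly.
from_bytes = query_pairs(qs.encode('utf-8'))

print(from_text == [('q', '\xe4\xbd\xa0\xe5\xa5\xbd')])  # True
print(from_bytes == [('q', '\u4f60\u597d')])             # True
}}}

Under this model, the workaround helps because it sends the query string down the byte-string path of parse_qsl. The sketch does not address whether Django should receive a unicode query string here in the first place.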

Although the keys and values are meant to be converted to Unicode by force_text, no conversion actually takes place: parse_qsl has already returned unicode objects, and force_text returns unicode input unchanged. The value therefore remains u'\xe4\xbd\xa0\xe5\xa5\xbd', i.e. the six UTF-8 bytes treated as six separate code points, instead of the intended u'\u4f60\u597d'. I can't tell whether this is silently corrected somewhere else in Django later on. I also could not reproduce the bug under Linux (UTF-8), but under Windows it does occur.

I will attach my Django application to this ticket. It is a very small application that simply displays everything the client sent to the server. Start it with manage.py runserver and open ""http://localhost:8000/?q=%E4%BD%A0%E5%A5%BD"" to check whether the bug occurs.

Note: I found this on Windows XP SP3 (Simplified Chinese), whose system encoding is GBK, even though Django is configured for UTF-8."	Bug	closed	Core (URLs)	1.4	Normal	invalid			Unreviewed	0	0	0	0	0	0
