#20809 closed Bug (worksforme)
Python 2.7.3 Django 1.5.1 QueryDict.urlencode returns unicode
Reported by: | Owned by: | nobody | |
---|---|---|---|
Component: | HTTP handling | Version: | 1.5 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
QueryDict's urlencode() method always returns a unicode string:
>>> from django.http.request import QueryDict
>>> QueryDict('foo=bar').urlencode()
u'foo=bar'
I don't think this is the expected behaviour. Django 1.4.x returns str objects.
Looking at urlencode's source code, force_bytes should convert every unicode value to str. The problem is at the very top of that file: from __future__ import unicode_literals turns every "" literal into a unicode string. So '&'.join() actually runs as u'&'.join(), and the same goes for the '%s=%s' format string.
if safe:
    safe = force_bytes(safe, self.encoding)
    encode = lambda k, v: '%s=%s' % ((quote(k, safe), quote(v, safe)))
else:
    encode = lambda k, v: urlencode({k: v})
for k, list_ in self.lists():
    k = force_bytes(k, self.encoding)
    output.extend([encode(k, force_bytes(v, self.encoding))
                   for v in list_])
return '&'.join(output)
Attachments (1)
Change History (5)
by , 12 years ago
Attachment: | urlencode_forced_bytes.diff added |
---|
comment:1 by , 12 years ago
I didn't dig hard in this issue, but could you please give us a use case when returning a unicode string is problematic?
comment:2 by , 12 years ago
- one of our functions always takes a urlencoded string (for example, from a GET query) as input. Sometimes we have to feed it a query string we prepared ourselves:
>>> from django.http import QueryDict
>>> q = QueryDict({}, mutable=True)
>>> q["hello"] = "привет"
>>> ENCODED = q.urlencode()
>>> ENCODED
u'hello=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82'
- then we parse this input in another function:
>>> from urllib import unquote_plus
>>> VALUE = unquote_plus(ENCODED.split("&")[0].split("=")[1])
>>> VALUE
u'\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'
- and if we apply smart_unicode to it, we get the same string back, although we expected the decoded one:
>>> from django.utils.encoding import smart_unicode
>>> smart_unicode(VALUE)
u'\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'
>>> smart_unicode('\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82')
u'\u043f\u0440\u0438\u0432\u0435\u0442'
- the same parsing behaviour can be seen with urlparse.parse_qs: it also returns unicode strings that smart_unicode cannot fix:
>>> from urlparse import parse_qs
>>> parse_qs(ENCODED)
{u'hello': [u'\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82']}
urlencode() returns a urlencoded string, which is by definition ASCII-only, so it cannot contain multibyte characters.
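The mis-decoding in the steps above can be reproduced outside Django. The following is only a sketch for Python 3, not the ticket's Python 2 session: it assumes that Python 2's urllib unquoting of a unicode input maps each %XX escape to the code point U+00XX, which is equivalent to a Latin-1 decode, and simulates that with the encoding argument of unquote_plus. The variable names are illustrative.

```python
from urllib.parse import quote_plus, unquote_plus

original = "привет"
# Percent-encode the UTF-8 bytes of the value, as urlencode() does.
encoded = "hello=" + quote_plus(original)

# The encoded form is pure ASCII, so this does not raise.
encoded.encode("ascii")

value = encoded.split("&")[0].split("=")[1]

# Simulate Python 2's unquote_plus on a unicode input (assumption):
# each %XX escape becomes the code point U+00XX, i.e. a Latin-1 decode.
mangled = unquote_plus(value, encoding="latin-1")

# Undo the damage: recover the raw bytes, then decode them as UTF-8.
repaired = mangled.encode("latin-1").decode("utf-8")

print(repr(mangled))
print(repaired)
```

The mangled value matches the u'\xd0\xbf...' string shown in the steps above, and re-encoding it as Latin-1 gives back the UTF-8 bytes that decode to the original text.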
comment:3 by , 12 years ago
Component: | Core (Other) → HTTP handling |
---|---|
Resolution: | → worksforme |
Status: | new → closed |
Thanks for explaining your use case, this is valuable.
I think the error in the above example is using Python's standard urllib, which is not unicode-friendly (on Python 2). That's why Django provides wrappers for most of these functions in django.utils.http (https://docs.djangoproject.com/en/dev/ref/utils/#module-django.utils.http):
>>> from django.utils.http import urlunquote_plus
>>> urlunquote_plus(ENCODED)
u'hello=\u043f\u0440\u0438\u0432\u0435\u0442'
In Python 3, we should be able to use the standard library again:
>>> q = QueryDict({}, mutable=True)
>>> q["hello"] = "привет"
>>> q.urlencode()
'hello=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82'
>>> from urllib import parse
>>> parse.unquote_plus(q.urlencode())
'hello=привет'
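As a rough illustration of the same round trip on Python 3 without Django, the standard library's urlencode and parse_qs can be paired. This is a sketch that stands in for QueryDict with a plain dict, so it is not the ticket's exact code.

```python
from urllib.parse import parse_qs, urlencode

# A plain dict stands in for the QueryDict used in the ticket (assumption).
query = urlencode({"hello": "привет"})

# parse_qs decodes the percent-escapes as UTF-8 by default on Python 3.
parsed = parse_qs(query)

print(query)
print(parsed)
```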
comment:4 by , 12 years ago
Ah, I see. Thank you very much for the pointer! So we will use Django's built-in wrappers.
Forced bytes returning for urlencode