Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#20809 closed Bug (worksforme)

Python 2.7.3 Django 1.5.1 QueryDict.urlencode returns unicode

Reported by: stargrave@… Owned by: nobody
Component: HTTP handling Version: 1.5
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

QueryDict's urlencode method always returns unicode strings:

>>> from django.http.request import QueryDict
>>> QueryDict('foo=bar').urlencode()
u'foo=bar'

I don't think this is the expected behaviour: Django 1.4.x returns str.

Looking at urlencode's source code, force_bytes should convert every unicode value to str. The problem is the from __future__ import unicode_literals at the very beginning of that file, which turns every plain "" literal into a unicode literal. As a result, '&'.join() is really u'&'.join(), and the same goes for the '%s=%s' format string, so the result is promoted back to unicode.

if safe:
    safe = force_bytes(safe, self.encoding)
    encode = lambda k, v: '%s=%s' % ((quote(k, safe), quote(v, safe)))
else:
    encode = lambda k, v: urlencode({k: v})
for k, list_ in self.lists():
    k = force_bytes(k, self.encoding)
    output.extend([encode(k, force_bytes(v, self.encoding))
                   for v in list_])
return '&'.join(output)

Attachments (1)

urlencode_forced_bytes.diff (518 bytes) - added by stargrave@… 2 years ago.
Forced bytes returning for urlencode


Change History (5)

Changed 2 years ago by stargrave@…

Forced bytes returning for urlencode

comment:1 Changed 2 years ago by claudep

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

I haven't dug deeply into this issue, but could you please give us a use case where returning a unicode string is problematic?

comment:2 Changed 2 years ago by stargrave@…

  • one of our functions always takes a urlencoded string (for example from a GET query) as input. Sometimes we have to feed it a query string we prepared ourselves:
    >>> from django.http import QueryDict
    >>> q = QueryDict({}, mutable=True)
    >>> q["hello"] = "привет"
    >>> ENCODED = q.urlencode()
    >>> ENCODED
    u'hello=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82'
    
  • then we parse this input in another function:
    >>> from urllib import unquote_plus
    >>> VALUE = unquote_plus(ENCODED.split("&")[0].split("=")[1])
    >>> VALUE
    u'\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'
    
  • and if we apply smart_unicode to it, we get the same string back, although we expected to get the decoded one:
    >>> from django.utils.encoding import smart_unicode
    >>> smart_unicode(VALUE)
    u'\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'
    >>> smart_unicode('\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82')
    u'\u043f\u0440\u0438\u0432\u0435\u0442'
    
  • the same parsing result can be obtained with urlparse.parse_qs(): it returns unicode strings too, which smart_unicode cannot decode:
    >>> from urlparse import parse_qs
    >>> parse_qs(ENCODED)
    {u'hello': [u'\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82']}
    

urlencode() returns a URL-encoded string, which is by definition ASCII-only, so it cannot contain multibyte characters.
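
For readers on Python 3, the effect described in this comment can be reproduced with a small sketch. It is only a model of the Python 2 behaviour, not the code path urllib or Django actually takes: passing encoding="latin-1" to urllib.parse.unquote_plus is an assumption used here to imitate how each %XX escape in a unicode string ends up as a single code point.

```python
from urllib.parse import unquote_plus

ENCODED = "hello=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82"
raw = ENCODED.split("&")[0].split("=")[1]

# Model of the Python 2 result: every %XX escape becomes one code point
# (latin-1), so the UTF-8 byte values survive as 12 separate characters.
mangled = unquote_plus(raw, encoding="latin-1")

# The value is already text, so there is nothing left for a text-coercion
# helper to decode. Recovering the byte values and decoding them as UTF-8
# is one way to get the intended string back.
repaired = mangled.encode("latin-1").decode("utf-8")
```

Here mangled has 12 characters while repaired has 6, which is why coercing the mangled value to unicode again changes nothing.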

comment:3 Changed 2 years ago by claudep

  • Component changed from Core (Other) to HTTP handling
  • Resolution set to worksforme
  • Status changed from new to closed

Thanks for explaining your use case, this is valuable.

I think the error in the above example is using Python's standard urllib, which is not unicode friendly (on Python 2). That's why Django provides wrappers for most of these functions in django.utils.http (https://docs.djangoproject.com/en/dev/ref/utils/#module-django.utils.http):

>>> from django.utils.http import urlunquote_plus
>>> urlunquote_plus(ENCODED)
u'hello=\u043f\u0440\u0438\u0432\u0435\u0442'
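
The general idea behind a unicode-friendly unquote is to undo the percent-escapes at the byte level first and only then decode the bytes as UTF-8. A minimal Python 3 sketch of that idea, using only the standard library; the helper name unquote_plus_utf8 is hypothetical and this is not presented as Django's implementation:

```python
from urllib.parse import unquote_to_bytes


def unquote_plus_utf8(quoted):
    """Hypothetical helper: percent-decode to bytes, then decode as UTF-8."""
    # '+' stands for a space in form-encoded data; replace it before
    # unquoting so that a literal '%2B' still comes out as '+'.
    raw = unquote_to_bytes(quoted.replace("+", " "))
    return raw.decode("utf-8")


decoded = unquote_plus_utf8("hello=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82")
```

Because decoding happens only after all escapes have been turned back into bytes, multibyte characters are reassembled correctly.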

In Python 3, we should be able to use the standard library again:

>>> q = QueryDict({}, mutable=True)
>>> q["hello"] = "привет"
>>> q.urlencode()
'hello=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82'
>>> from urllib import parse
>>> parse.unquote_plus(q.urlencode())
'hello=привет'
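
The same round trip can be sketched on Python 3 without Django at all, which also shows that the encoded form is plain ASCII. This assumes UTF-8, the default encoding in urllib.parse; the variable names are illustrative only.

```python
from urllib.parse import parse_qs, urlencode

# Encode a non-ASCII value; the result is a str holding only ASCII characters.
encoded = urlencode({"hello": "привет"})

# Because it is ASCII-only, converting it to bytes cannot fail.
encoded_bytes = encoded.encode("ascii")

# Parsing the query string yields properly decoded text values.
parsed = parse_qs(encoded)
```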

comment:4 Changed 2 years ago by stargrave@…

Ah, I see. Thank you very much for the pointer! So we will use Django's built-in wrappers.
