
Opened 9 months ago

Closed 9 months ago

Last modified 9 months ago

#20809 closed Bug (worksforme)

Python 2.7.3 Django 1.5.1 QueryDict.urlencode returns unicode

Reported by: stargrave@…
Owned by: nobody
Component: HTTP handling
Version: 1.5
Severity: Normal
Keywords:
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

QueryDict's urlencode method always returns unicode strings:

>>> from django.http.request import QueryDict
>>> QueryDict('foo=bar').urlencode()
u'foo=bar'

I suppose this is not the expected behaviour. Django 1.4.x returns str objects.

Judging by urlencode's source code, force_bytes is supposed to convert all unicode values to str. The problem lies at the very beginning of that file: from __future__ import unicode_literals makes every "" literal a unicode string. So '&'.join() effectively becomes u'&'.join(), and the same goes for the '%s=%s' format string, which makes the result unicode no matter what force_bytes did.

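# At the top of django/http/request.py (excerpt):
from __future__ import unicode_literals

class QueryDict(MultiValueDict):
    def urlencode(self, safe=None):
        output = []
        if safe:
            safe = force_bytes(safe, self.encoding)
            # '%s=%s' is a unicode literal, so each pair becomes unicode
            encode = lambda k, v: '%s=%s' % ((quote(k, safe), quote(v, safe)))
        else:
            encode = lambda k, v: urlencode({k: v})
        for k, list_ in self.lists():
            k = force_bytes(k, self.encoding)
            output.extend([encode(k, force_bytes(v, self.encoding))
                           for v in list_])
        # '&' is a unicode literal too, so the joined result is always unicode
        return '&'.join(output)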
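The point about the separator literal is easier to see on Python 3, where text and bytes no longer mix implicitly: '&'.join() over bytes items raises TypeError instead of silently producing text. A minimal sketch, assuming Python 3, of an encoder that keeps everything as bytes. The helper name urlencode_bytes is hypothetical; it is neither a Django API nor the attached patch, it only follows the shape of the loop above.

```python
from urllib.parse import quote_plus


def urlencode_bytes(lists, encoding="utf-8"):
    """Hypothetical helper: URL-encode (key, [values]) pairs into bytes.

    It follows the shape of the loop in QueryDict.urlencode, but joins
    with a bytes separator so that the result is bytes, not text.
    """
    output = []
    for key, values in lists:
        # Convert text to bytes up front, roughly as force_bytes does.
        k = key.encode(encoding) if isinstance(key, str) else key
        for value in values:
            v = value.encode(encoding) if isinstance(value, str) else value
            # quote_plus() percent-escapes the bytes and returns a str.
            pair = quote_plus(k) + "=" + quote_plus(v)
            output.append(pair.encode("ascii"))
    # The separator's type decides the result type: b"&" yields bytes.
    return b"&".join(output)


print(urlencode_bytes([("hello", ["привет"])]))
# b'hello=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82'
```

Multiple values per key produce repeated pairs, as in QueryDict: urlencode_bytes([("a", ["1", "2"])]) gives b'a=1&a=2'.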

Attachments (1)

urlencode_forced_bytes.diff (518 bytes) - added by stargrave@… 9 months ago.
Patch that forces urlencode to return bytes


Change History (5)

Changed 9 months ago by stargrave@…

Patch that forces urlencode to return bytes

comment:1 Changed 9 months ago by claudep

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

I haven't dug deeply into this issue, but could you please give us a use case where returning a unicode string is problematic?

comment:2 Changed 9 months ago by stargrave@…

  • one of our functions always takes a urlencoded string (for example, from a GET query) as its input. Sometimes we have to feed it a query string we have prepared ourselves:
    >>> from django.http import QueryDict
    >>> q = QueryDict({}, mutable=True)
    >>> q["hello"] = "привет"
    >>> ENCODED = q.urlencode()
    >>> ENCODED
    u'hello=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82'
    
  • then we parse this input in another function:
    >>> from urllib import unquote_plus
    >>> VALUE = unquote_plus(ENCODED.split("&")[0].split("=")[1])
    >>> VALUE
    u'\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'
    
  • and if we apply smart_unicode to it, we get the same string back, whereas we expected to get the decoded one:
    >>> from django.utils.encoding import smart_unicode
    >>> smart_unicode(VALUE)
    u'\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'
    >>> smart_unicode('\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82')
    u'\u043f\u0440\u0438\u0432\u0435\u0442'
    
  • urlparse.parse_qs and parse_qsl behave the same way: they also return unicode strings, which smart_unicode cannot decode:
    >>> from urlparse import parse_qs
    >>> parse_qs(ENCODED)
    {u'hello': [u'\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82']}
    

urlencode() returns a URL-encoded string, which by definition is ASCII-only, so it cannot contain multibyte characters.
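Why the value comes out garbled: on Python 2, urllib.unquote_plus applied to a unicode string turns each percent-escaped byte into the code point with the same number, which amounts to decoding the bytes as Latin-1 instead of UTF-8. A minimal sketch, assuming Python 3, that reproduces the garbled value from the session above through the encoding parameter of urllib.parse.unquote_plus, and shows how the intended text is recovered:

```python
from urllib.parse import unquote_plus

ENCODED = "hello=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82"
escaped = ENCODED.split("&")[0].split("=")[1]

# Decoding each escaped byte as its own code point (Latin-1) reproduces
# the garbled value from the Python 2 session.
garbled = unquote_plus(escaped, encoding="latin-1")
print(ascii(garbled))  # '\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'

# Decoding the same bytes as UTF-8 gives the intended text.
print(unquote_plus(escaped, encoding="utf-8"))  # привет

# A value that is already garbled can be repaired by undoing the Latin-1 step.
print(garbled.encode("latin-1").decode("utf-8"))  # привет
```

This is also why smart_unicode cannot help: the garbled value is already a unicode string, so there is nothing left for it to decode.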

comment:3 Changed 9 months ago by claudep

  • Component changed from Core (Other) to HTTP handling
  • Resolution set to worksforme
  • Status changed from new to closed

Thanks for explaining your use case, this is valuable.

I think the mistake in the above example is using Python's standard urllib, which is not unicode-friendly on Python 2. That's why Django provides wrappers for most of these functions in django.utils.http (https://docs.djangoproject.com/en/dev/ref/utils/#module-django.utils.http):

>>> from django.utils.http import urlunquote_plus
>>> urlunquote_plus(ENCODED)
u'hello=\u043f\u0440\u0438\u0432\u0435\u0442'
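The wrapper works because it does the unquoting on a byte string and decodes the result only afterwards; as far as I can tell, on Python 2 urlunquote_plus passes its argument through force_str before calling unquote_plus and through force_text after it. A minimal sketch of the same bytes-first idea, assuming Python 3; unquote_plus_utf8 is a hypothetical name, not a Django function:

```python
from urllib.parse import unquote_to_bytes


def unquote_plus_utf8(quoted):
    """Hypothetical helper: undo quote_plus() and decode the result as UTF-8.

    Working on bytes until the very end means each multibyte character is
    reassembled from its escapes before any decoding happens.
    """
    # quote_plus() encodes spaces as "+", so turn them back first;
    # a literal "+" would have been escaped as %2B and is unaffected.
    raw = unquote_to_bytes(quoted.replace("+", " "))
    return raw.decode("utf-8")


print(unquote_plus_utf8("hello=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82"))  # hello=привет
```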

In Python 3, we should be able to use the standard library again:

>>> q = QueryDict({}, mutable=True)
>>> q["hello"] = "привет"
>>> q.urlencode()
'hello=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82'
>>> from urllib import parse
>>> parse.unquote_plus(q.urlencode())
'hello=привет'
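The parse_qs case from the earlier comment also behaves as expected on Python 3, because urllib.parse.parse_qs decodes percent-escapes with encoding='utf-8' by default. A short sketch, assuming Python 3, including the round trip back through urllib.parse.urlencode:

```python
from urllib.parse import parse_qs, urlencode

ENCODED = "hello=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82"

# On Python 3, parse_qs decodes percent-escapes as UTF-8 by default.
print(parse_qs(ENCODED))  # {'hello': ['привет']}

# urlencode(..., doseq=True) turns the lists of values back into pairs.
print(urlencode(parse_qs(ENCODED), doseq=True))  # hello=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82
```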

comment:4 Changed 9 months ago by stargrave@…

Ah, I see. Thank you very much for the pointer! So we will use Django's built-in wrappers.
