Opened 14 months ago

Last modified 14 months ago

#28123 new Bug

django.utils.html.smart_urlquote() is incorrectly parsing the query string

Reported by: Denis Pechenev Owned by: nobody
Component: Utilities Version: 1.10
Severity: Normal Keywords: smart_urlquote
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Denis Pechenev)

Query string like 'search_text=%D0%B4%D0%B6%D0%B8%D0%BD%D1%81%D0%BE%D0%B2%D1%8B%D0%B5+%D0%BA%D1%83%D1%80%D1%82%D0%BA%D0%B8' is already encoded. But smart_urlquote() encodes it again because of incorrect parsing in parse_qsl(). Value should be encoded with ASCII before parsing.

So there should be something like:

query_parts = [(unquote(force_str(q[0])), unquote(force_str(q[1])))
                       for q in parse_qsl(query.encode('ascii'), keep_blank_values=True)]

https://github.com/django/django/blob/master/django/utils/html.py#L216

Change History (9)

comment:1 Changed 14 months ago by Denis Pechenev

Description: modified (diff)

comment:2 Changed 14 months ago by Denis Pechenev

Type: UncategorizedBug

comment:5 Changed 14 months ago by Igor Gumenyuk

The is very annoying bug in smart_urlquote since urlize (for example) calls it internally and passes unicode as an argument.
And there is no way to wrap argument in str() (sic!) or at least .encode('ascii') without forking urlize

Here's an example to show the issue:

from django.utils.html import smart_urlquote, urlize

s1 = 'http://example.com/?search_text=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82' #'http://example.com/?search_text=привет
s2 = u'http://example.com/?search_text=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82'

In [15]: smart_urlquote(s1)
Out[15]: u'http://example.com/?search_text=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82'

In [16]: smart_urlquote(s2)
Out[16]: u'http://example.com/women?search_text=%C3%90%C2%BF%C3%91%C2%80%C3%90%C2%B8%C3%90%C2%B2%C3%90%C2%B5%C3%91%C2%82'

With unicode string resulting URL gets double urlquoted.

comment:6 Changed 14 months ago by Tim Graham

As far as I can tell, Python 3 isn't affected. Since the master branch doesn't support Python 2 and I don't think this issue qualifies for a backport based on our supported versions policy, I think we can close it as wontfix.

comment:7 Changed 14 months ago by Tim Graham

Component: UncategorizedUtilities
Resolution: wontfix
Status: newclosed

comment:8 Changed 14 months ago by Igor Gumenyuk

Resolution: wontfix
Status: closednew

While python3 is great option, I don't think you're right. This bug has also exists in 1.10, 1.11 (LTS) versions which support python2.7.
Moreover it qualifies as data loss/corruption bug.

comment:9 Changed 14 months ago by Tim Graham

How does it cause data loss?

comment:10 in reply to:  9 Changed 14 months ago by Denis Pechenev

Replying to Tim Graham:

How does it cause data loss?

The bug doesn't cause data loss, but it causes data corruption after applying urlize() filter for strings which contain urls. So urls become broken and there is no way to fix that without forking urlize() or updating application to support python3 or applying some custom code. We have Django 1.10.7 and python 2.7. According the Documentation those versions are compatible.

Could you explain how to fix the bug without updating python to third version? Thank you in advance.

comment:11 Changed 14 months ago by Tim Graham

Triage Stage: UnreviewedAccepted

Since this is a regression in Django 1.8 due to 4b8a1d2c0d1a8c5107f3aef01597db78d2a2a5ce, we could accept a patch for Django 1.11.

Note: See TracTickets for help on using tickets.
Back to Top