Opened 7 years ago

Closed 5 years ago

#28123 closed Bug (wontfix)

django.utils.html.smart_urlquote() is incorrectly parsing the query string

Reported by: Denis Pechenev Owned by: nobody
Component: Utilities Version: 1.10
Severity: Normal Keywords: smart_urlquote python2
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Denis Pechenev)

Query string like 'search_text=%D0%B4%D0%B6%D0%B8%D0%BD%D1%81%D0%BE%D0%B2%D1%8B%D0%B5+%D0%BA%D1%83%D1%80%D1%82%D0%BA%D0%B8' is already encoded. But smart_urlquote() encodes it again because of incorrect parsing in parse_qsl(). Value should be encoded with ASCII before parsing.

So there should be something like:

query_parts = [(unquote(force_str(q[0])), unquote(force_str(q[1])))
                       for q in parse_qsl(query.encode('ascii'), keep_blank_values=True)]

https://github.com/django/django/blob/master/django/utils/html.py#L216

Change History (11)

comment:1 by Denis Pechenev, 7 years ago

Description: modified (diff)

comment:2 by Denis Pechenev, 7 years ago

Type: UncategorizedBug

comment:5 by Igor Gumenyuk, 7 years ago

The is very annoying bug in smart_urlquote since urlize (for example) calls it internally and passes unicode as an argument.
And there is no way to wrap argument in str() (sic!) or at least .encode('ascii') without forking urlize

Here's an example to show the issue:

from django.utils.html import smart_urlquote, urlize

s1 = 'http://example.com/?search_text=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82' #'http://example.com/?search_text=привет
s2 = u'http://example.com/?search_text=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82'

In [15]: smart_urlquote(s1)
Out[15]: u'http://example.com/?search_text=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82'

In [16]: smart_urlquote(s2)
Out[16]: u'http://example.com/women?search_text=%C3%90%C2%BF%C3%91%C2%80%C3%90%C2%B8%C3%90%C2%B2%C3%90%C2%B5%C3%91%C2%82'

With unicode string resulting URL gets double urlquoted.

comment:6 by Tim Graham, 7 years ago

As far as I can tell, Python 3 isn't affected. Since the master branch doesn't support Python 2 and I don't think this issue qualifies for a backport based on our supported versions policy, I think we can close it as wontfix.

comment:7 by Tim Graham, 7 years ago

Component: UncategorizedUtilities
Resolution: wontfix
Status: newclosed

comment:8 by Igor Gumenyuk, 7 years ago

Resolution: wontfix
Status: closednew

While python3 is great option, I don't think you're right. This bug has also exists in 1.10, 1.11 (LTS) versions which support python2.7.
Moreover it qualifies as data loss/corruption bug.

comment:9 by Tim Graham, 7 years ago

How does it cause data loss?

in reply to:  9 comment:10 by Denis Pechenev, 7 years ago

Replying to Tim Graham:

How does it cause data loss?

The bug doesn't cause data loss, but it causes data corruption after applying urlize() filter for strings which contain urls. So urls become broken and there is no way to fix that without forking urlize() or updating application to support python3 or applying some custom code. We have Django 1.10.7 and python 2.7. According the Documentation those versions are compatible.

Could you explain how to fix the bug without updating python to third version? Thank you in advance.

comment:11 by Tim Graham, 7 years ago

Triage Stage: UnreviewedAccepted

Since this is a regression in Django 1.8 due to 4b8a1d2c0d1a8c5107f3aef01597db78d2a2a5ce, we could accept a patch for Django 1.11.

comment:12 by Claude Paroz, 6 years ago

Keywords: python2 added

comment:13 by Tim Graham, 5 years ago

Resolution: wontfix
Status: newclosed

It seems a patch for this issue isn't forthcoming.

Note: See TracTickets for help on using tickets.
Back to Top