Opened 12 months ago

Closed 6 months ago

Last modified 6 months ago

#22267 closed Bug (fixed)

django.utils.html.smart_urlquote() is incorrectly unquoting the url

Reported by: meenzam@… Owned by: ienzam
Component: Utilities Version: master
Severity: Normal Keywords:
Cc: meenzam@…, kevin-brown Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


When we give django.utils.html.smart_urlquote() an url with an already embedded query as a query paramater, it is unquoting it which it should not do.

For example,
which has the query parameters {'q': ['', 'django']}

where it should have returned the exact url back
which has the query parameters {'q': ['']}

I have written a test here -

        # Ensure that embedded quoted url does not get unquoted

Change History (26)

comment:1 Changed 12 months ago by erikr

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted
  • Version changed from 1.5 to master

It would seem the purpose of its current implementation is that a URL which is already quoted is not quoted again, but an unquoted URL will be quoted. The context in which this is applied, is django.utils.html.urlize. This was added in #9655 (, although at that time with a little more code than it's current version.

The characters you are expecting to be escaped are currently marked as safe. Otherwise, even would already have the URL encoding applied to it. At first sight, a solution seems to be to make smart_urlquote even smarter, and only consider the currently listed characters safe if they are not within a URL parameter themselves. I agree that this is a bug after all.

comment:3 Changed 12 months ago by meenzam@…

  • Cc meenzam@… added
  • Has patch set

The simple fix will be to unquote and quote only the path of the url, I have

This passes the current unit tests.

comment:4 Changed 12 months ago by erikr

  • Patch needs improvement set

In that patch however, the conditions are altered: quoting is not performed if any exception is raised and caught. This does not appear to be part of the tests, so they still all succeed.

comment:5 Changed 12 months ago by meenzam@…

  • Has patch unset

Hmm, yes the commit is wrong, urlize() will not always escape.

Can you provide the intended behaviour of urlize()? It will then be easier to fix when we have a set goal. Currently it is not totally clear what is the correct behaviour of urlize() and smart_urlquote().

comment:6 Changed 12 months ago by erikr

The formal description is whatever is documented. However, developers may depend on other details of its behaviour, which we should not change if it is not necessary.

comment:7 Changed 12 months ago by erikr

By which I don't mean that this particular issue should not be fixed anymore, in case I was ambiguous - it is definitely a bug.

comment:8 Changed 12 months ago by meenzam@…

Then I will suggest to return None from smart_urlquote() in case of any exception, then urlize() will not think the string as an url and will show text only.
Also not only path, but also query parameter and fragment should be quoted (because they may contain ").

For query string, we may first parse_qs() and then individually encode each key value pairs and remake the query string.
For fragment, We can do the same as path (unquoting and quoting again). Not perfect solution though because some sites store query parameters in fragments.

comment:10 Changed 12 months ago by erikr

  • Component changed from Template system to Utilities

That commit is built upon your previous patch, so it's very hard for me to review. At first sight, I don't think that inline function is really necessary, given how simple it is. But the rest I can't really review in this way. Also, it would be good to have some tests for how urlize handles the invalid URLs for which you now want to return None, as that behaviour should not change.

comment:11 Changed 12 months ago by erikr

You might want to have a look at #22223 and regarding which characters should be safe for which segments of the URL.

comment:12 Changed 12 months ago by ienzam

Read the bug and ticket.
From RFC 2396 (, valid characters for different parts of the url are following:
path => Alphanumeric and /;-_.!~*'():@&=+$,
query and fragment => Alphanumeric and /;-_.!~*'():@&=+$,? (only differs from path with an extra ?)

Can we not unquote and quote again because it can introduce some corner cases.
We can check if the url is already quoted or not (which was done till v1.4).

comment:13 Changed 11 months ago by ienzam

  • Has patch set
  • Needs tests set
  • Owner changed from nobody to ienzam
  • Status changed from new to assigned

Can you look at my patch? Merged all the changes into this one -

comment:14 Changed 10 months ago by meenzam@…

Any update?

comment:15 Changed 9 months ago by bendavis78

I'm just curious, is there a precedent for returning None when there's an error? Normally if there's an exception, it should bubble up, or a more specific one should be raised.

comment:16 Changed 8 months ago by ienzam

If None is returned then
"if url:" will be false and the invalid url will be treated as text.

comment:17 Changed 8 months ago by claudep

  • Needs tests unset
  • Patch needs improvement unset

I have chosen a slightly different approach, keeping the unquote/quote process but on the separate parts as ienzam proposed.

comment:18 Changed 7 months ago by timo

  • Patch needs improvement set

PR has some test failures.

comment:19 Changed 7 months ago by claudep

  • Patch needs improvement unset

comment:20 Changed 6 months ago by timgraham

  • Triage Stage changed from Accepted to Ready for checkin

@meenzam, perhaps you'd like to review this as well before we merge it.

comment:21 Changed 6 months ago by ienzam

comment:22 Changed 6 months ago by ienzam

Thank you for doing this. This patch fixes all of the problems I was facing.

One concern is though about the url:ä/
The last part of the url ä implies that this url is not quoted so the full url should be quoted -
Currently it becomes:
But maybe this is not a real life scenario (hopefully).

comment:23 Changed 6 months ago by claudep

Interesting case, but I think this should be a separate issue. This ticket is fixing the query string problem.

Your example might be solved by short-circuiting the unquote part if the URL contains any non-safe character.

comment:24 Changed 6 months ago by kevin-brown

  • Cc kevin-brown added

comment:25 Changed 6 months ago by Claude Paroz <claude@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In 4b8a1d2c0d1a8c5107f3aef01597db78d2a2a5ce:

Fixed #22267 -- Fixed unquote/quote in smart_urlquote

Thanks Md. Enzam Hossain for the report and initial patch, and
Tim Graham for the review.

comment:26 Changed 6 months ago by Claude Paroz <claude@…>

In b9d9287f59eb5c33dd8bc81179b4cf197fd54456:

Fixed urlize after smart_urlquote rewrite

Refs #22267.

Note: See TracTickets for help on using tickets.
Back to Top