id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
4594	django.core.urlresolvers.reverse_helper doesn't unescape characters that are escaped in URL regexes	Todd O'Bryan <toddobryan@…>	Jacob	"Any backslash-escaped character in a URL pattern doesn't get unescaped by the `{% url ... %}` tag (and presumably not by the other ways of reversing a view either). For example,
{{{
from django.conf.urls.defaults import *

# Each escaped character (\$, \., \+, \\) matches exactly one literal character.
urlpatterns = patterns('',
    (r'^prices/less_than_\$(?P<price>\d+)/$', 'cost_less'),
    (r'^headlines/(?P<year>\d+)\.(?P<month>\d+)\.(?P<day>\d+)/$', 'daily_headlines'),
    (r'^priests/(?P<name>\w+)\+/$', 'priest_homepage'),
    (r'^windows_path/(?P<drive_name>[A-Z]):\\(?P<path>.+)', 'windows_path'),
)
}}}
The escaped dollar sign, dot, plus, and backslash in these patterns each match a single literal character, but the reverse function doesn't convert them back to that character.
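To make the symptom concrete, here is a rough, hypothetical approximation of a reverser that only fills in named groups and strips `^` and `$`. The `naive_reverse` name and its group-matching regex are made up for illustration; this is not Django's actual `reverse_helper`.

{{{
#!python
import re

# Hypothetical approximation, NOT Django's reverse_helper: it fills in
# named groups and strips every ^ and $, but never unescapes anything.
GROUP = re.compile(r'\(\?P<(\w+)>[^)]*\)')

def naive_reverse(pattern, **kwargs):
    # Replace each named group with the string form of its argument.
    url = GROUP.sub(lambda m: str(kwargs[m.group(1)]), pattern)
    # Drop anchors, including a $ that was escaped and meant literally.
    return url.replace('^', '').replace('$', '')

print(naive_reverse(r'^prices/less_than_\$(?P<price>\d+)/$', price=10))
# prints prices/less_than_\10/
print(naive_reverse(r'^headlines/(?P<year>\d+)\.(?P<month>\d+)\.(?P<day>\d+)/$',
                    year=2007, month=5, day=1))
# prints headlines/2007\.5\.1/
}}}

Both failure modes show up: stray backslashes survive, and the literal `$` is lost along with the anchors.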

Fortunately there aren't many of these. Any escape sequence that doesn't match a fixed string (such as `\s`, `\d`, or `\w`) had better sit inside a named group, so that it gets replaced by the supplied argument and produces the URL you expect. That leaves the following, I think:
|| Pattern || Replacement ||
|| `\A`    || `''` (equivalent to `^`)||
|| `\Z` || `''` (equivalent to `$`)||
|| `\b` and `\B` || `''` (these ''shouldn't'' appear in URLs, and in any case they can only match the empty string)||
|| `\.`, `\^`, `\$`, `\*`, `\+`, `\?`, `\(`, `\)`, `\{`, `\}`, `\[`, `\]`, and `\\` || the same character, without a backslash|| 
As a first stab, I'd simply strip `\A`, `\Z`, `\b`, and `\B`, as the current code does for `^` and `$`. That is trickier than it sounds, because you have to make sure the `\` in front isn't the second half of an escaped backslash. In other words, `\\b` should become `\b`, but `\\\b` should become just `\`. Also, the current code removes every `^` and `$`, which is wrong when one is preceded by a backslash and meant as the literal character.
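One way to sketch the table above, assuming a single left-to-right `re.sub` pass is acceptable: matching a backslash together with the character after it consumes escaped pairs as a unit, so `\\b` and `\\\b` come out right without any look-behind. The names `unescape`, `LITERALS`, and `EMPTY` are hypothetical, and unlike the patch described below, this sketch raises as soon as it meets an unknown escape rather than checking at the end.

{{{
#!python
import re

# Hypothetical helper sketching the table above; not code from Django.
LITERALS = set('.^$*+?(){}[]\\')   # escapes that stand for one literal character
EMPTY = set('AZbB')               # escapes that can only match the empty string
# Either a backslash plus the next character, or a bare ^ / $ anchor.
TOKEN = re.compile(r'\\(.)|[\^$]', re.DOTALL)

def unescape(pattern):
    def repl(m):
        ch = m.group(1)
        if ch is None or ch in EMPTY:
            return ''        # anchor or zero-width escape: drop it
        if ch in LITERALS:
            return ch        # escaped literal: keep the bare character
        raise ValueError('cannot reverse escape \\' + ch)
    # re.sub scans left to right without overlaps, so an escaped backslash
    # is consumed as a pair before the following character is examined.
    return TOKEN.sub(repl, pattern)

print(unescape(r'\\b'))                      # prints \b
print(unescape(r'\\\b'))                     # prints a single backslash
print(unescape(r'^prices/less_than_\$/$'))   # prints prices/less_than_$/
}}}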

There are some gotchas: when you insert values, you have to escape any characters that will be unescaped later. I do check for character classes that don't map to a single definite character (e.g., `\d` and `\w`) and raise an exception if any are still present at the end, since the reverse lookup can't work. I don't check for constructs like `[a-z]` or `a{2,3}`, but patterns containing them will almost certainly fail to reverse too.
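The insertion gotcha can be sketched end to end: escape each argument before substituting it, so the unescaping pass turns it back into exactly what the caller passed, and let any `\d`-style escape left outside a group raise. Everything here (`reverse_pattern`, `escape_value`, `unescape`, and the simplifying assumption that a named group contains no nested parentheses) is hypothetical rather than the patch's actual code.

{{{
#!python
import re

# Hypothetical end-to-end sketch; not the code from the attached patch.
LITERALS = set('.^$*+?(){}[]\\')
EMPTY = set('AZbB')
TOKEN = re.compile(r'\\(.)|[\^$]', re.DOTALL)
# Simplification: assumes named groups contain no nested parentheses.
GROUP = re.compile(r'\(\?P<(\w+)>[^)]*\)')

def escape_value(value):
    # Escape exactly the characters that unescape() turns back into literals.
    return ''.join('\\' + c if c in LITERALS else c for c in str(value))

def unescape(pattern):
    def repl(m):
        ch = m.group(1)
        if ch is None or ch in EMPTY:
            return ''
        if ch in LITERALS:
            return ch
        # \d, \w, \s, ... outside a group cannot be turned into one string.
        raise ValueError('cannot reverse escape \\' + ch)
    return TOKEN.sub(repl, pattern)

def reverse_pattern(pattern, **kwargs):
    # A function replacement stops re.sub from reinterpreting backslashes.
    filled = GROUP.sub(lambda m: escape_value(kwargs[m.group(1)]), pattern)
    return unescape(filled)

print(reverse_pattern(r'^prices/less_than_\$(?P<price>\d+)/$', price=10))
# prints prices/less_than_$10/
print(reverse_pattern(r'^windows_path/(?P<drive_name>[A-Z]):\\(?P<path>.+)',
                      drive_name='C', path='docs\\a.txt'))
# prints windows_path/C:\docs\a.txt
}}}

As in the description, `[a-z]` or `a{2,3}` outside a group would slip through this sketch unchanged rather than raising.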

Note that #2977 also addresses this problem, but it does other things too, and I think that code may not handle some corner cases correctly. On the other hand, my patch may be overly aggressive and might handle characters that will never appear in a URL.

Give SmileyChris and me some time to work this out."		closed	Uncategorized	dev		duplicate	url reverse escape		Unreviewed	1	0	0	0	0	0
