id,summary,reporter,owner,description,type,status,component,version,severity,resolution,keywords,cc,stage,has_patch,needs_docs,needs_tests,needs_better_patch,easy,ui_ux
4594,django.core.urlresolvers.reverse_helper doesn't unescape characters that are escaped in URL regexes,Todd O'Bryan ,Jacob,"Any backslash-escaped character in a URL pattern doesn't get unescaped by the `{% url ...%}` tag (and presumably by other methods of view reversing). For example,

{{{
urlpatterns = patterns('',
    (r'^prices/less_than_\$(?P\d+)/$', 'cost_less'),
    (r'^headlines/(?P\d+)\.(?P\d+)\.(?P\d+)/$', 'daily_headlines'),
    (r'^priests/(?P\w+)\+/$', 'priest_homepage'),
    (r'^windows_path/(?P[A-Z]):\\\\(?P.+)', 'windows_path'),
)
}}}

The dollar sign, dot, plus, and backslash in each of the URL patterns match a single character, but don't get converted back to that character by the reverse function.

It seems that there aren't that many of these. Any escape sequence that doesn't match a constant string (i.e. something like `\s` or `\d` or `\w`) had better be part of a captured group so that it can be replaced with the right string to get the URL you're expecting. That leaves the following, I think.

|| Pattern || Replacement ||
|| `\A` || `''` (equivalent to `^`)||
|| `\Z` || `''` (equivalent to `$`)||
|| `\b` and `\B` || `''` (these ''shouldn't'' appear in urls, but can only match the empty string)||
|| `\.`, `\^`, `\$`, `\*`, `\+`, `\?`, `\(`, `\)`, `\{`, `\}`, `\[`, `\]`, and `\\` || the same character, without a backslash||

As a first stab, I'd just get rid of `\A`, `\Z`, `\b`, and `\B`, just as the current code does for `^` and `$`.

This is actually kind of complicated, because you have to make sure that the `\` in front isn't part of a pair of backslashes. In other words, `\\b` should become `\b`, but `\\\b` should just become `\`.

Also, the current code removes all `^` and `$`. That's wrong if they're preceded by a backslash and meant to be the actual character.
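
To make the table and the paired-backslash rule concrete, here is a minimal sketch of the unescaping step. It is an illustration only, not the attached patch and not Django's `reverse_helper`; the name `unescape_pattern` is made up, and it assumes the captured groups have already been replaced by their values. Scanning left to right and consuming each escape as a two-character unit is what keeps `\\b` and `\\\b` apart.

```python
# Hypothetical helper (not part of Django): applies the replacement table
# to a pattern whose captured groups were already filled in with values.
ZERO_WIDTH = set('AZbB')            # \A, \Z, \b, \B only match the empty string
LITERALS = set('.^$*+?(){}[]\\')    # escaped punctuation that stands for itself


def unescape_pattern(pattern):
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == '\\':
            if i + 1 >= len(pattern):
                raise ValueError('dangling backslash at end of pattern')
            nxt = pattern[i + 1]
            if nxt in LITERALS:
                out.append(nxt)     # e.g. \$ becomes $
            elif nxt not in ZERO_WIDTH:
                # \d, \w, \s and friends have no single definite character
                raise ValueError('cannot reverse escape \\' + nxt)
            i += 2                  # consume the backslash and its partner
        elif ch in '^$':
            i += 1                  # unescaped anchors contribute nothing
        else:
            out.append(ch)
            i += 1
    return ''.join(out)
```

With this sketch, `unescape_pattern(r'^prices/less_than_\$5/$')` gives `prices/less_than_$5/`, while a leftover `\d` raises `ValueError` instead of producing a wrong URL.
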
There are some gotchas--when you insert values, you have to escape characters that you'll be unescaping later. I do check for character classes that don't map to a single definite character (e.g., `\d` and `\w`) and raise an exception if they're still there when we finish (since the reverse lookup can't work). I don't check for things like `[a-z]` or `a{2,3}`, but those will almost guarantee that the reversing fails, too. Note that #2977 also addresses this problem, but it does other things, too. Also, I think that code may not handle some corner cases correctly. Meanwhile, my patch may be overly aggressive and might include handling for characters that will never appear in a URL. Give SmileyChris and me some time to work this out.",,closed,Uncategorized,dev,,duplicate,url reverse escape,,Unreviewed,1,0,0,0,0,0