Code

Opened 7 years ago

Closed 6 years ago

Last modified 6 years ago

#2977 closed enhancement (fixed)

Better handling of regular expressions for reverse urlresolver

Reported by: SmileyChris Owned by: mtredinnick
Component: Core (Other) Version:
Severity: normal Keywords: url reverse
Cc: albertpeschar+djangotrac@…, mattimustang@…, dcwatson@…, alexander.solovyov@…, daybreaker12@…, wbyoung@…, django@…, eric@…, beau@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: UI/UX:

Description

I spent a bit of time looking over the urlresolvers.py today. I put together a patch fixing some problems with it.

Firstly, regular expressions should be set up outside the function so they only have to be compiled once (fix 1).

A TODO: was to make the resolver handle recursive parenthesis. After a bit of thinking (and actually making it work like that) I realised that is actually not what you'd need to do. Recursion is not necessary because the outermost matched parenthesis will be replaced with the given argument anyway.

What was needed was a rethink of choosing between named/unnamed groups. It should be named groups only (if any exist) for each whole regular expression (fix 2).

It does need to handle non-grouping parenthesis recursively though (fix 3).

Other expression extension notations (apart from named groups, of course) should be ignored (fix 4).

Probably rarely needed, but it now also handles pipes in regular expressions (fix 5).

Attachments (6)

reverse_urlresolver.patch (5.3 KB) - added by SmileyChris 7 years ago.
reverse_urlresolver.2.patch (5.3 KB) - added by SmileyChris 7 years ago.
small fix
new_reverse_urlresolver.patch (10.9 KB) - added by SmileyChris 7 years ago.
complete rewrite of most of the reverse code!
new_reverse_urlresolver.2.patch (11.1 KB) - added by SmileyChris 7 years ago.
new_reverse_urlresolver.3.patch (12.5 KB) - added by SmileyChris 7 years ago.
django-plus-sign-fix.patch (642 bytes) - added by andresj 6 years ago.
Fix for problem with plus signs (see #4883) and dots (tried to fix also dollar signs and '' but didn't work--don't know why). Generated with svn diff.

Download all attachments as: .zip

Change History (36)

Changed 7 years ago by SmileyChris

Changed 7 years ago by SmileyChris

small fix

comment:1 Changed 7 years ago by jacob

This looks good, but I'm having trouble figuring out exactly how some of it works. Could you perhaps add some tests that illustrate how the current resolver fails? The right place for those is source:django/trunk/tests/urlpatterns_reverse/

comment:2 Changed 7 years ago by jacob

Er, that should be: source:django/trunk/tests/regressiontests/urlpatterns_reverse

Changed 7 years ago by SmileyChris

complete rewrite of most of the reverse code!

comment:3 Changed 7 years ago by SmileyChris

All right, my previous implementation had a few fatal flaws I realised. So I have rewritten everything.

A major change is that the RegexURLPattern objects get a compiled matcher with them, which will greatly speed up reverse lookups.

The rewrite was mostly necessary to correctly handle nested parenthesis, which it now does using a basic tokenizer. There is a small change in logic, which was a bug in the previous one: if a RE has both named and non-named groups, only the named groups should be parsed (this is in line with how the forward lookups work and what documentation says).

A few more tests have been added.

comment:4 Changed 7 years ago by SmileyChris

Also, the original code (Adrian's) handled argument passing incorrectly when used with a URLResolver. The resolver is should check both itself and its children using one set of arguments.

comment:5 Changed 7 years ago by django@…

Hi, i just looked briefly on the patch. You should use a helper function like adrian did, currently the regexp will be parsed, even if it is never used.

comment:6 Changed 7 years ago by SmileyChris

Thanks for the advice.

The difference is that the parsed REs inside of RegexURLPatterns are cached, and I'm mostly sure that the RegexURLPatterns themselves are cached, so everyone wins.

comment:7 Changed 7 years ago by SmileyChris

  • Triage Stage changed from Unreviewed to Design decision needed

comment:8 Changed 7 years ago by SmileyChris

  • Patch needs improvement set

Should work with a mix of named and unnamed arguments

comment:9 Changed 7 years ago by SmileyChris

  • Patch needs improvement unset

Actually, it shouldn't. The current behaviour of reverse is a bug since forward URL resolving doesn't accept both positional and named args (it drops the positional ones)

comment:10 Changed 7 years ago by mtredinnick

This ticket is a good example of why separate issues should be in separate bugs. You are trying to do many things at once and it makes it hard to split out the good from the not so good and discuss each issue in isolation. That being said, let's work with what you've done (since it's an impressive effort):

Fixes 1, 3 and 5 are all fine. They're just small cleanups. That can be a single patch.

Fix 2 is a design decision and I think you're making the wrong decision. Allowing keyword and positional args to mix (not nest, but be in the same reg-exp) is fine.

Fix 4, I need to think about some more.

I'll commit portions and make a decision on the rest in the next couple of days. No need to write a new patch, I can see what you're doing in the current code and it will be easy to tease it apart.

comment:11 Changed 7 years ago by SmileyChris

Yea, I should have written separate patches. Next time ;)

Re fix 2, I'm probably blowing hot air, you already said it's going stay as-is in the other ticket but I'm still not sure why you think mixing keyword and positional arguments is useful if it doesn't work for a forward resolution.

In the form of a question: Why should a reverse match a RegexURLPattern which won't call its view the same arguments which reverse received?

Fix 4 is because I didn't want to tackle the difficult types of extension notation, notably lookahead/behind assertions and named backreferences. Note, the patch also handles non-grouping parenthesis.

comment:12 Changed 7 years ago by SmileyChris

See #4075 for a problem which this ticket would fix

comment:13 Changed 7 years ago by anonymous

  • Cc dcwatson@… added

comment:14 Changed 7 years ago by Todd O'Bryan <toddobryan@…>

I'm not sure that you're sufficiently sneaky when removing special characters from URLs...

For example, you're using the regexes
{{{re_unused = re.compile(r'(?<\\)[$?*+()]')
re_special = re.compile(r'
([.+*()$])') # Characters from the IETF URL standard, RFC 1738.
}}}

I'm worried that re_unused won't match the $ in r'\\$' even though it should. Also, re_special will match \. in r'\\.' even though it shouldn't.

Also, you don't translate unsafe characters from regexes back to normal strings. That might be the correct behavior--I'm not sure what Django/Apache does with a URL like '/words%20with%20space'. If %20 becomes a space before it enters the url lookup system, then urls with unsafe characters are possible and we should be able to reverse them (making sure that they get encoded before they're actually included in a webpage). If it's kept in its encoded form, then we shouldn't have to worry about regexes with unsafe characters.

I just created ticket #4594 with my patch. Let's figure out how to deal with these corner cases and maybe we can pull out the part of your code that deals with this so Malcolm will finally get around to applying the various parts. :-)

Changed 7 years ago by SmileyChris

comment:15 Changed 7 years ago by SmileyChris

I caved in and undid fix 2 (but made it so that if there is a named group inside of a non-named group then it'll ignore the non-named group).

The slashes should be much more solid.

Unsafe characters from regex are unencoded nicely.

Changed 7 years ago by SmileyChris

comment:16 Changed 7 years ago by SmileyChris

Unsafe characters from regex are now only unencoded from sections not being replaced by arguments

I handled the somewhat rare case of non-argument sections containing character sets ([aeiou]). It just chooses the first character in there. Perhaps overkill, but I do know that some people prefer to write their special characters this way rather than escaping them (eg [$] rather than \$).

comment:17 Changed 7 years ago by mtredinnick

  • Owner changed from adrian to mtredinnick
  • Summary changed from [patch] Better handling of regular expressions for reverse urlresolver to Better handling of regular expressions for reverse urlresolver
  • Triage Stage changed from Design decision needed to Accepted

Okay, this is on my review list now. Maybe avoid making nay major changes for a couple of days so that I have a chance to read through it without changes happening underneath. Will let you know if I find anything disturbing.

comment:18 Changed 7 years ago by Alexander Solovyov <alexander.solovyov@…>

  • Cc alexander.solovyov@… added

comment:19 Changed 7 years ago by anonymous

  • Cc daybreaker12@… added

comment:20 Changed 7 years ago by anonymous

  • Cc wbyoung@… added

comment:21 Changed 7 years ago by James Wheare <django@…>

  • Cc django@… added

comment:22 Changed 7 years ago by Matthew Flanagan <mattimustang@…>

  • Cc mattimustang@… added
  • Patch needs improvement set

this patch does not work for me. Even after manually applying it to latest trunk it fails with the following complex regexp:

IP4_RE = r'(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}'
IP6_HEX_RE = r'(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}'
IP6_HEX_COMPRESSED_RE = r'((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)'
IP6_IP4_COMPAT_RE = r'((?:[0-9A-Fa-f]{1,4}:){6,6})(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}'
IP6_IP4_COMPAT_COMPRESSED_RE = r'((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::((?:[0-9A-Fa-f]{1,4}:)*)(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}'
IP6_RE = '%s|%s|%s|%s' % (IP6_HEX_RE, IP6_HEX_COMPRESSED_RE,
    IP6_IP4_COMPAT_RE, IP6_IP4_COMPAT_COMPRESSED_RE)
IP6_RE_COMPILED = re.compile('^(%s|%s|%s|%s)$' % (IP6_HEX_RE,
    IP6_HEX_COMPRESSED_RE, IP6_IP4_COMPAT_RE, IP6_IP4_COMPAT_COMPRESSED_RE))
IP_RE = "(%s|%s)" % (IP4_RE, IP6_RE)


urlpatterns = patterns('',
    url(r'^(?P<ipsupernetwork>%s)/$' % IP_RE, ipnetwork_list, IPNETWORK_LIST,
        name='ipnetwork_list'),

)

Running django.core.urlresolvers.reverse results in:

In [6]: reverse('ipnetwork_list', urlconf='nong.ip.urls',args=(), kwargs={'ipsupernetwork': '10.0.0.0'})
---------------------------------------------------------------------------
sre_constants.error                                     Traceback (most recent call last)

/srv/dev/mpf/nong/third-party/django/tests/<ipython console>

/srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in reverse(viewname, urlconf, args, kwargs)
    399 def reverse(viewname, urlconf=None, args=None, kwargs=None):
    400     args = args or []
    401     kwargs = kwargs or {}
--> 402     return iri_to_uri(u'/' + get_resolver(urlconf).reverse(viewname, *args, **kwargs))
    403

/srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in reverse(self, lookup_view, *args, **kwargs)
    383         except (ImportError, AttributeError):
    384             raise NoReverseMatch
--> 385         if lookup_view in self.reverse_dict:
    386             return u''.join([reverse_helper(part.regex, *args, **kwargs) for part in self.reverse_dict[lookup_view]])
    387         raise NoReverseMatch

/srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in _get_reverse_dict(self)
    317
    318     def _get_reverse_dict(self):
--> 319         if not self._reverse_dict and hasattr(self.urlconf_module, 'urlpatterns'):
    320             for pattern in reversed(self.urlconf_module.urlpatterns):
    321                 if isinstance(pattern, RegexURLResolver):

/srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in _get_urlconf_module(self)
    353         except AttributeError:
    354             try:
--> 355                 self._urlconf_module = __import__(self.urlconf_name, {}, {}, [''])
    356             except ValueError, e:
    357                 # Invalid urlconf_name, such as "foo.bar." (note trailing period)

/srv/dev/mpf/nong/third-party/django/tests/nong/ip/urls.py
     13     (r'^url_tag/', include('regressiontests.templates.urls')),
     14
     15     # django built-in views
     16     (r'^views/', include('regressiontests.views.urls')),
     17 )

/srv/dev/mpf/nong/third-party/django/django/conf/urls/defaults.py in url(regex, view, kwargs, name, prefix)
     28                 raise ImproperlyConfigured('Empty URL pattern view name not permitted (for pattern %r)' % regex)
     29             if prefix:
     30                 view = prefix + '.' + view
---> 31         return RegexURLPattern(regex, view, kwargs, name)
     32

/srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in __init__(self, regex, callback, default_args, name)
    239         # callable object (view).
    240         self.regex = re.compile(regex, re.UNICODE)
--> 241         self.reverse_regex_lookup = ReverseRegexLookup(regex)
    242         if callable(callback):
    243             self._callback = callback

/srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in __init__(self, text)
    194 class ReverseRegexLookup(object):
    195     def __init__(self, text):
--> 196         self.tokens, self.minimum_arguments, self.error = tokenize(text)
    197
    198     def check(self, args=[], kwargs={}):

/srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in tokenize(text)
    179         if isinstance(bit, list):
    180             # Build the regex here so it only has to be compiled once.
--> 181             bit = re.compile('%s$' % build_re(bit))
    182             count += 1
    183         else:

/usr/local/lib/python2.4/sre.py in compile(pattern, flags)
    178 def compile(pattern, flags=0):
    179     "Compile a regular expression pattern, returning a pattern object."
--> 180     return _compile(pattern, flags)
    181
    182 def purge():

/usr/local/lib/python2.4/sre.py in _compile(*key)
    225         p = sre_compile.compile(pattern, flags)
    226     except error, v:
--> 227         raise error, v # invalid expression
    228     if len(_cache) >= _MAXCACHE:
    229         _cache.clear()

error: multiple repeat

Changed 6 years ago by andresj

Fix for problem with plus signs (see #4883) and dots (tried to fix also dollar signs and '' but didn't work--don't know why). Generated with svn diff.

comment:23 Changed 6 years ago by mtredinnick

  • Owner changed from nobody to mtredinnick

comment:24 Changed 6 years ago by ubernostrum

  • Keywords url reverse added

comment:25 Changed 6 years ago by anonymous

  • Cc albertpeschar+djangotrac@… added

comment:26 Changed 6 years ago by sebastian_noack

I started to solve this problems, too. See #6934. But this patch seems to be more complete. I would like to see it merged into SVN, but unfortunately I can not apply even the patch against the current HEAD.

comment:27 Changed 6 years ago by wiswaud

  • Cc eric@… added

comment:28 Changed 6 years ago by Beau Gunderson <beau@…>

  • Cc beau@… added

I hit this the other day too; looking forward to this being applied! :)

comment:29 Changed 6 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from new to closed

(In [8760]) A rewrite of the reverse URL parsing: the reverse() call and the "url" template tag.

This is fully backwards compatible, but it fixes a bunch of little bugs. Thanks
to SmileyChris and Ilya Semenov for some early patches in this area that were
incorporated into this change.

Fixed #2977, #4915, #6934, #7206.

comment:30 Changed 6 years ago by mtredinnick

I intentionally didn't take the pipe-handling portion of this patch. That's actually a lot trickier than it looks when the alternatives contain different parameter values. The infrastructure is more or less in place to handle that, but I was having a lot of trouble making it stable, so somebody can address that after 1.0.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.