Django

Code

Ticket #2977 (new)

Opened 2 years ago

Last modified 2 months ago

Better handling of regular expressions for reverse urlresolver

Reported by: SmileyChris Assigned to: mtredinnick
Milestone: Component: Core framework
Version: Keywords: url reverse
Cc: albertpeschar+djangotrac@gmail.com, mattimustang@gmail.com, dcwatson@gmail.com, alexander.solovyov@gmail.com, daybreaker12@gmail.com, wbyoung@fadingred.org, django@sparemint.com, eric@akoha.org, beau@beaugunderson.com Triage Stage: Accepted
Has patch: 1 Needs documentation: 0
Needs tests: 0 Patch needs improvement: 1

Description

I spent a bit of time looking over the urlresolvers.py today. I put together a patch fixing some problems with it.

Firstly, regular expressions should be set up outside the function so they only have to be compiled once (fix 1).

A TODO: was to make the resolver handle recursive parenthesis. After a bit of thinking (and actually making it work like that) I realised that is actually not what you'd need to do. Recursion is not necessary because the outermost matched parenthesis will be replaced with the given argument anyway.

What was needed was a rethink of choosing between named/unnamed groups. It should be named groups only (if any exist) for each whole regular expression (fix 2).

It does need to handle non-grouping parenthesis recursively though (fix 3).

Other expression extension notations (apart from named groups, of course) should be ignored (fix 4).

Probably rarely needed, but it now also handles pipes in regular expressions (fix 5).

Attachments

reverse_urlresolver.patch (5.3 kB) - added by SmileyChris on 11/05/06 02:03:15.
reverse_urlresolver.2.patch (5.3 kB) - added by SmileyChris on 11/05/06 02:34:19.
small fix
new_reverse_urlresolver.patch (10.9 kB) - added by SmileyChris on 11/08/06 17:46:03.
complete rewrite of most of the reverse code!
new_reverse_urlresolver.2.patch (11.1 kB) - added by SmileyChris on 06/18/07 05:06:04.
new_reverse_urlresolver.3.patch (12.5 kB) - added by SmileyChris on 06/24/07 22:50:45.
django-plus-sign-fix.patch (0.6 kB) - added by andresj on 10/23/07 23:50:07.
Fix for problem with plus signs (see #4883) and dots (tried to fix also dollar signs and '' but didn't work--don't know why). Generated with svn diff.

Change History

11/05/06 02:03:15 changed by SmileyChris

  • attachment reverse_urlresolver.patch added.

11/05/06 02:34:19 changed by SmileyChris

  • attachment reverse_urlresolver.2.patch added.

small fix

11/06/06 19:57:18 changed by jacob

This looks good, but I'm having trouble figuring out exactly how some of it works. Could you perhaps add some tests that illustrate how the current resolver fails? The right place for those is source:django/trunk/tests/urlpatterns_reverse/

11/06/06 19:57:55 changed by jacob

11/08/06 17:46:03 changed by SmileyChris

  • attachment new_reverse_urlresolver.patch added.

complete rewrite of most of the reverse code!

11/08/06 17:50:12 changed by SmileyChris

All right, my previous implementation had a few fatal flaws I realised. So I have rewritten everything.

A major change is that the RegexURLPattern objects get a compiled matcher with them, which will greatly speed up reverse lookups.

The rewrite was mostly necessary to correctly handle nested parenthesis, which it now does using a basic tokenizer. There is a small change in logic, which was a bug in the previous one: if a RE has both named and non-named groups, only the named groups should be parsed (this is in line with how the forward lookups work and what documentation says).

A few more tests have been added.

11/08/06 18:53:54 changed by SmileyChris

Also, the original code (Adrian's) handled argument passing incorrectly when used with a URLResolver. The resolver is should check both itself and its children using one set of arguments.

11/12/06 19:19:34 changed by django@poelzi.org

Hi, i just looked briefly on the patch. You should use a helper function like adrian did, currently the regexp will be parsed, even if it is never used.

11/13/06 15:38:24 changed by SmileyChris

Thanks for the advice.

The difference is that the parsed REs inside of RegexURLPatterns are cached, and I'm mostly sure that the RegexURLPatterns themselves are cached, so everyone wins.

01/18/07 01:06:59 changed by SmileyChris

  • stage changed from Unreviewed to Design decision needed.

03/11/07 14:16:13 changed by SmileyChris

  • needs_better_patch set to 1.

Should work with a mix of named and unnamed arguments

03/11/07 14:39:29 changed by SmileyChris

  • needs_better_patch deleted.

Actually, it shouldn't. The current behaviour of reverse is a bug since forward URL resolving doesn't accept both positional and named args (it drops the positional ones)

03/12/07 02:43:38 changed by mtredinnick

This ticket is a good example of why separate issues should be in separate bugs. You are trying to do many things at once and it makes it hard to split out the good from the not so good and discuss each issue in isolation. That being said, let's work with what you've done (since it's an impressive effort):

Fixes 1, 3 and 5 are all fine. They're just small cleanups. That can be a single patch.

Fix 2 is a design decision and I think you're making the wrong decision. Allowing keyword and positional args to mix (not nest, but be in the same reg-exp) is fine.

Fix 4, I need to think about some more.

I'll commit portions and make a decision on the rest in the next couple of days. No need to write a new patch, I can see what you're doing in the current code and it will be easy to tease it apart.

03/12/07 03:21:21 changed by SmileyChris

Yea, I should have written separate patches. Next time ;)

Re fix 2, I'm probably blowing hot air, you already said it's going stay as-is in the other ticket but I'm still not sure why you think mixing keyword and positional arguments is useful if it doesn't work for a forward resolution.

In the form of a question: Why should a reverse match a RegexURLPattern which won't call its view the same arguments which reverse received?

Fix 4 is because I didn't want to tackle the difficult types of extension notation, notably lookahead/behind assertions and named backreferences. Note, the patch also handles non-grouping parenthesis.

04/18/07 17:01:21 changed by SmileyChris

See #4075 for a problem which this ticket would fix

06/01/07 16:40:39 changed by anonymous

  • cc set to dcwatson@gmail.com.

06/16/07 22:27:52 changed by Todd O'Bryan <toddobryan@mac.com>

I'm not sure that you're sufficiently sneaky when removing special characters from URLs...

For example, you're using the regexes {{{re_unused = re.compile(r'(?<!\\)[$?*+()]') re_special = re.compile(r'\\(.+*()$)') # Characters from the IETF URL standard, RFC 1738. }}}

I'm worried that re_unused won't match the $ in r'\\$' even though it should. Also, re_special will match \. in r'\\.' even though it shouldn't.

Also, you don't translate unsafe characters from regexes back to normal strings. That might be the correct behavior--I'm not sure what Django/Apache does with a URL like '/words%20with%20space'. If %20 becomes a space before it enters the url lookup system, then urls with unsafe characters are possible and we should be able to reverse them (making sure that they get encoded before they're actually included in a webpage). If it's kept in its encoded form, then we shouldn't have to worry about regexes with unsafe characters.

I just created ticket #4594 with my patch. Let's figure out how to deal with these corner cases and maybe we can pull out the part of your code that deals with this so Malcolm will finally get around to applying the various parts. :-)

06/18/07 05:06:04 changed by SmileyChris

  • attachment new_reverse_urlresolver.2.patch added.

06/18/07 05:09:16 changed by SmileyChris

I caved in and undid fix 2 (but made it so that if there is a named group inside of a non-named group then it'll ignore the non-named group).

The slashes should be much more solid.

Unsafe characters from regex are unencoded nicely.

06/24/07 22:50:45 changed by SmileyChris

  • attachment new_reverse_urlresolver.3.patch added.

06/24/07 23:03:16 changed by SmileyChris

Unsafe characters from regex are now only unencoded from sections not being replaced by arguments

I handled the somewhat rare case of non-argument sections containing character sets ([aeiou]). It just chooses the first character in there. Perhaps overkill, but I do know that some people prefer to write their special characters this way rather than escaping them (eg [$] rather than \$).

06/24/07 23:08:07 changed by mtredinnick

  • owner changed from adrian to mtredinnick.
  • summary changed from [patch] Better handling of regular expressions for reverse urlresolver to Better handling of regular expressions for reverse urlresolver.
  • stage changed from Design decision needed to Accepted.

Okay, this is on my review list now. Maybe avoid making nay major changes for a couple of days so that I have a chance to read through it without changes happening underneath. Will let you know if I find anything disturbing.

07/16/07 02:09:13 changed by Alexander Solovyov <alexander.solovyov@gmail.com>

  • cc changed from dcwatson@gmail.com to dcwatson@gmail.com, alexander.solovyov@gmail.com.

08/17/07 09:03:17 changed by anonymous

  • cc changed from dcwatson@gmail.com, alexander.solovyov@gmail.com to dcwatson@gmail.com, alexander.solovyov@gmail.com, daybreaker12@gmail.com.

09/07/07 17:36:07 changed by anonymous

  • cc changed from dcwatson@gmail.com, alexander.solovyov@gmail.com, daybreaker12@gmail.com to dcwatson@gmail.com, alexander.solovyov@gmail.com, daybreaker12@gmail.com, wbyoung@fadingred.org.

09/08/07 10:15:44 changed by James Wheare <django@sparemint.com>

  • cc changed from dcwatson@gmail.com, alexander.solovyov@gmail.com, daybreaker12@gmail.com, wbyoung@fadingred.org to dcwatson@gmail.com, alexander.solovyov@gmail.com, daybreaker12@gmail.com, wbyoung@fadingred.org, django@sparemint.com.

09/25/07 00:27:59 changed by Matthew Flanagan <mattimustang@gmail.com>

  • cc changed from dcwatson@gmail.com, alexander.solovyov@gmail.com, daybreaker12@gmail.com, wbyoung@fadingred.org, django@sparemint.com to mattimustang@gmail.com, dcwatson@gmail.com, alexander.solovyov@gmail.com, daybreaker12@gmail.com, wbyoung@fadingred.org, django@sparemint.com.
  • needs_better_patch set to 1.

this patch does not work for me. Even after manually applying it to latest trunk it fails with the following complex regexp:

IP4_RE = r'(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}'
IP6_HEX_RE = r'(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}'
IP6_HEX_COMPRESSED_RE = r'((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)'
IP6_IP4_COMPAT_RE = r'((?:[0-9A-Fa-f]{1,4}:){6,6})(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}'
IP6_IP4_COMPAT_COMPRESSED_RE = r'((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::((?:[0-9A-Fa-f]{1,4}:)*)(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}'
IP6_RE = '%s|%s|%s|%s' % (IP6_HEX_RE, IP6_HEX_COMPRESSED_RE,
    IP6_IP4_COMPAT_RE, IP6_IP4_COMPAT_COMPRESSED_RE)
IP6_RE_COMPILED = re.compile('^(%s|%s|%s|%s)$' % (IP6_HEX_RE,
    IP6_HEX_COMPRESSED_RE, IP6_IP4_COMPAT_RE, IP6_IP4_COMPAT_COMPRESSED_RE))
IP_RE = "(%s|%s)" % (IP4_RE, IP6_RE)


urlpatterns = patterns('',
    url(r'^(?P<ipsupernetwork>%s)/$' % IP_RE, ipnetwork_list, IPNETWORK_LIST,
        name='ipnetwork_list'),

)

Running django.core.urlresolvers.reverse results in:

In [6]: reverse('ipnetwork_list', urlconf='nong.ip.urls',args=(), kwargs={'ipsupernetwork': '10.0.0.0'})
---------------------------------------------------------------------------
sre_constants.error                                     Traceback (most recent call last)

/srv/dev/mpf/nong/third-party/django/tests/<ipython console>

/srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in reverse(viewname, urlconf, args, kwargs)
    399 def reverse(viewname, urlconf=None, args=None, kwargs=None):
    400     args = args or []
    401     kwargs = kwargs or {}
--> 402     return iri_to_uri(u'/' + get_resolver(urlconf).reverse(viewname, *args, **kwargs))
    403

/srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in reverse(self, lookup_view, *args, **kwargs)
    383         except (ImportError, AttributeError):
    384             raise NoReverseMatch
--> 385         if lookup_view in self.reverse_dict:
    386             return u''.join([reverse_helper(part.regex, *args, **kwargs) for part in self.reverse_dict[lookup_view]])
    387         raise NoReverseMatch

/srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in _get_reverse_dict(self)
    317
    318     def _get_reverse_dict(self):
--> 319         if not self._reverse_dict and hasattr(self.urlconf_module, 'urlpatterns'):
    320             for pattern in reversed(self.urlconf_module.urlpatterns):
    321                 if isinstance(pattern, RegexURLResolver):

/srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in _get_urlconf_module(self)
    353         except AttributeError:
    354             try:
--> 355                 self._urlconf_module = __import__(self.urlconf_name, {}, {}, [''])
    356             except ValueError, e:
    357                 # Invalid urlconf_name, such as "foo.bar." (note trailing period)

/srv/dev/mpf/nong/third-party/django/tests/nong/ip/urls.py
     13     (r'^url_tag/', include('regressiontests.templates.urls')),
     14
     15     # django built-in views
     16     (r'^views/', include('regressiontests.views.urls')),
     17 )

/srv/dev/mpf/nong/third-party/django/django/conf/urls/defaults.py in url(regex, view, kwargs, name, prefix)
     28                 raise ImproperlyConfigured('Empty URL pattern view name not permitted (for pattern %r)' % regex)
     29             if prefix:
     30                 view = prefix + '.' + view
---> 31         return RegexURLPattern(regex, view, kwargs, name)
     32

/srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in __init__(self, regex, callback, default_args, name)
    239         # callable object (view).
    240         self.regex = re.compile(regex, re.UNICODE)
--> 241         self.reverse_regex_lookup = ReverseRegexLookup(regex)
    242         if callable(callback):
    243             self._callback = callback

/srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in __init__(self, text)
    194 class ReverseRegexLookup(object):
    195     def __init__(self, text):
--> 196         self.tokens, self.minimum_arguments, self.error = tokenize(text)
    197
    198     def check(self, args=[], kwargs={}):

/srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in tokenize(text)
    179         if isinstance(bit, list):
    180             # Build the regex here so it only has to be compiled once.
--> 181             bit = re.compile('%s$' % build_re(bit))
    182             count += 1
    183         else:

/usr/local/lib/python2.4/sre.py in compile(pattern, flags)
    178 def compile(pattern, flags=0):
    179     "Compile a regular expression pattern, returning a pattern object."
--> 180     return _compile(pattern, flags)
    181
    182 def purge():

/usr/local/lib/python2.4/sre.py in _compile(*key)
    225         p = sre_compile.compile(pattern, flags)
    226     except error, v:
--> 227         raise error, v # invalid expression
    228     if len(_cache) >= _MAXCACHE:
    229         _cache.clear()

error: multiple repeat

10/23/07 23:50:07 changed by andresj

  • attachment django-plus-sign-fix.patch added.

Fix for problem with plus signs (see #4883) and dots (tried to fix also dollar signs and '' but didn't work--don't know why). Generated with svn diff.

10/28/07 02:57:57 changed by mtredinnick

  • owner changed from nobody to mtredinnick.

12/10/07 02:04:34 changed by ubernostrum

  • keywords set to url reverse.

03/06/08 15:30:37 changed by anonymous

  • cc changed from mattimustang@gmail.com, dcwatson@gmail.com, alexander.solovyov@gmail.com, daybreaker12@gmail.com, wbyoung@fadingred.org, django@sparemint.com to albertpeschar+djangotrac@gmail.com, mattimustang@gmail.com, dcwatson@gmail.com, alexander.solovyov@gmail.com, daybreaker12@gmail.com, wbyoung@fadingred.org, django@sparemint.com.

04/02/08 04:52:18 changed by sebastian_noack

I started to solve this problems, too. See #6934. But this patch seems to be more complete. I would like to see it merged into SVN, but unfortunately I can not apply even the patch against the current HEAD.

04/02/08 10:06:04 changed by wiswaud

  • cc changed from albertpeschar+djangotrac@gmail.com, mattimustang@gmail.com, dcwatson@gmail.com, alexander.solovyov@gmail.com, daybreaker12@gmail.com, wbyoung@fadingred.org, django@sparemint.com to albertpeschar+djangotrac@gmail.com, mattimustang@gmail.com, dcwatson@gmail.com, alexander.solovyov@gmail.com, daybreaker12@gmail.com, wbyoung@fadingred.org, django@sparemint.com, eric@akoha.org.

04/26/08 18:41:41 changed by Beau Gunderson <beau@beaugunderson.com>

  • cc changed from albertpeschar+djangotrac@gmail.com, mattimustang@gmail.com, dcwatson@gmail.com, alexander.solovyov@gmail.com, daybreaker12@gmail.com, wbyoung@fadingred.org, django@sparemint.com, eric@akoha.org to albertpeschar+djangotrac@gmail.com, mattimustang@gmail.com, dcwatson@gmail.com, alexander.solovyov@gmail.com, daybreaker12@gmail.com, wbyoung@fadingred.org, django@sparemint.com, eric@akoha.org, beau@beaugunderson.com.

I hit this the other day too; looking forward to this being applied! :)


Add/Change #2977 (Better handling of regular expressions for reverse urlresolver)




Change Properties
Action