Opened 2 years ago

Last modified 3 months ago

#20757 assigned New feature

A more Object-Oriented URLResolver

Reported by: fcurella Owned by: knbk
Component: Core (URLs) Version: master
Severity: Normal Keywords:
Cc: flavio.curella@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Currently, the RegexURLResolver is populated using tuples.

Because of that, code that tries to inspect the resolver will end up containg awkward indexing like this (semplified code for clarity's sake):

def get_urlformat(urlname):
    """
    Given a URL name, returns the URL as a string suitable for string.format.

    Example::

    urlpatterns = patterns('',
        url(r'^extra/(?P<extra>\w+)/$', empty_view, name="named-url2"),
    )

    >>> get_urlformat('named-url2')
    'extra/%(extra)s/'
    """

    resolver = get_resolver()
    return resolver.reverse_dict[urlname][0][0][0]

My proposal is to replace tuples with a tuple-like object whose elements can be accessed by using attribute names. That way, the above method could become:

def get_urlformat(urlname):
    """
    Given a URL name, returns the URL as a string suitable for string.format.

    Example::

    urlpatterns = patterns('',
        url(r'^extra/(?P<extra>\w+)/$', empty_view, name="named-url2"),
    )

    >>> get_urlformat('named-url2')
    'extra/%(extra)s/'
    """

    resolver = get_resolver()
    urlbit = resolver.reverse_dict[urlname].urlbits[0]
    return urlbit.format

I realize this is mostly aesthetic, and there could be performance implications since the URLResolver is probably the most hit codepath, so I appreciate every kind of opinion.

The attached patch is still a draft, it definitely needs more extensive test coverage, but it's a start to get the idea.

Attachments (4)

patch_commit_44e7837a1706.patch (6.9 KB) - added by fcurella 2 years ago.
patch_commit_8a8468f33b16.patch (7.0 KB) - added by fcurella 2 years ago.
handle IndexError correctly
patch_commit_86a42e712a0e.patch (7.1 KB) - added by fcurella 2 years ago.
patch_commit_f8ce5686938b.patch (4.8 KB) - added by fcurella 2 years ago.
use namedtuples

Download all attachments as: .zip

Change History (12)

Changed 2 years ago by fcurella

comment:1 Changed 2 years ago by fcurella

  • Cc flavio.curella@… added
  • Component changed from Uncategorized to Core (URLs)
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Type changed from Uncategorized to New feature

comment:2 Changed 2 years ago by fcurella

  • Needs tests set

comment:3 Changed 2 years ago by fcurella

  • Patch needs improvement set

Changed 2 years ago by fcurella

handle IndexError correctly

Changed 2 years ago by fcurella

comment:4 Changed 2 years ago by bmispelon

  • Triage Stage changed from Unreviewed to Accepted

Hi,

Python already has a namedtuple class in its collections module which is what you should be using for this case [1].
In theory, a namedtuple is a drop-in replacement for a regular tuple so backwards-compatibility should be assured.

However, I'm worried about the performance cost of such a replacement: as you noted, RegexURLResolver is a pretty critical component of django.

From a quick search, there's nothing in our documentation that explains how to extend the URL resolver (or why you'd want to).
I've also never done it myself either so I don't see exactly what we'd gain with this change.

So, to sum up, I'm -0 on the idea of using namedtuples, provided there's no major performance hit.

On the other hand, I'd definitely be +1 on documenting the process of extending the URL resolver and its machinery.

[1] http://docs.python.org/2/library/collections.html?highlight=collections.namedtuple#collections.namedtuple

Changed 2 years ago by fcurella

use namedtuples

comment:5 Changed 2 years ago by fcurella

I've updated the patch to use namedtuples and rebased against 7523e784633b7757fbc82df58f80b197eeed988a.

Re performance: according to the Python docs 'Named tuple instances do not have per-instance dictionaries, so they are lightweight and require no more memory than regular tuples.'.

I'd still like to run some profiling to have a better idea of the impact of this change. Memory consumption might be the same, but instantiation could have an overhead.

comment:6 Changed 14 months ago by UloPe

I've done a quick benchmark. The code and raw results can be found here: https://gist.github.com/ulope/a63aff42f51b08181f72#file-results-txt

Here is a table of the results (n=1000000):

Python operation datastructure time
CPy 2.6 instantiate tuple 0.243867874146
instantiate namedtuple 0.791953086853
access tuple 0.342396020889
access namedtuple 1.09635186195
CPy 2.7 instantiate tuple 0.300357103348
instantiate namedtuple 0.817892074585
access tuple 0.397746801376
access namedtuple 1.12735199928
CPy 3.3 instantiate tuple 0.19853064499329776
instantiate namedtuple 0.8839332649949938
access tuple 0.2663196460052859
access namedtuple 1.1236915799963754
CPy 3.4 instantiate tuple 0.2100249359937152
instantiate namedtuple 0.707535210007336
access tuple 0.2940493019996211
access namedtuple 0.9200455860118382
PyPy 2.3 instantiate tuple 0.00525689125061
instantiate namedtuple 0.00653004646301
access tuple 0.00449419021606
access namedtuple 0.00840997695923

From this it seems that namedtuple indeed is noticeably more expensive than a plain tuple (except, unsurprisingly, on PyPy). However as the absolute times for namedtuple are still quite short even at one million iterations I think the impact on Django's URL resolvers should be negligible.

comment:7 Changed 3 months ago by knbk

  • Owner changed from nobody to knbk
  • Status changed from new to assigned

comment:8 Changed 3 months ago by knbk

  • Has patch unset
  • Needs tests unset
  • Patch needs improvement unset
Note: See TracTickets for help on using tickets.
Back to Top