

Version 11 (modified by jojo, 5 years ago) (diff)


Replacing get_absolute_url

Summary: get_absolute_url() is poorly defined and poorly named. It's too late to fix it for Django 1.0, but we should re-think it for Django 1.1 - Simon Willison

This page is a work in progress - I'm still figuring out the extent of the problem before I start working out a solution.

The problem

It's often useful for a model to "know" its URL. This is especially true for sites that follow RESTful principles, where any entity within the site should have one and only one canonical URL.

It's also useful to keep URL logic in the same place as much as possible. Django's {% url %} template tag and reverse() function solve a slightly different problem - they resolve URLs for view functions, not for individual model objects, and treat the URLconf as the single point of truth for URLs. {% url myapp.views.profile user.id %} isn't as pragmatic as {{ user.get_absolute_url }}, because if we change the profile view to take a username instead of a user ID in the URL, we'll have to go back and update all of our templates.

Being able to get the URL for a model is also useful outside of the template system. Django's admin, syndication and sitemaps modules all attempt to derive a URL for a model at various points, currently using the get_absolute_url method.

The current mechanism for making models aware of their URL is the semi-standardised get_absolute_url method. If you provide this method on your model class, a number of different places in Django will use it to create URLs. You can also override it per model using settings.ABSOLUTE_URL_OVERRIDES.
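As a reminder of what such an override looks like, here is a small settings fragment. The app labels, model names and attributes (blogs.weblog, news.story, slug, pub_year) are made up for illustration; the setting maps "app_label.model_name" strings to functions that take a model object and return its URL.

```python
# settings.py fragment. The keys and attribute names below are hypothetical.
ABSOLUTE_URL_OVERRIDES = {
    # Each value receives the model instance and returns the URL to use
    # in place of whatever the model's own get_absolute_url() returns.
    "blogs.weblog": lambda o: "/blogs/%s/" % o.slug,
    "news.story": lambda o: "/stories/%s/%s/" % (o.pub_year, o.slug),
}
```

Note that these overrides return paths too, so they inherit the same path-versus-full-URL ambiguity discussed on this page.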

Unfortunately, get_absolute_url is mis-named. An "absolute" URL should be expected to include the protocol and domain, but in most cases get_absolute_url just returns the path. It was proposed to rename get_absolute_url to get_url_path, but this doesn't make sense either as some objects DO return a full URL from get_absolute_url (and in fact some places in Django check to see if the return value starts with http:// and behave differently as a result).

From this, we can derive that there are actually two important URL parts for a given model:

  1. The full URL, including protocol and domain. This is needed for the following cases:
    • links in e-mails, e.g. a "click here to activate your account" link
    • URLs included in syndication feeds
    • links used for things like "share this page on del.icio.us" widgets
    • links from the admin to "this object live on the site" where the admin is hosted on a separate domain or subdomain from the live site
  2. The path component of the URL. This is needed for internal links - it's a waste of bytes to jam the full URL in a regular link when a path could be used instead.

A third type of URL - URLs relative to the current page - is not being considered here because of the complexity involved in getting it right. That said, it would be possible to automatically derive a relative URL using the full path and a request-aware template tag.
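To give a feel for the complexity involved, here is a rough sketch of deriving a relative URL from two site paths. The function name relative_url is invented for this example, it ignores query strings and fragments, and it assumes both arguments are root-relative paths; a real template tag would take the current path from the request.

```python
import posixpath


def relative_url(current_path, target_path):
    """Return target_path expressed relative to the page at current_path."""
    # A path ending in "/" is its own base directory; otherwise the base
    # is the directory that contains the current page.
    if current_path.endswith("/"):
        base_dir = current_path
    else:
        base_dir = posixpath.dirname(current_path)
    rel = posixpath.relpath(target_path, start=base_dir or "/")
    # relpath() normalises away a trailing slash, so put it back.
    if target_path.endswith("/") and not rel.endswith("/"):
        rel += "/"
    return rel
```

For example, relative_url("/blog/2008/aug/", "/blog/2008/sep/post/") gives "../sep/post/".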

So, for a given model we need a reliable way of determining its path on the site AND its full URL including domain. The path can be derived from the full URL, and sometimes vice versa depending on how the site's domain relates to the model objects in question.
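The two directions can be sketched with the standard library alone. The helper names path_from_url and url_from_path are invented for this illustration, and the second one assumes the site lives on a single known domain, which is exactly the assumption that does not always hold.

```python
from urllib.parse import urlsplit, urlunsplit


def path_from_url(url):
    """Drop the scheme and host, keeping path, query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit(("", "", parts.path, parts.query, parts.fragment))


def url_from_path(path, domain, scheme="http"):
    """Glue a known domain onto a root-relative path."""
    return "%s://%s%s" % (scheme, domain, path)
```

Going from a full URL to a path always works; going the other way needs the domain to come from somewhere, which is where the rest of this section picks up.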

Django currently uses django.contrib.sites in a number of places to attempt to derive a complete URL from just a path, but this has its own problems. The sites framework assumes the presence of a number of things: a django_site table, a SITE_ID in the settings and a record corresponding to that SITE_ID. This arrangement does not always make sense - consider the case of a site which provides a unique subdomain for every one of the site's users (simonwillison.myopenid.com for example). Additionally, making users add a record to the sites table when they start their project is Yet Another Step, and one that many people ignore. Finally, the sites framework doesn't really take development / staging / production environments into account. Handling these properly requires additional custom code, which often ends up working around the sites framework entirely.

Finally, it's important that places that use get_absolute_url (such as the admin, sitemaps and syndication) always provide an over-ridable alternative. Syndication feeds may wish to include extra hit-tracking material on URLs, and admin sites may wish to link to staging or production depending on other criteria. At the moment some but not all of these tools provide over-riding mechanisms, and those that do are inconsistent in what they are called and how they work.
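As an illustration of the hit-tracking case, the helper below appends extra query parameters to a URL. The name add_tracking and the parameter names are hypothetical; an over-ridden hook such as a feed's item_link could call something like it on the URL it would otherwise return.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_tracking(url, **params):
    """Return url with params appended to its existing query string."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Sort the extra parameters so the output is deterministic.
    query.extend(sorted(params.items()))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )
```

For example, add_tracking("http://example.com/blog/1/?page=2", source="feed") returns "http://example.com/blog/1/?page=2&source=feed".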

It bears repeating that the problem of turning a path returned by get_absolute_url into a full URL is a very real one: Django actually solves it in a number of places, each one taking a slightly different approach, none of which is really ideal. The fact that it's being solved multiple times and in multiple ways suggests a strong need for a single, reliable solution.

Current uses of get_absolute_url()

By grepping the Django source code, I've identified the following places where get_absolute_url is used:

grep -r get_absolute_url django | grep -v ".svn" | grep -v '.pyc'
  • contrib/admin/options.py: Uses hasattr(obj, 'get_absolute_url') to populate 'has_absolute_url' and 'show_url' properties which are passed through to templates and used to show links to that object on the actual site.
  • contrib/auth/models.py: Defines get_absolute_url on the User class to be /users/{{ username }}/ - this may be a bug since that URL is not defined by default anywhere in Django.
  • contrib/comments/models.py: Defines get_absolute_url on the Comment and FreeComment classes, to be the get_absolute_url of the comment's content object + '#c' + the comment's ID.
  • contrib/flatpages/models.py: Defined on the FlatPage model, returns self.url (which is managed in the admin)
  • contrib/sitemaps/__init__.py: Sitemap.location(self, obj) uses obj.get_absolute_url() by default to figure out the URL to include in the sitemap - designed to be over-ridden
  • contrib/syndication/feeds.py: The default Feed.item_link(self, item) method (which is designed to be over-ridden) uses get_absolute_url, and raises an informative exception if it's not available. It also uses its own add_domain() function along with current_site.domain, which in turn uses Site.objects.get_current() and falls back on RequestSite(self.request) to figure out the full URL (both Site and RequestSite come from the django.contrib.sites package).
  • db/models/base.py: Takes get_absolute_url into account when constructing the model class - this is where the settings.ABSOLUTE_URL_OVERRIDES setting takes effect.
  • views/defaults.py: The thoroughly magic shortcut(request, content_type_id, object_id) view, which attempts to figure out a full URL to something based on a content_type and an object_id, makes extensive use of get_absolute_url - including behaving differently if the return value starts with http://.
  • views/generic/create_update.py: Both create and update views default to redirecting the user to get_absolute_url() if and only if post_save_redirect has not been configured for that view.
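Several of the places listed above share the same basic pattern: check whether the value already looks like a full URL and, if not, prefix it with a domain. The function below is an approximation of that pattern written for this page, not a copy of Django's source; the scheme parameter in particular is an assumption.

```python
def add_domain(domain, url, scheme="http"):
    """Prefix a root-relative path with scheme and domain.

    Values that already start with http:// or https:// are returned
    unchanged, mirroring the "starts with http://" checks described above.
    """
    if url.startswith(("http://", "https://")):
        return url
    return "%s://%s%s" % (scheme, domain, url)
```

The awkward part is not this function but deciding where domain comes from, which each caller currently answers differently.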

Finally, in the documentation:

  • docs/contributing.txt - mentioned in coding standards, model ordering section
  • docs/generic_views.txt
  • docs/model-api.txt - lots of places, including "It's good practice to use get_absolute_url() in templates..."
  • docs/settings.txt - in docs for ABSOLUTE_URL_OVERRIDES
  • docs/sitemaps.txt
  • docs/sites.txt - referred to as a "convention"
  • docs/syndication_feeds.txt
  • docs/templates.txt - in an example
  • docs/unicode.txt - "Taking care in get_absolute_url..."
  • docs/url_dispatch.txt

And in the tests:

ABSOLUTE_URL_OVERRIDES is not tested.

get_absolute_url is referenced in:

  • tests/regressiontests/views/models.py
  • tests/regressiontests/views/tests/defaults.py
  • tests/regressiontests/views/tests/generic/create_update.py
  • tests/regressiontests/views/urls.py

The solution

I'm currently leaning towards two complementary methods:

  • get_url_path() - returns the URL's path component, starting at the root of the site - e.g. "/blog/2008/Aug/11/slug/"
  • get_url() - returns the full URL, including the protocol and domain - e.g. "http://example.com/blog/2008/Aug/11/slug/"

Users should be able to define either or both of these methods. If they define one but not the other, the default implementation of the undefined method can attempt to figure it out based on the method that IS defined. This should actually work pretty well - get_url_path() is trivial to derive from get_url(), whereas for sites that only exist on one domain get_url() could simply glue that domain (defined in settings.py, or derived from SITE_ID and the sites framework) on to get_url_path().

I don't think this needs to be all that complicated, and in fact the above scheme could allow us to delete a whole bunch of weird special case code scattered throughout Django.

Update 11th September 2008: Here's a prototype implementation (as a mixin class): http://code.google.com/p/django-urls/

The code for the prototype mixin is as follows:

from urllib.parse import urlparse, urlunparse  # the "urlparse" module in Python 2

from django.conf import settings
from django.contrib.sites.models import Site


class UrlMixin(object):
    """Derive get_url() from get_url_path() or vice versa.

    Subclasses are expected to override at least one of the two methods.
    """

    def get_url(self):
        # If get_url_path is still the mixin's default, neither method was
        # overridden, and deriving one from the other would recurse forever.
        if hasattr(self.get_url_path, 'dont_recurse'):
            raise NotImplementedError
        path = self.get_url_path()
        protocol = getattr(settings, "PROTOCOL", "http")
        domain = Site.objects.get_current().domain
        port = getattr(settings, "PORT", "")
        if port:
            assert port.startswith(":"), "The PORT setting must have a preceding ':'."
        return "%s://%s%s%s" % (protocol, domain, port, path)
    get_url.dont_recurse = True

    def get_url_path(self):
        # Same guard as above, in the other direction.
        if hasattr(self.get_url, 'dont_recurse'):
            raise NotImplementedError
        url = self.get_url()
        bits = urlparse(url)
        # Blank out scheme and host; keep path, params, query and fragment.
        return urlunparse(('', '') + bits[2:])
    get_url_path.dont_recurse = True

And you use it like this:

from django.db import models
from django_urls.base import UrlMixin

class ArticleWithPathDefined(models.Model, UrlMixin):
    slug = models.SlugField()
    
    def get_url_path(self):
        return '/articles/%s/' % self.slug

class AssetWithUrlDefined(models.Model, UrlMixin):
    domain = models.CharField(max_length=30)
    filename = models.CharField(max_length=30)
    
    def get_url(self):
        return 'http://%s/assets/%s' % (self.domain, self.filename)
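
To try the fallback behaviour without a Django project, here is a Django-free variant of the same idea. PlainUrlMixin, SITE_DOMAIN, Article and Asset are names made up for this sketch; SITE_DOMAIN stands in for the domain the prototype reads from the sites framework, and the protocol is hard-coded to http.

```python
from urllib.parse import urlparse, urlunparse

SITE_DOMAIN = "example.com"  # hypothetical stand-in for the current Site's domain


class PlainUrlMixin(object):
    def get_url(self):
        # Still the mixin default on both sides: nothing to derive from.
        if hasattr(self.get_url_path, "dont_recurse"):
            raise NotImplementedError("define get_url() or get_url_path()")
        return "http://%s%s" % (SITE_DOMAIN, self.get_url_path())
    get_url.dont_recurse = True

    def get_url_path(self):
        if hasattr(self.get_url, "dont_recurse"):
            raise NotImplementedError("define get_url() or get_url_path()")
        bits = urlparse(self.get_url())
        return urlunparse(("", "") + bits[2:])
    get_url_path.dont_recurse = True


class Article(PlainUrlMixin):
    def __init__(self, slug):
        self.slug = slug

    def get_url_path(self):
        return "/articles/%s/" % self.slug


class Asset(PlainUrlMixin):
    def __init__(self, domain, filename):
        self.domain = domain
        self.filename = filename

    def get_url(self):
        return "http://%s/assets/%s" % (self.domain, self.filename)


print(Article("hello").get_url())  # prints http://example.com/articles/hello/
print(Asset("static.example.com", "logo.png").get_url_path())  # prints /assets/logo.png
```

A class that defines neither method raises NotImplementedError from both, which is the behaviour the dont_recurse marker exists to guarantee.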