=====================
The sitemap framework
=====================

**New in Django development version**.

Django comes with a high-level sitemap-generating framework that makes
creating sitemap_ XML files easy.

.. _sitemap: http://www.sitemaps.org/

Overview
========

A sitemap is an XML file on your Web site that tells search-engine indexers how
frequently your pages change and how "important" certain pages are in relation
to other pages on your site. This information helps search engines index your
site.

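For reference, here is roughly what such a file contains. This version is built by hand with Python's standard ``xml.etree.ElementTree`` module rather than with Django. The ``build_sitemap`` helper and the example URLs are hypothetical. The ``urlset``/``url``/``loc``/``lastmod``/``changefreq``/``priority`` element names and the namespace come from the sitemaps.org protocol, which expects ``<loc>`` to hold a full URL, including the protocol and domain:

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Child elements of <url>, in the order the protocol lists them.
FIELDS = ("loc", "lastmod", "changefreq", "priority")


def build_sitemap(entries):
    """Return a <urlset> document (as a string) for a list of entry dicts.

    Each entry must have a "loc" key; "lastmod", "changefreq" and
    "priority" are optional and are skipped when absent or None.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for field in FIELDS:
            value = entry.get(field)
            if value is not None:
                ET.SubElement(url, field).text = str(value)
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    {"loc": "http://example.com/", "changefreq": "daily", "priority": 1.0},
    {"loc": "http://example.com/blog/hello-world/", "lastmod": "2007-05-01"},
])
print(sitemap_xml)
```

The sitemap framework exists so that you never have to assemble this XML yourself.
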
The Django sitemap framework automates the creation of this XML file by letting
you express this information in Python code.

It works much like Django's `syndication framework`_. To create a sitemap, just
write a ``Sitemap`` class and point to it in your URLconf_.

.. _syndication framework: ../syndication/
.. _URLconf: ../url_dispatch/

Installation
============

To install the sitemap app, follow these steps:

1. Add ``'django.contrib.sitemaps'`` to your INSTALLED_APPS_ setting.
2. Make sure ``'django.template.loaders.app_directories.load_template_source'``
   is in your TEMPLATE_LOADERS_ setting. It's in there by default, so
   you'll only need to change this if you've changed that setting.
3. Make sure you've installed the `sites framework`_.

(Note: The sitemap application doesn't install any database tables. The only
reason it needs to go into ``INSTALLED_APPS`` is so that the
``load_template_source`` template loader can find the default templates.)

.. _INSTALLED_APPS: ../settings/#installed-apps
.. _TEMPLATE_LOADERS: ../settings/#template-loaders
.. _sites framework: ../sites/

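As a sketch, the relevant parts of a settings file might look like the following. Only the ``'django.contrib.sites'``, ``'django.contrib.sitemaps'`` and template-loader entries come from the steps above. The project itself and the ``SITE_ID`` value of ``1`` are illustrative assumptions:

```python
# settings.py (excerpt) for a hypothetical project that uses sitemaps.

INSTALLED_APPS = (
    # ... your other apps ...
    'django.contrib.sites',     # step 3: the sites framework
    'django.contrib.sitemaps',  # step 1: the sitemap app itself
)

# Step 2: only relevant if you've overridden TEMPLATE_LOADERS. The
# app-directories loader is what finds the sitemap app's default templates.
TEMPLATE_LOADERS = (
    'django.template.loaders.filesystem.load_template_source',
    'django.template.loaders.app_directories.load_template_source',
)

# The sites framework identifies the current site by this ID (assumed here).
SITE_ID = 1
```
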
Initialization
==============

To activate sitemap generation on your Django site, add this line to your
URLconf_::

    (r'^sitemap\.xml$', 'django.contrib.sitemaps.views.sitemap', {'sitemaps': sitemaps})

This tells Django to build a sitemap when a client accesses ``/sitemap.xml``.

The name of the sitemap file is not important, but the location is. Search
engines will only index links in your sitemap for the current URL level and
below. For instance, if ``sitemap.xml`` lives in your root directory, it may
reference any URL in your site. However, if your sitemap lives at
``/content/sitemap.xml``, it may only reference URLs that begin with
``/content/``.

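The location rule above amounts to a prefix check on the sitemap's directory. The function below is a hypothetical illustration of that rule and is not part of Django. It works on paths only, whereas sitemaps.org applies the rule to full URLs, including protocol and host:

```python
import posixpath


def sitemap_may_reference(sitemap_path, url_path):
    """Return True if a sitemap served at sitemap_path may list url_path.

    Both arguments are absolute paths such as '/content/sitemap.xml'.
    A sitemap covers its own directory level and everything below it.
    """
    # Directory that contains the sitemap, always ending in '/'.
    scope = posixpath.dirname(sitemap_path)
    if not scope.endswith("/"):
        scope += "/"
    return url_path.startswith(scope)


print(sitemap_may_reference("/sitemap.xml", "/blog/2007/"))          # True
print(sitemap_may_reference("/content/sitemap.xml", "/content/a/"))  # True
print(sitemap_may_reference("/content/sitemap.xml", "/blog/2007/"))  # False
```

Appending the trailing slash keeps ``/content/sitemap.xml`` from accidentally covering a sibling such as ``/contentious/``.
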
The sitemap view takes an extra, required argument: ``{'sitemaps': sitemaps}``.
``sitemaps`` should be a dictionary that maps a short section label (e.g.,
``blog`` or ``news``) to its ``Sitemap`` class (e.g., ``BlogSitemap`` or
``NewsSitemap``). It may also map to an *instance* of a ``Sitemap`` class
(e.g., ``BlogSitemap(some_var)``).

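Because each value can be either a class or an instance, code that consumes the dictionary has to handle both. The sketch below shows one way to do that with ``callable()``: classes are instantiated, and instances are used as-is. It uses plain stand-in classes instead of Django's ``Sitemap``. The names ``resolve_sitemaps`` and the ``category`` argument are hypothetical, and the sketch illustrates the idea rather than reproducing Django's view code:

```python
class BlogSitemap(object):
    """Stand-in for a Sitemap subclass that takes an optional argument."""

    def __init__(self, category=None):
        self.category = category


class NewsSitemap(object):
    """Stand-in for a Sitemap subclass with no constructor arguments."""


def resolve_sitemaps(sitemaps):
    """Return {label: sitemap instance}, instantiating any classes.

    Assumes sitemap instances are not themselves callable (their class
    defines no __call__), so callable() only matches classes here.
    """
    resolved = {}
    for label, site in sitemaps.items():
        resolved[label] = site() if callable(site) else site
    return resolved


sitemaps = {
    'news': NewsSitemap,            # a class
    'blog': BlogSitemap('python'),  # an instance
}
resolved = resolve_sitemaps(sitemaps)
print(type(resolved['news']).__name__, resolved['blog'].category)  # NewsSitemap python
```
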
Sitemap classes
===============

A ``Sitemap`` class is a simple Python class that represents a "section" of
entries in your sitemap. For example, one ``Sitemap`` class could represent all
the entries of your weblog, while another could represent all of the events in
your events calendar.

In the simplest case, all these sections get lumped together into one
``sitemap.xml``, but it's also possible to use the framework to generate a
sitemap index that references individual sitemap files, one per section. (See
`Creating a sitemap index`_ below.)

``Sitemap`` classes must subclass ``django.contrib.sitemaps.Sitemap``. They can
live anywhere in your codebase.

A simple example
================

Let's assume you have a blog system, with an ``Entry`` model, and you want your
sitemap to include all the links to your individual blog entries. Here's how
your sitemap class might look::

    from django.contrib.sitemaps import Sitemap
    from mysite.blog.models import Entry

    class BlogSitemap(Sitemap):
        changefreq = "never"
        priority = 0.5

        def items(self):
            return Entry.objects.filter(is_draft=False)

        def lastmod(self, obj):
            return obj.pub_date

Note:

* ``changefreq`` and ``priority`` are class attributes corresponding to
  ``<changefreq>`` and ``<priority>`` elements, respectively. Like
  ``lastmod`` in the example, they can also be defined as methods.
* ``items()`` is simply a method that returns a list of objects. The objects
  returned will get passed to any callable methods corresponding to a
  sitemap property (``location``, ``lastmod``, ``changefreq``, and
  ``priority``).
* ``lastmod`` should return a Python ``datetime.datetime`` object.
* There is no ``location`` method in this example, but you can provide it
  in order to specify the URL for your object. By default, ``location()``
  calls ``get_absolute_url()`` on each object and returns the result.

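The rules in the list above can be illustrated without Django. Each property is either an attribute or a method, and ``location`` falls back to ``get_absolute_url()``. The following is a simplified, hypothetical model of how per-item values might be looked up. ``FakeEntry``, ``EntrySitemap``, ``get_value`` and ``entries_for`` are invented names, and Django's real ``Sitemap`` class differs in its details:

```python
import datetime


class FakeEntry(object):
    """Stand-in for a blog Entry model instance."""

    def __init__(self, slug, pub_date):
        self.slug = slug
        self.pub_date = pub_date

    def get_absolute_url(self):
        return "/blog/%s/" % self.slug


class EntrySitemap(object):
    """Stand-in mirroring BlogSitemap: two attributes and two methods."""
    changefreq = "never"
    priority = 0.5

    def __init__(self, entries):
        self.entries = entries

    def items(self):
        return self.entries

    def lastmod(self, obj):
        return obj.pub_date


def get_value(sitemap, name, obj):
    """Call sitemap.<name>(obj) if it's a method, else return the attribute."""
    attr = getattr(sitemap, name, None)
    if attr is None:
        return None
    return attr(obj) if callable(attr) else attr


def entries_for(sitemap):
    """Build one dict per item, falling back to get_absolute_url() for location."""
    results = []
    for obj in sitemap.items():
        location = get_value(sitemap, "location", obj)
        if location is None:
            location = obj.get_absolute_url()
        results.append({
            "location": location,
            "lastmod": get_value(sitemap, "lastmod", obj),
            "changefreq": get_value(sitemap, "changefreq", obj),
            "priority": get_value(sitemap, "priority", obj),
        })
    return results


sitemap = EntrySitemap([FakeEntry("hello", datetime.datetime(2007, 5, 1, 12, 0))])
print(entries_for(sitemap))
```
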
Sitemap class reference
=======================

A ``Sitemap`` class can define the following methods/attributes:

``items``
---------

**Required.** A method that returns a list of objects. The framework doesn't
care what *type* of objects they are; all that matters is that these objects
get passed to the ``location()``, ``lastmod()``, ``changefreq()`` and
``priority()`` methods.

``location``
------------

**Optional.** Either a method or attribute.

If it's a method, it should return the absolute URL for a given object as
returned by ``items()``.

If it's an attribute, its value should be a string representing an absolute URL
to use for *every* object returned by ``items()``.

In both cases, "absolute URL" means a URL that doesn't include the protocol or
domain. Examples:

* Good: ``'/foo/bar/'``
* Bad: ``'example.com/foo/bar/'``
* Bad: ``'http://example.com/foo/bar/'``

If ``location`` isn't provided, the framework will call the
``get_absolute_url()`` method on each object as returned by ``items()``.

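To check a ``location`` value against the good and bad examples above, you could use a small helper like the one below. It is a hypothetical sketch, not part of Django, and it uses Python 3's ``urllib.parse`` to reject values that carry a protocol or domain:

```python
from urllib.parse import urlsplit


def is_valid_location(value):
    """Return True if value is a path like '/foo/bar/' with no protocol or domain."""
    parts = urlsplit(value)
    # Reject anything with a scheme ('http:') or a host ('//example.com').
    if parts.scheme or parts.netloc:
        return False
    # 'example.com/foo/bar/' parses as a relative path, so require a leading slash.
    return parts.path.startswith("/")


for candidate in ("/foo/bar/", "example.com/foo/bar/", "http://example.com/foo/bar/"):
    print(candidate, is_valid_location(candidate))
```
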
``lastmod``
-----------

**Optional.** Either a method or attribute.

If it's a method, it should take one argument -- an object as returned by
``items()`` -- and return that object's last-modified date/time, as a Python
``datetime.datetime`` object.

If it's an attribute, its value should be a Python ``datetime.datetime`` object
representing the last-modified date/time for *every* object returned by
``items()``.

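For illustration, here are both forms side by side, along with how a ``datetime.datetime`` value maps onto ``YYYY-MM-DD``. That is the simplest W3C Datetime form that sitemaps.org allows for ``<lastmod>``. The classes are plain stand-ins for ``Sitemap`` subclasses, the dict key ``updated`` is hypothetical, and the sketch does not show how Django itself renders the value:

```python
import datetime

# Hypothetical fixed date used for the attribute form.
SITE_LAUNCH = datetime.datetime(2007, 6, 1, 9, 30)


class StaticPagesSitemap(object):
    # Attribute form: the same last-modified time for every item.
    lastmod = SITE_LAUNCH


class ArticleSitemap(object):
    # Method form: each object supplies its own last-modified time.
    def lastmod(self, obj):
        return obj["updated"]


def w3c_date(value):
    """Format a datetime as YYYY-MM-DD, the shortest W3C Datetime form."""
    return value.strftime("%Y-%m-%d")


article = {"updated": datetime.datetime(2007, 7, 14, 18, 5)}
print(w3c_date(StaticPagesSitemap.lastmod))         # 2007-06-01
print(w3c_date(ArticleSitemap().lastmod(article)))  # 2007-07-14
```
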
``changefreq``
--------------

**Optional.** Either a method or attribute.

If it's a method, it should take one argument -- an object as returned by
``items()`` -- and return that object's change frequency, as a Python string.

If it's an attribute, its value should be a string representing the change
frequency of *every* object returned by ``items()``.

Possible values for ``changefreq``, whether you use a method or attribute, are:

* ``'always'``
* ``'hourly'``
* ``'daily'``
* ``'weekly'``
* ``'monthly'``
* ``'yearly'``
* ``'never'``

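Since ``changefreq`` must be one of these fixed strings, the method form typically chooses among them per object. The following sketch shows such a per-object rule and a guard against typos. It uses a stand-in class with a hypothetical ``is_past`` flag, plus a hypothetical ``check_changefreq`` helper that is not part of Django:

```python
# The seven values allowed for <changefreq> by the sitemaps.org protocol.
CHANGEFREQ_VALUES = frozenset([
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
])


class EventSitemap(object):
    """Stand-in for a Sitemap subclass covering calendar events."""

    def changefreq(self, event):
        # Hypothetical rule: past events never change, upcoming ones may.
        return "never" if event["is_past"] else "daily"


def check_changefreq(value):
    """Return value unchanged, or raise ValueError if it isn't allowed."""
    if value not in CHANGEFREQ_VALUES:
        raise ValueError("invalid changefreq: %r" % (value,))
    return value


sitemap = EventSitemap()
print(check_changefreq(sitemap.changefreq({"is_past": True})))   # never
print(check_changefreq(sitemap.changefreq({"is_past": False})))  # daily
```
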
``priority``
------------

**Optional.** Either a method or attribute.

If it's a method, it should take one argument -- an object as returned by
``items()`` -- and return that object's priority, as either a string or float.

If it's an attribute, its value should be either a string or float representing
the priority of *every* object returned by ``items()``.

Example values for ``priority``: ``0.4``, ``1.0``. Valid values range from
``0.0`` to ``1.0``, and the default priority of a page is ``0.5``. See the
`sitemaps.org documentation`_ for more.

.. _sitemaps.org documentation: http://www.sitemaps.org/protocol.html#prioritydef

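A method form of ``priority`` lets some objects rank above others. The sketch below uses a stand-in class with a hypothetical ``featured`` flag. It also includes a hypothetical ``check_priority`` helper, not part of Django, that accepts either a string or a float and enforces the 0.0 to 1.0 range:

```python
class EntryPrioritySitemap(object):
    """Stand-in for a Sitemap subclass with a per-object priority."""

    def priority(self, entry):
        # Hypothetical rule: featured entries rank above the 0.5 default.
        return 0.8 if entry["featured"] else 0.5


def check_priority(value):
    """Return a string or float priority as a float, enforcing 0.0-1.0."""
    number = float(value)  # '0.4' and 0.4 are both accepted
    if not 0.0 <= number <= 1.0:
        raise ValueError("priority must be between 0.0 and 1.0, got %r" % (value,))
    return number


sitemap = EntryPrioritySitemap()
print(check_priority(sitemap.priority({"featured": True})))  # 0.8
print(check_priority("0.4"))                                 # 0.4
```

Raising an error, rather than silently clamping, surfaces a bad value as soon as it's computed.
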
Shortcuts
=========

The sitemap framework provides a couple of convenience classes for common cases:

``FlatPageSitemap``
-------------------

The ``django.contrib.sitemaps.FlatPageSitemap`` class looks at all flatpages_
defined for the current ``SITE_ID`` (see the `sites documentation`_) and
creates an entry in the sitemap for each one. These entries include only the
``location`` attribute -- not ``lastmod``, ``changefreq`` or ``priority``.

.. _flatpages: ../flatpages/
.. _sites documentation: ../sites/

``GenericSitemap``
------------------

The ``GenericSitemap`` class works with any `generic views`_ you already have.
To use it, create an instance, passing in the same ``info_dict`` you pass to
the generic views. The only requirement is that the dictionary have a
``queryset`` entry. It may also have a ``date_field`` entry that specifies a
date field for objects retrieved from the ``queryset``. This will be used for
the ``lastmod`` attribute in the generated sitemap. You may also pass
``priority`` and ``changefreq`` keyword arguments to the ``GenericSitemap``
constructor to specify these attributes for all URLs.

.. _generic views: ../generic_views/

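To show how ``info_dict`` drives the generated entries, here is a simplified stand-in written without Django. ``GenericSitemapSketch`` and ``Post`` are hypothetical names, and a plain list takes the place of a ``QuerySet``. The class only mirrors the behaviour described above and is not Django's implementation:

```python
import datetime


class GenericSitemapSketch(object):
    """Mimics the documented behaviour of GenericSitemap (not Django's code)."""

    def __init__(self, info_dict, priority=None, changefreq=None):
        self.queryset = info_dict["queryset"]          # required entry
        self.date_field = info_dict.get("date_field")  # optional entry
        self.priority = priority
        self.changefreq = changefreq

    def items(self):
        return self.queryset

    def lastmod(self, obj):
        # Only meaningful when info_dict supplied a date_field.
        if self.date_field is None:
            return None
        return getattr(obj, self.date_field)


class Post(object):
    """Hypothetical model instance with a pub_date field."""

    def __init__(self, pub_date):
        self.pub_date = pub_date


info_dict = {
    "queryset": [Post(datetime.datetime(2007, 3, 1))],
    "date_field": "pub_date",
}
sitemap = GenericSitemapSketch(info_dict, priority=0.6)
for post in sitemap.items():
    print(sitemap.lastmod(post), sitemap.priority)
```
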
Example
-------

Here's an example of a URLconf_ using both::

    from django.conf.urls.defaults import *
    from django.contrib.sitemaps import FlatPageSitemap, GenericSitemap
    from mysite.blog.models import Entry

    info_dict = {
        'queryset': Entry.objects.all(),
        'date_field': 'pub_date',
    }

    sitemaps = {
        'flatpages': FlatPageSitemap,
        'blog': GenericSitemap(info_dict, priority=0.6),
    }

    urlpatterns = patterns('',
        # some generic view using info_dict
        # ...

        # the sitemap
        (r'^sitemap\.xml$', 'django.contrib.sitemaps.views.sitemap', {'sitemaps': sitemaps})
    )

Creating a sitemap index
========================

The sitemap framework also has the ability to create a sitemap index that
references individual sitemap files, one per section defined in your
``sitemaps`` dictionary. The only differences in usage are:

* You use two views in your URLconf: ``django.contrib.sitemaps.views.index``
  and ``django.contrib.sitemaps.views.sitemap``.
* The ``django.contrib.sitemaps.views.sitemap`` view should take a
  ``section`` keyword argument.

Here is what the relevant URLconf lines would look like for the example above::

    (r'^sitemap\.xml$', 'django.contrib.sitemaps.views.index', {'sitemaps': sitemaps}),
    (r'^sitemap-(?P<section>.+)\.xml$', 'django.contrib.sitemaps.views.sitemap', {'sitemaps': sitemaps}),

This will automatically generate a ``sitemap.xml`` file that references
both ``sitemap-flatpages.xml`` and ``sitemap-blog.xml``. The ``Sitemap``
classes and the ``sitemaps`` dict don't change at all.

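To see what the ``section`` group captures, you can try the URLconf regular expressions directly with Python's ``re`` module. This sketch only exercises the patterns from the lines above, with the dots escaped, and the file names are examples:

```python
import re

# The two patterns from the URLconf above. Django matches them against the
# request path with the leading slash removed.
INDEX_RE = re.compile(r'^sitemap\.xml$')
SECTION_RE = re.compile(r'^sitemap-(?P<section>.+)\.xml$')

for path in ("sitemap.xml", "sitemap-flatpages.xml", "sitemap-blog.xml"):
    if INDEX_RE.match(path):
        print(path, "-> index")
    else:
        match = SECTION_RE.match(path)
        print(path, "-> section", match.group("section") if match else None)
```

The captured ``section`` value is passed to the ``sitemap`` view as a keyword argument and corresponds to a key in the ``sitemaps`` dictionary.
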
Pinging Google
==============

You may want to "ping" Google when your sitemap changes, to let it know to
reindex your site. The framework provides a function to do just that:
``django.contrib.sitemaps.ping_google()``.

``ping_google()`` takes an optional argument, ``sitemap_url``, which should be
the absolute URL of your site's sitemap (e.g., ``'/sitemap.xml'``). If this
argument isn't provided, ``ping_google()`` will attempt to figure out your
sitemap by performing a reverse lookup in your URLconf.

``ping_google()`` raises the exception
``django.contrib.sitemaps.SitemapNotFound`` if it cannot determine your sitemap
URL.

One useful way to call ``ping_google()`` is from a model's ``save()`` method::

    from django.db import models
    from django.contrib.sitemaps import ping_google

    class Entry(models.Model):
        # ...
        def save(self):
            super(Entry, self).save()
            try:
                ping_google()
            except Exception:
                # Catch every exception, because we could get a variety
                # of HTTP-related exceptions.
                pass

A more efficient solution, however, would be to call ``ping_google()`` from a
cron script, or some other scheduled task. The function makes an HTTP request
to Google's servers, so you may not want to introduce that network overhead
each time you call ``save()``.