=====================
The sitemap framework
=====================

Django comes with a high-level sitemap-generating framework that makes
creating sitemap_ XML files easy.

.. _sitemap: http://www.sitemaps.org/

Overview
========

A sitemap is an XML file on your Web site that tells search-engine indexers how
frequently your pages change and how "important" certain pages are in relation
to other pages on your site. This information helps search engines index your
site.

The Django sitemap framework automates the creation of this XML file by letting
you express this information in Python code.

It works much like Django's `syndication framework`_. To create a sitemap, just
write a ``Sitemap`` class and point to it in your URLconf_.

.. _syndication framework: ../syndication_feeds/
.. _URLconf: ../url_dispatch/

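To make the target format concrete, here is a minimal sketch that builds a
sitemap document by hand with Python's standard library. It is not how the
framework is implemented; ``build_sitemap`` and the entry dictionaries are
hypothetical names used only for illustration, and the namespace URI comes from
the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML (as a string) for a list of entry dicts.

    Hypothetical helper for illustration; not part of Django.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is required; the remaining elements are optional.
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_sitemap([
    {"loc": "http://example.com/blog/1/", "changefreq": "never", "priority": 0.5},
])
print(xml_text)
```

Each ``<url>`` element describes one page. The framework produces the same kind
of document from your ``Sitemap`` classes, so you never write this XML
yourself.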
Installation
============

To install the sitemap app, follow these steps:

1. Add ``'django.contrib.sitemaps'`` to your INSTALLED_APPS_ setting.
2. Make sure ``'django.template.loaders.app_directories.load_template_source'``
   is in your TEMPLATE_LOADERS_ setting. It's in there by default, so
   you'll only need to change this if you've changed that setting.
3. Make sure you've installed the `sites framework`_.

(Note: The sitemap application doesn't install any database tables. The only
reason it needs to go into ``INSTALLED_APPS`` is so that the
``load_template_source`` template loader can find the default templates.)

.. _INSTALLED_APPS: ../settings/#installed-apps
.. _TEMPLATE_LOADERS: ../settings/#template-loaders
.. _sites framework: ../sites/

Initialization
==============

To activate sitemap generation on your Django site, add this line to your
URLconf_::

    (r'^sitemap\.xml$', 'django.contrib.sitemaps.views.sitemap', {'sitemaps': sitemaps})

This tells Django to build a sitemap when a client accesses ``/sitemap.xml``.

The name of the sitemap file is not important, but the location is. Search
engines will only index links in your sitemap for the current URL level and
below. For instance, if ``sitemap.xml`` lives in your root directory, it may
reference any URL in your site. However, if your sitemap lives at
``/content/sitemap.xml``, it may only reference URLs that begin with
``/content/``.

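The location rule above can be sketched as a small check. This is only an
illustration of the rule as described here, not something the framework or a
search engine runs; ``sitemap_scope`` and ``in_scope`` are hypothetical helper
names.

```python
import posixpath


def sitemap_scope(sitemap_path):
    """Return the URL prefix that a sitemap served at ``sitemap_path`` may cover."""
    directory = posixpath.dirname(sitemap_path)
    # Normalise to a trailing slash so '/content' does not match '/contents/'.
    return directory if directory.endswith("/") else directory + "/"


def in_scope(sitemap_path, url_path):
    """Return True if ``url_path`` sits at or below the sitemap's own level."""
    return url_path.startswith(sitemap_scope(sitemap_path))


# A sitemap at the root may reference any URL on the site.
print(in_scope("/sitemap.xml", "/about/"))
# A sitemap under /content/ may only reference URLs beginning with /content/.
print(in_scope("/content/sitemap.xml", "/content/articles/1/"))
print(in_scope("/content/sitemap.xml", "/about/"))
```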
The sitemap view takes an extra, required argument: ``{'sitemaps': sitemaps}``.
``sitemaps`` should be a dictionary that maps a short section label (e.g.,
``blog`` or ``news``) to its ``Sitemap`` class (e.g., ``BlogSitemap`` or
``NewsSitemap``). It may also map to an *instance* of a ``Sitemap`` class
(e.g., ``BlogSitemap(some_var)``).

.. _URLconf: ../url_dispatch/

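Because the dictionary may hold either classes or instances, whatever consumes
it has to normalise the two cases. The sketch below shows one way that could
work. It uses a hypothetical stand-in ``Sitemap`` class and a hypothetical
``instantiate`` helper so that it runs without Django; it is not a copy of the
framework's view code.

```python
class Sitemap:
    """Hypothetical stand-in for a sitemap section; not Django's class."""

    def __init__(self, label="default"):
        self.label = label


def instantiate(sitemaps):
    """Return a new dict in which every value is a Sitemap instance."""
    resolved = {}
    for section, site in sitemaps.items():
        # A class is instantiated with no arguments; an instance is used as-is.
        resolved[section] = site() if isinstance(site, type) else site
    return resolved


resolved = instantiate({"blog": Sitemap, "news": Sitemap("news")})
print(resolved["blog"].label, resolved["news"].label)
```

Passing an instance is the form to use when your ``Sitemap`` class needs
constructor arguments.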
Sitemap classes
===============

A ``Sitemap`` class is a simple Python class that represents a "section" of
entries in your sitemap. For example, one ``Sitemap`` class could represent all
the entries of your weblog, while another could represent all of the events in
your events calendar.

In the simplest case, all these sections get lumped together into one
``sitemap.xml``, but it's also possible to use the framework to generate a
sitemap index that references individual sitemap files, one per section. (See
`Creating a sitemap index`_ below.)

``Sitemap`` classes must subclass ``django.contrib.sitemaps.Sitemap``. They can
live anywhere in your codebase.

A simple example
================

Let's assume you have a blog system, with an ``Entry`` model, and you want your
sitemap to include all the links to your individual blog entries. Here's how
your sitemap class might look::

    from django.contrib.sitemaps import Sitemap
    from mysite.blog.models import Entry

    class BlogSitemap(Sitemap):
        changefreq = "never"
        priority = 0.5

        def items(self):
            return Entry.objects.filter(is_draft=False)

        def lastmod(self, obj):
            return obj.pub_date

Note:

* ``changefreq`` and ``priority`` are class attributes corresponding to
  ``<changefreq>`` and ``<priority>`` elements, respectively. They can also
  be defined as methods, as ``lastmod`` is in the example.
* ``items()`` is simply a method that returns a list of objects. The objects
  returned will get passed to any callable methods corresponding to a
  sitemap property (``location``, ``lastmod``, ``changefreq``, and
  ``priority``).
* ``lastmod`` should return a Python ``datetime`` object.
* There is no ``location`` method in this example, but you can provide it
  in order to specify the URL for your object. By default, ``location()``
  calls ``get_absolute_url()`` on each object and returns the result.

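The "attribute or method" convention in the notes above can be sketched with
plain Python. This is a simplified illustration under stated assumptions, not
the framework's source: ``resolve`` and ``BlogSection`` are hypothetical names,
and the entry is a plain dictionary rather than a model instance.

```python
import datetime


def resolve(section, name, obj, default=None):
    """Look up ``name`` on ``section``; call it with ``obj`` if it is a method."""
    attr = getattr(section, name, default)
    if callable(attr):
        return attr(obj)
    return attr


class BlogSection:
    """Hypothetical stand-in for a Sitemap class; it does not import Django."""

    changefreq = "never"   # plain attribute: same value for every object
    priority = 0.5

    def lastmod(self, obj):  # method: computed per object
        return obj["pub_date"]


entry = {"pub_date": datetime.datetime(2008, 1, 15, 10, 30)}
section = BlogSection()
print(resolve(section, "changefreq", entry))
print(resolve(section, "lastmod", entry))
print(resolve(section, "location", entry))  # not defined, so the default is used
```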
Sitemap class reference
=======================

A ``Sitemap`` class can define the following methods/attributes:

``items``
---------

**Required.** A method that returns a list of objects. The framework doesn't
care what *type* of objects they are; all that matters is that these objects
get passed to the ``location()``, ``lastmod()``, ``changefreq()`` and
``priority()`` methods.

``location``
------------

**Optional.** Either a method or attribute.

If it's a method, it should return the absolute URL for a given object as
returned by ``items()``.

If it's an attribute, its value should be a string representing an absolute URL
to use for *every* object returned by ``items()``.

In both cases, "absolute URL" means a URL that doesn't include the protocol or
domain. Examples:

* Good: ``'/foo/bar/'``
* Bad: ``'example.com/foo/bar/'``
* Bad: ``'http://example.com/foo/bar/'``

If ``location`` isn't provided, the framework will call the
``get_absolute_url()`` method on each object as returned by ``items()``.

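If you want to catch bad ``location`` values early, for example in your own
tests, a check along these lines may help. It is a sketch of the "absolute URL"
rule as stated above; ``is_valid_location`` is a hypothetical helper and the
framework does not provide it.

```python
from urllib.parse import urlsplit


def is_valid_location(location):
    """Return True for a path such as '/foo/bar/' with no protocol or domain."""
    parts = urlsplit(location)
    # Reject anything with a scheme ('http://...') or a network location
    # ('//example.com/...'), and require a leading slash.
    return not parts.scheme and not parts.netloc and location.startswith("/")


for candidate in ("/foo/bar/", "example.com/foo/bar/", "http://example.com/foo/bar/"):
    print(candidate, is_valid_location(candidate))
```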
``lastmod``
-----------

**Optional.** Either a method or attribute.

If it's a method, it should take one argument -- an object as returned by
``items()`` -- and return that object's last-modified date/time, as a Python
``datetime.datetime`` object.

If it's an attribute, its value should be a Python ``datetime.datetime`` object
representing the last-modified date/time for *every* object returned by
``items()``.

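The ``datetime.datetime`` value eventually has to become text inside a
``<lastmod>`` element. The sitemaps.org protocol accepts W3C Datetime strings,
of which a plain ``YYYY-MM-DD`` date is the simplest form. The sketch below
shows that conversion; ``format_lastmod`` is a hypothetical helper, and the
exact formatting the framework's templates use may differ.

```python
import datetime


def format_lastmod(value):
    """Render a date or datetime as a ``YYYY-MM-DD`` string."""
    if isinstance(value, datetime.datetime):
        # Drop the time portion; the date-only form is valid W3C Datetime.
        value = value.date()
    return value.isoformat()


print(format_lastmod(datetime.datetime(2008, 1, 15, 10, 30)))
```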
``changefreq``
--------------

**Optional.** Either a method or attribute.

If it's a method, it should take one argument -- an object as returned by
``items()`` -- and return that object's change frequency, as a Python string.

If it's an attribute, its value should be a string representing the change
frequency of *every* object returned by ``items()``.

Possible values for ``changefreq``, whether you use a method or attribute, are:

* ``'always'``
* ``'hourly'``
* ``'daily'``
* ``'weekly'``
* ``'monthly'``
* ``'yearly'``
* ``'never'``

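A ``changefreq`` method that computes its value can easily return a typo. One
way to guard against that is sketched below. ``CHANGEFREQ_VALUES`` and
``check_changefreq`` are hypothetical names; nothing in this document says the
framework validates the value for you, so treat this as an optional check in
your own code.

```python
# The seven values listed above.
CHANGEFREQ_VALUES = (
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
)


def check_changefreq(value):
    """Return ``value`` unchanged, or raise ValueError if it is not allowed."""
    if value not in CHANGEFREQ_VALUES:
        raise ValueError(
            "changefreq must be one of %s, got %r"
            % (", ".join(CHANGEFREQ_VALUES), value)
        )
    return value


print(check_changefreq("daily"))
```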
``priority``
------------

**Optional.** Either a method or attribute.

If it's a method, it should take one argument -- an object as returned by
``items()`` -- and return that object's priority, as either a string or float.

If it's an attribute, its value should be either a string or float representing
the priority of *every* object returned by ``items()``.

Example values for ``priority``: ``0.4``, ``1.0``. The default priority of a
page is ``0.5``. See the `sitemaps.org documentation`_ for more.

.. _sitemaps.org documentation: http://www.sitemaps.org/protocol.html#prioritydef

Shortcuts
=========

The sitemap framework provides a couple of convenience classes for common
cases:

``FlatPageSitemap``
-------------------

The ``django.contrib.sitemaps.FlatPageSitemap`` class looks at all flatpages_
defined for the current ``SITE_ID`` (see the `sites documentation`_) and
creates an entry in the sitemap for each one. These entries include only the
``location`` attribute -- not ``lastmod``, ``changefreq`` or ``priority``.

.. _flatpages: ../flatpages/
.. _sites documentation: ../sites/

``GenericSitemap``
------------------

The ``GenericSitemap`` class works with any `generic views`_ you already have.
To use it, create an instance, passing in the same ``info_dict`` you pass to
the generic views. The only requirement is that the dictionary have a
``queryset`` entry. It may also have a ``date_field`` entry that specifies a
date field for objects retrieved from the ``queryset``. This will be used for
the ``lastmod`` attribute in the generated sitemap. You may also pass
``priority`` and ``changefreq`` keyword arguments to the ``GenericSitemap``
constructor to specify these attributes for all URLs.

.. _generic views: ../generic_views/

Example
-------

Here's an example of a URLconf_ using both::

    from django.conf.urls.defaults import *
    from django.contrib.sitemaps import FlatPageSitemap, GenericSitemap
    from mysite.blog.models import Entry

    info_dict = {
        'queryset': Entry.objects.all(),
        'date_field': 'pub_date',
    }

    sitemaps = {
        'flatpages': FlatPageSitemap,
        'blog': GenericSitemap(info_dict, priority=0.6),
    }

    urlpatterns = patterns('',
        # some generic view using info_dict
        # ...

        # the sitemap
        (r'^sitemap\.xml$', 'django.contrib.sitemaps.views.sitemap', {'sitemaps': sitemaps})
    )

.. _URLconf: ../url_dispatch/

Creating a sitemap index
========================

The sitemap framework also has the ability to create a sitemap index that
references individual sitemap files, one per section defined in your
``sitemaps`` dictionary. The only differences in usage are:

* You use two views in your URLconf: ``django.contrib.sitemaps.views.index``
  and ``django.contrib.sitemaps.views.sitemap``.
* The ``django.contrib.sitemaps.views.sitemap`` view should take a
  ``section`` keyword argument.

Here is what the relevant URLconf lines would look like for the example above::

    (r'^sitemap\.xml$', 'django.contrib.sitemaps.views.index', {'sitemaps': sitemaps}),
    (r'^sitemap-(?P<section>.+)\.xml$', 'django.contrib.sitemaps.views.sitemap', {'sitemaps': sitemaps}),

This will automatically generate a ``sitemap.xml`` file that references
both ``sitemap-flatpages.xml`` and ``sitemap-blog.xml``. The ``Sitemap``
classes and the ``sitemaps`` dict don't change at all.

If one of your sitemaps is going to have more than 50,000 URLs, you should
create an index file. Your sitemap will be paginated and the index will
reflect that.

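To see what pagination means for a large section, the sketch below splits a
list of URLs into pages of at most 50,000 entries. It only illustrates the
arithmetic; ``paginate`` is a hypothetical helper, and the way the framework
itself pages results and names the resulting files is not shown here.

```python
def paginate(urls, limit=50000):
    """Split ``urls`` into consecutive pages of at most ``limit`` entries."""
    return [urls[start:start + limit] for start in range(0, len(urls), limit)]


# 120,000 URLs need three pages: two full ones and a final partial page.
urls = ["/blog/%d/" % n for n in range(120000)]
pages = paginate(urls)
print(len(pages), [len(page) for page in pages])
```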
Pinging Google
==============

After you have initially submitted your sitemap to Google's Webmaster Tools,
you may want to "ping" Google when your sitemap changes. This tells Google
to reindex your site. The framework provides a function to do just that:
``django.contrib.sitemaps.ping_google()``.

``ping_google()`` takes an optional argument, ``sitemap_url``, which should be
the absolute URL of your site's sitemap (e.g., ``'/sitemap.xml'``). If this
argument isn't provided, ``ping_google()`` will attempt to figure out your
sitemap by performing a reverse lookup in your URLconf.

``ping_google()`` raises the exception
``django.contrib.sitemaps.SitemapNotFound`` if it cannot determine your sitemap
URL.

One useful way to call ``ping_google()`` is from a model's ``save()`` method::

    from django.contrib.sitemaps import ping_google
    from django.db import models

    class Entry(models.Model):
        # ...
        def save(self):
            super(Entry, self).save()
            try:
                ping_google()
            except Exception:
                # Catch broadly because we could get a variety
                # of HTTP-related exceptions.
                pass

A more efficient solution, however, would be to call ``ping_google()`` from a
cron script, or some other scheduled task. The function makes an HTTP request
to Google's servers, so you may not want to introduce that network overhead
each time you call ``save()``.

Pinging Google via ``manage.py``
--------------------------------

**New in Django development version**

Once the sitemaps application is added to your project, you may also
ping Google's servers through the command-line ``manage.py`` interface::

    python manage.py ping_google [/sitemap.xml]
