Opened 8 years ago

Closed 5 years ago

#10793 closed Bug (fixed)

Sitemap implementation means it cannot be updated automatically

Reported by: tenkujin Owned by: Grzegorz Nosek
Component: contrib.sitemaps Version: 1.4-alpha-1
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The Sitemap class uses an attribute _paginator which stores all the URLs. The method items() is thus called only once for the creation of the paginator, and subsequent invocations of the sitemap view do not call it. If the items() method is meant to return dynamically changing items (which is a common case), changes are not reflected in the sitemap. The only way to fix this is to either restart Django (not acceptable), override the get_urls() method and update the _paginator (not nice), or re-implement the Sitemap object to not have such semantics (violates DRY).

Since this paginator is a hack to accommodate Google's arbitrary 50k URL limit, I think it would be best to have a Sitemap which can be updated at runtime, and an optional GoogleSitemap for this particular search engine.

Attachments (4)

sitemaps.patch (839 bytes) - added by Denis Martinez <deuns.martinez@…> 7 years ago.
proposed solution for dynamic sitemaps
paginator.diff (1.7 KB) - added by Christopher Medrela 5 years ago.
Dynamic paginator creation, checking for DB changes and time limit
paginator_doc.diff (3.0 KB) - added by Christopher Medrela 5 years ago.
Added documentation to previous patch, moved checking code to separate method.
10793.diff (1.8 KB) - added by Grzegorz Nosek 5 years ago.

Download all attachments as: .zip

Change History (19)

comment:1 Changed 8 years ago by Julian Bez

"subsequent invocations of the sitemap view do not call it"

Are you sure? AFAIK Django treats every request as a completely new one.

comment:2 Changed 8 years ago by Arthur Koziel

The 50k limit is not arbitrary. Google, like MSN and Yahoo, use the sitemaps.org standard which specifies the 50k limit.

See http://sitemaps.org/protocol.php#index for more Information.

comment:3 in reply to:  2 Changed 8 years ago by tenkujin

Replying to arthurk:

The 50k limit is not arbitrary. Google, like MSN and Yahoo, use the sitemaps.org standard which specifies the 50k limit.

See http://sitemaps.org/protocol.php#index for more Information.

Still, by storing the paginator in the self._paginator field and not providing a way to clear it (signals? cache? method?) means we need to fiddle with the internals of Django sitemaps simply to have dynamic sitemaps. I think the pagination logic should somehow check to see whether the query set it uses has changed since the last time the paginator was created and assigned to _paginator

Changed 7 years ago by Denis Martinez <deuns.martinez@…>

Attachment: sitemaps.patch added

proposed solution for dynamic sitemaps

comment:4 Changed 7 years ago by Denis Martinez <deuns.martinez@…>

Here's a solution to the problem, which I'm currently using on my website.
I applied the above patch to django 1.1 and I modified urls.py to look like this:

urlpatterns = patterns(
    '',
    url(r'^sitemap.xml$', 'django.contrib.sitemaps.views.index', {'sitemaps': get_sitemaps}, name='sitemap_index'),
    url(r'^sitemap-(?P<section>.+)\.xml$', 'django.contrib.sitemaps.views.sitemap', {'sitemaps': get_sitemaps}, name='sitemap'),
    )

where get_sitemaps is a function which returns the sitemaps directory.

def get_sitemaps():
    sitemaps = {}
    sitemaps['blogs'] = BlogSitemap
    ......
    return sitemaps

Using this method my sitemaps are instanciated on every request, and I get a new paginator, and expected results.

comment:5 Changed 7 years ago by Greg Thornton

The approach in sitemaps.patch invalidates the current documentation, and doesn't take into consideration the fact that an previously instantiated Sitemap instance may be used instead of a subclass. I don't know if the pagination should really be handled by the Sitemap class at all, since it seems like a more view specific attribute. Since ditching the class based pagination would require a lot of view changes, I've been using a subclass of Sitemap whose paginator attribute is instantiated each time it is accessed. When using the current sitemap views, the paginator will only be accessed once, so only one paginator is initialized for each request (luckily).

For reference, I believe this bug only shows up if you add a Sitemap instance to the sitemaps dict. Using classes only shouldn't be a problem since the short-lived instance will be recreated each time the view is called.

Should we dumb down Sitemap and move pagination into the views? It seems like that would be more flexible.

comment:6 Changed 7 years ago by Jacob

Triage Stage: UnreviewedAccepted

comment:7 Changed 6 years ago by Gabriel Hurley

Component: Contrib appscontrib.sitemaps

comment:8 Changed 6 years ago by Chris Beaven

Severity: Normal
Type: Bug

Changed 5 years ago by Christopher Medrela

Attachment: paginator.diff added

Dynamic paginator creation, checking for DB changes and time limit

comment:9 Changed 5 years ago by Christopher Medrela

Easy pickings: unset
Has patch: set
UI/UX: unset

We check time to life of paginator. If it is older than 10 minutes (Sitemap.reload_delta), we force recreation of new paginator. If it is older than 1 minute (Sitemap.counter_delta) we check whether number of items in DB has changed and recreate paginator if so.

comment:10 Changed 5 years ago by Christopher Medrela

Version: 1.01.4-alpha-1

Changed 5 years ago by Christopher Medrela

Attachment: paginator_doc.diff added

Added documentation to previous patch, moved checking code to separate method.

comment:11 Changed 5 years ago by Tomek Paczkowski

Needs tests: set

comment:12 Changed 5 years ago by Grzegorz Nosek

Owner: changed from nobody to Grzegorz Nosek
Status: newassigned

Changed 5 years ago by Grzegorz Nosek

Attachment: 10793.diff added

comment:13 Changed 5 years ago by adam_przybyla

Needs tests: unset
Triage Stage: AcceptedReady for checkin

comment:14 Changed 5 years ago by Grzegorz Nosek

The Paginator object is needlessly cached for the whole lifetime of the Sitemap object, which may be the whole lifetime of the server.

After discussing with jezdez, we considered the potential performance impact (evaluating len(sitemap.items()) at every request acceptable for the time being. Fixing this properly would require Paginators to lazily determine the number of pages only when actually needed, which is quite outside of the scope of this ticket.

comment:15 Changed 5 years ago by Jannis Leidel

Resolution: fixed
Status: assignedclosed

In [17468]:

Fixed #10793 -- Stopped caching paginator instances in sitemap classes to prevent stale sitemaps. Thanks, gnosek, krzysiumed and adam_przybyla.

Note: See TracTickets for help on using tickets.
Back to Top