Opened 18 years ago

Closed 18 years ago

Last modified 15 years ago

#580 closed defect (fixed)

[patch] caching needs to take more into account than just the URL

Reported by: hugo Owned by: Adrian Holovaty
Component: Core (Cache system) Version:
Severity: normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

currently the cache key for the caching middleware and the caching decorator takes into account the prefix, the URL and wether gzip is required or not. This needs to be extended to take into account the cookies and the Http-accept-language header (for i18n). The reason is that programmers will base content of views on the content of cookies or session data (that itself is referenced as the session id by some cookie) and so the caching middleware would deliver a personalized page to some different user.

The attached patch works in that it builds the cache key out of the URL and some HTTP headers by creating a md5 hexdigest. There is one problem with this: you get less false cache hits, but you get less cache hits in all: if two users would see the same page because it's content isn't cookie based, but have different cookie settings nonetheless, they will occupy two cache slots. But the current solution isn't right, so maybe we have to live with the less cache hits - it's a standard problem of caching, anyway.

This is similar to the HTTP caching like is done by squid: there you have to mark your pages with the Varying header to define what HTTP headers should be taken into account when building the cache key. So maybe a more involved solution might be to require the programmer to set attributes on view functions that define the HTTP headers to take into account when caching. That way it's up to the programmer to define wether his view is based on cookie data or i18n language selection.

This more involved version could accumulate the cache-key relevant headers in the request - that way middleware can add headers to be taken into account and so the cache key creation will be fully based on both view functions and middleware.

Attachments (2)

cachekey.diff (3.6 KB ) - added by hugo 18 years ago.
cache key creation that is based on some http headers
cache-vary.diff (26.2 KB ) - added by hugo 18 years ago.
new cache patch based on Vary header (with rework of middlewares)

Download all attachments as: .zip

Change History (13)

by hugo, 18 years ago

Attachment: cachekey.diff added

cache key creation that is based on some http headers

comment:1 by C8E, 18 years ago

hugo wrote:

So maybe a more involved solution might be to require the programmer to set attributes 
on view functions that define the HTTP headers to take into account when caching. 
That way it's up to the programmer to define wether his view is based on cookie data 
or i18n language selection.

What about reversing the assumption? I.E. assume that the page depends on headers and cookies and i18n, and permit the developer to profile the page's caching policy if this it's not the case? It seems safier and less surprising to me.

comment:2 by hugo <gb@…>, 18 years ago

Hmm. The more I think about it, the more I think something like a cache policy object would be what we need. Think of a cache policy object as an object that encapsulates the knowledge about how to create the cache key from the request. The user installs the caching middleware and installs (in the settings file) a base cache policy. This can be either a "depends all" policy or a "depends none" policy (and I think taking the latter as the default would be ok - that way django would behave exactly the same as now). The cache middleware would use the installed cache policy to create the cache key from the request. Installed middleware can modify the cache policy by adding or removing headers from the policy - that way the i18n middleware would add the accept-language header to the policy and so force even the "depends none" policy to now check for that header (the depends all policy already will use that header anyway).

Additionally we should make the cache decorator more intelligent by giving it a way to modify the cache policy for this view function - something like a merge of policies. So that if the programmer took a "depends none" base policy, a view can add headers to it and if the programmer took the "depends all" policy, a view could remove headers from it. If the header to be added is already present, nothing would happen. If the header to be remove isn't present, nothing happens. Otherwise the policy will be modified for the cache key discovery. View functions modifications to the cache policy shouldn't be permanent, of course, while middleware modifications should be permanent.

The implementation of the cache policy object is actually quite simple: it's just a set of headers to check and the method get_cache_key that just uses the intersection of given headers from the request and defined headers to build the cache key. Policy modifications are just adding and removing header names to and from the set of defined headers. View cache policy modification is just building the union of defined headers and viewfunc added headers and then removing the viewfunc removed headers before doing the intersection with given headers from the request.

comment:3 by hugo, 18 years ago

This middleware by Sune Kirkeby might be a solution, too.

comment:4 by Sune Kirkeby <sune.kirkeby@…>, 18 years ago

FYI: The cache-middleware I wrote uses the HTTP-reply Vary-header to generate the cache-key from, so in effect the Vary-header is the "policy object." This has the advantage of working with other HTTP-caches (e.g. you could put a squid-cache in front of Django, and it should work (at least I think it should :).)

comment:5 by hugo, 18 years ago

Actually after some talk to him in IRC I think it would be a good idea to adopt his way on this and import his three middlewares into the core instead of the current caching middleware. He split the caching middleware into three distinct ones - one doing the caching, one for gzip handling and one for conditional GET and HTTP header stuff. That way it's easier for admins to configure a project to use caching but not use gzip (or the other way around) or not to use both, but still support conditional GET.

The idea on caching would be to go his route and just take the Vary-header from the Response into account and build the cache key based on the URL and the Request-Headers that are listed in the Vary-Header of the response. The same should be done with the cache decorator - just take the Vary-header and pull all listed request-headers out and use the for the cache key (like I do in my patch - but I have constant headers in there).

Middlewares that add dependencies on new HTTP headers can just modify the Vary-header and add the headers they depend on - like the GZIP middleware can add the accept-encoding header if it isn't listed in the Vary-header and my i18n middleware can add the accept-language header the same way. That way the programmer has full control about what headers go into the cache key and middlewares can automatically modify the cache behaviour. And as a nice side effect we send out perfect headers for additional caches that might be in front of our applications, like for example Squids or other proxies.

comment:6 by hugo, 18 years ago

Owner: changed from Jacob to hugo

I have a patch there for this, I just need to do a bit of testing against it. It will split the CacheMiddleware and the cache decorator into three functionalities: the cache stuff, the gzip stuff and the conditional get stuff. That way users can decide what part to pull in. All caching will be based on the Vary header and there will be helper functions and decorators to make it simple to patch that Vary header accordingly.

In a simple case, a function might look like this:

from django.utils.cache import vary_on_cookie
@vary_on_cookie
def index(request):
   ...
   return render_to_response('template', context)

This would define the function to be based in content on the cookies. Of course middlewares like the SessionMiddleware or the GzipMiddleware (or later my I18NMiddleware) will automatically extend the Vary header so the programmer doesn't need to care for it.

The patch will update the documentation, too.

by hugo, 18 years ago

Attachment: cache-vary.diff added

new cache patch based on Vary header (with rework of middlewares)

comment:7 by hugo, 18 years ago

I uploaded the patch that reworks the cache middleware and decorator and splits them into three distinct ones. Additionally now it uses the Vary header. This is mostly based on Sunes work, but I had to rework the cache middleware a bit. My tests worked fine against the locmem cache in a test setting. This is backward compatible in that the same settings file will continue to work, but won't give the same features - conditional GET and gzip encoding are done by different middlewares or decorators, with a settings file that uses the cache stuff you will only get the caching, not those other functionalities - you need to make changes to your source to get them (add the middleware or the decorators).

comment:8 by Adrian Holovaty, 18 years ago

Owner: changed from hugo to Adrian Holovaty
Status: newassigned

comment:9 by Adrian Holovaty, 18 years ago

(In [807]) Added django.utils.decorators, from Hugo's #580 patch. Refs #580.

comment:10 by Adrian Holovaty, 18 years ago

(In [808]) Added django.utils.cache, from Hugo's #580 patch. Refs #580.

comment:11 by Adrian Holovaty, 18 years ago

Resolution: fixed
Status: assignedclosed

(In [810]) Fixed #580 -- Added mega support for generating Vary headers, including some view decorators, and changed the CacheMiddleware to account for the Vary header. Also added GZipMiddleware and ConditionalGetMiddleware, which are no longer handled by CacheMiddleware itself. Also updated the cache.txt and middleware.txt docs. Thanks to Hugo and Sune for the excellent patches

Note: See TracTickets for help on using tickets.
Back to Top