Abstract
This addition to Django's ORM adds simple drop-in caching, compatible with
nearly all existing QuerySet methods. It emphasizes
performance and compatibility, and provides configuration options with sane
defaults. All that is required for basic functionality is a suitable
CACHE_BACKEND setting and the addition of .cache() to the appropriate
QuerySet chains. It also speeds up the lookup of related objects, and even
that of generic relations.
The project and svn repository can be found at Google Project Hosting.
Proposed Design
The QuerySet class grows new methods to add object caching:
.cache()
cache(timeout=None, prefix='qscache:', smart=False)
This method causes model instances found in the returned QuerySet to be cached individually; the cache key is calculated using the contrib.contenttypes model id and the instance's pk value. (This is all done lazily, and the position of cache() in the chain does not matter, to be consistent with other methods.)
timeout defaults to the amount specified in CACHE_BACKEND. prefix is in addition to CACHE_MIDDLEWARE_KEY_PREFIX.
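As a rough illustration of the key scheme described above, a minimal sketch might look like the following. The function name make_cache_key, the global_prefix parameter (standing in for CACHE_MIDDLEWARE_KEY_PREFIX), and the exact key layout are assumptions for illustration, not the proposal's final format.

```python
def make_cache_key(content_type_id, pk, prefix="qscache:", global_prefix=""):
    """Build a per-instance cache key (hypothetical layout).

    content_type_id stands for the contrib.contenttypes id of the model,
    pk for the instance's primary key. global_prefix plays the role of
    CACHE_MIDDLEWARE_KEY_PREFIX and is prepended to the method's own prefix.
    """
    return "%s%s%s:%s" % (global_prefix, prefix, content_type_id, pk)


# Two instances of the same model differ only in the pk part of the key.
key_a = make_cache_key(7, 42)
key_b = make_cache_key(7, 43, global_prefix="site1:")
```

Because the key depends only on the model and the primary key, any query that selects the same object can share the same cache entry.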
Internally, QuerySet grows some new attributes that affect how SQL is generated. Use of cache() causes the query to retrieve only primary keys of selected objects. in_bulk() uses the cache directly, although cache misses will still require database hits, as usual. Methods such as delete() and count() are largely unaffected by cache(), but methods such as distinct() are a more difficult case and will require some design decisions. Using extra(select=...) is also a possibly unsolvable case.
If values() has been used in the query, cache() takes precedence and creates the values dictionary from cache. If a list of fields is specified in values(), cache() will still perform the equivalent of a SELECT *.
select_related() is supported by the caching mechanism. The appropriate joins are still performed by the database; if joins were calculated with cached foreign key values, cache misses could become very costly.
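The "fetch primary keys, then consult the cache, then fall back to the database for misses" flow described above could be sketched as follows. DictCache is an in-memory stand-in for a real cache backend, and fetch_by_pks, load_from_db and key_for are hypothetical names; this is an assumption-laden outline, not the actual implementation.

```python
class DictCache:
    """In-memory stand-in for a cache backend (illustration only)."""

    def __init__(self):
        self._data = {}

    def get_many(self, keys):
        # Return only the keys that are present, like a bulk cache lookup.
        return {k: self._data[k] for k in keys if k in self._data}

    def set(self, key, value, timeout=None):
        self._data[key] = value  # timeout is ignored in this stand-in


def fetch_by_pks(pks, cache, load_from_db, key_for):
    """Return objects for pks, in order, hitting the database only for misses.

    load_from_db(pks) is assumed to return a {pk: object} dict.
    """
    keys = [key_for(pk) for pk in pks]
    cached = cache.get_many(keys)
    missing = [pk for pk, key in zip(pks, keys) if key not in cached]
    loaded = load_from_db(missing) if missing else {}
    for pk, obj in loaded.items():
        cache.set(key_for(pk), obj)  # populate the cache for next time
    results = []
    for pk, key in zip(pks, keys):
        if key in cached:
            results.append(cached[key])
        elif pk in loaded:
            results.append(loaded[pk])
    return results
```

With this shape, a fully warm cache needs only the primary-key query, while a cold cache costs one extra query for the missing rows.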
.cache_related()
cache_related(fields, timeout=None, prefix='qscache:', smart=False)
fields is a name or list of foreign keys, many-to-many/one-to-one fields, reverse-lookup fields, or generic foreign keys on the model. Model instances pointed to by the given relation will be cached similarly to cache().
I'm not sold on the signature of this method... *args would be nice but then the other defaulted arguments would be replaced by kwargs.
Also, the special string '*' could be accepted to cache all relations. Either that, or it is implied by the lack of a fields argument?
Aside
Without database-specific trickery it is non-trivial to perform SQL JOINs with generic relations. Currently, a database query is required for each generic foreign key relationship. The cache framework, while unable to reduce the initial number of database hits, greatly alleviates load when lists of generic objects are required. Using this method still loads generic foreign keys lazily, but more quickly, and also uses objects cached with cache().
.cache_set()
cache_set(cache_key, timeout=None, smart=False, depth=1)
Similar to taking the resulting QuerySet and storing it directly in the cache. Overrides cache(), but does not cache relations.
If select_related() is used in the same QuerySet, cache_set() will also cache relations as far as the select_related()'s joins reach.
If cache_related() is used in the same QuerySet, it overrides use of select_related().
Sample usage
>>> article.comment_set.cache_related('author')
>>> my_city.restaurant_set.cache(smart=True)
>>> Article.objects.filter(created__gte=yesterday).cache_set('todaysarticles')
>>> tag = Tag.objects.cache_related('content_object').get(slug='news')
Background logic
The implementation class contains a registry of models that have been requested to cache (directly or via a relation).
To achieve as much transparency as possible, the QuerySet methods quietly
establish post_save and post_delete signal listeners the first time a
model is cached. Object deletion is handled trivially. On object creation or
modification, the preferred behavior is to create or update the cached key
rather than simply deleting the key and letting the cache regenerate it;
the rationale is that the object is most likely to be viewed immediately after
and caching it at post_save is cheap. However, this may not be desirable in
certain cases.
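The registry and listener behaviour described above might be outlined like this. It is a sketch only: the plain dict stands in for the cache backend, the class name is used in the key in place of the contenttypes id, and register, on_post_save and on_post_delete are hypothetical names. In Django, the two handlers would be connected to the post_save and post_delete signals inside register().

```python
cache = {}                 # stand-in for the real cache backend
registered_models = set()  # models that have been requested to cache


def key_for(instance):
    # Assumption: class name used instead of the contenttypes model id.
    return "qscache:%s:%s" % (type(instance).__name__, instance.pk)


def on_post_save(sender, instance, **kwargs):
    """Create or update the cached copy instead of just deleting the key."""
    if sender in registered_models:
        cache[key_for(instance)] = instance


def on_post_delete(sender, instance, **kwargs):
    """Deletion is the trivial case: drop the key if it exists."""
    cache.pop(key_for(instance), None)


def register(model):
    """Called the first time a model is cached; True if newly registered."""
    if model in registered_models:
        return False
    registered_models.add(model)
    # Here the handlers above would be hooked up to the model's signals.
    return True
```

Updating the key at save time trades a small write cost for avoiding a cache miss on the next read; an option to delete the key instead would cover the cases where that trade is not wanted.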
To reduce the number of cache misses, additional "smart" logic can be added. For example, the first time a model is registered to the cache signal listener, its model instances are expected to be uncached. In this case, rather than fetching only primary keys, the objects are retrieved as normal (and cached).
By storing the expiration time, this can also take effect whenever the
cached objects have likely timed out. All "smart" functionality is enabled
using the smart keyword argument.
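One possible shape for the "smart" decision described above is sketched below. SmartRegistry and its method names are hypothetical, and the single per-model expiration time is a simplifying assumption; real cached objects may expire at different moments.

```python
import time


class SmartRegistry:
    """Tracks, per model, when its cached objects have likely timed out."""

    def __init__(self, timeout, clock=time.time):
        self.timeout = timeout
        self.clock = clock      # injectable so the logic can be tested
        self.expires_at = {}    # model name -> estimated expiration time

    def should_fetch_full_rows(self, model_name):
        """True when the cache is probably cold for this model.

        That is the case the first time a model is seen, or once the
        stored expiration time has passed; fetching only primary keys
        would then lead to a cache miss for nearly every object.
        """
        expires = self.expires_at.get(model_name)
        return expires is None or self.clock() >= expires

    def mark_cached(self, model_name):
        """Record that objects of this model were just written to the cache."""
        self.expires_at[model_name] = self.clock() + self.timeout
```

A query with smart=True would consult should_fetch_full_rows() before choosing between the pk-only query and a normal query whose results are cached.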
Notes
Code layout
- All caching code lives in a separate app at first. A custom QuerySet class derives from the official class, overriding where appropriate. A Manager class with an overridden get_query_set() is used for testing, and additional middleware, etc. are located in the same folder. Perhaps eventually, the new code can be merged to trunk as Django proper. Hopefully the code will not be too invasive, but quite a few QuerySet methods will have to be hijacked. QuerySet refactoring would be an ideal merge time.
- If the transaction middleware is enabled, it is desirable to have the cache
only update when the transaction succeeds. This is simple in implementation
but will couple the transaction middleware to the cache if not designed
properly. An additional middleware class can be created to handle this
case; however, it will have to stipulate placement immediately after the
TransactionMiddleware in settings.py, and might be confused with the existing CacheMiddleware.
- I've been thinking quite a lot about the multitude of combinations of methods I've got here... I'm going to implement the simplest things I had in the original proposal first and branch out from there. I'll likely post some sort of map of the combinations later once I get it down on paper.
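For the transaction point in the list above, one way to keep the cache from updating until the transaction succeeds is to queue cache writes and apply them on commit. The sketch below assumes a dict-like cache and uses the hypothetical name DeferredCacheWrites; how commit() and rollback() would be triggered by the middleware is left open.

```python
class DeferredCacheWrites:
    """Buffers cache operations until the surrounding transaction ends."""

    def __init__(self, cache):
        self.cache = cache   # dict-like stand-in for the cache backend
        self.pending = []    # queued (operation, key, value) tuples

    def set(self, key, value):
        self.pending.append(("set", key, value))

    def delete(self, key):
        self.pending.append(("delete", key, None))

    def commit(self):
        """Apply queued operations in order; call after a successful commit."""
        for op, key, value in self.pending:
            if op == "set":
                self.cache[key] = value
            else:
                self.cache.pop(key, None)
        self.pending = []

    def rollback(self):
        """Discard queued operations; call when the transaction fails."""
        self.pending = []
```

Keeping the buffer in its own object is one way to avoid coupling the transaction middleware to the cache directly.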
Interface changes
- I'm considering just making "smart" behaviour standard, or at least default.
- Perhaps the default cache key prefix should be specifiable in settings?
- Should cache_related() lose the depth argument and merely steal it from select_related() instead, if given?
- When cache() is used with values(), perhaps another option could be added to allow retrieval of only the specified fields; however, this would break any regular cached lookup for that object.
Timeline
First Month
- Write preliminary tests. Initial implementation of cache() for single objects. Support typical QuerySet methods.
- Devise a generic idiom for testing cache-related code.
- Later in the month, work on cache_related(). Work on aggregates; implement select_related(), values(), and in_bulk() cases.
Second Month
- Work on signal dispatching, cache coherency. Write more tests and preliminary documentation.
- Write "smart" cache logic. Explore other possible optimizations.
- Add transaction support. Design decision needed about extra middleware.
- Implement extra features (distinct(), extra(select=...), ...) in conjunction with cache_set().
Last Month
- Write up documentation, extensive tests, and example code.
- Edge cases, corner cases... there are going to be quite a few!
- Refactor, especially if the new QuerySet has been released. Continue merging with changes to trunk and testing.
- Allow for wiggle room, QuerySet refactoring work, cleanup, etc.
class Meta:
I'm definitely wide open for comments and criticisms! You can contact me at paul@….