This is the original Django GSoC proposal. There have been quite a few
revisions since, but I'm posting this first for reference.
Abstract
This addition to Django's ORM adds simple drop-in caching, compatible with
nearly all existing QuerySet methods. It emphasizes performance and
compatibility, and provides configuration options with sane defaults. All
that is required for basic functionality is a suitable CACHE_BACKEND setting
and the addition of .cache() to the appropriate QuerySet chains. It also
speeds up the lookup of related objects, and even that of generic relations.
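As a rough illustration of the intended usage (the cache() method is what
this proposal adds, the Entry model is hypothetical, and the backend URI is
only an example):

    # settings.py -- any supported cache backend will do.
    CACHE_BACKEND = 'memcached://127.0.0.1:11211/?timeout=300'

    # Application code: chain .cache() onto an ordinary QuerySet.
    entries = Entry.objects.filter(is_public=True).cache()
    recent = Entry.objects.order_by('-pub_date').cache(timeout=60)

The first call uses the timeout from CACHE_BACKEND and the default qscache:
prefix; the second overrides the timeout for that query only.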
Proposed Design
The QuerySet class grows two new methods to add object caching:
cache(timeout=None, prefix='qscache:', smart=False)
timeout defaults to the amount specified in CACHE_BACKEND.
prefix is in addition to CACHE_MIDDLEWARE_KEY_PREFIX.
Cache keys are calculated from the content-type id and instance id, to
accommodate generic relations.
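A minimal sketch of how such a key might be assembled (make_cache_key is an
illustrative helper, not existing API; CACHE_MIDDLEWARE_KEY_PREFIX would be
prepended separately):

    from django.contrib.contenttypes.models import ContentType

    def make_cache_key(obj, prefix='qscache:'):
        # Combining the content-type id with the primary key lets the same
        # key scheme serve ordinary lookups and generic relations alike.
        ct = ContentType.objects.get_for_model(obj)
        return '%s%s:%s' % (prefix, ct.pk, obj.pk)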
Internally, QuerySet grows some new attributes that affect how SQL is
generated. When in effect, they cause the query to retrieve only the primary
keys of the selected objects. in_bulk() uses the cache directly, although
cache misses will still require database hits, as usual. Methods such as
delete() and count() are largely unaffected by cache(), but methods such as
distinct() are a more difficult case and will require some design decisions.
Using extra(select=...) is also a possibly unsolvable case.
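The intended flow might look roughly like this, written against the current
cache API for illustration (fetch_cached is a hypothetical helper, not the
actual QuerySet internals):

    from django.core.cache import cache
    from django.contrib.contenttypes.models import ContentType

    def fetch_cached(queryset, prefix='qscache:'):
        """Primary keys from the database, whole objects from the cache,
        and an in_bulk() call only for the cache misses."""
        model = queryset.model
        ct_pk = ContentType.objects.get_for_model(model).pk
        pks = list(queryset.values_list('pk', flat=True))    # PK-only query
        keys = dict((pk, '%s%s:%s' % (prefix, ct_pk, pk)) for pk in pks)
        cached = cache.get_many(list(keys.values()))
        results, misses = {}, []
        for pk, key in keys.items():
            if key in cached:
                results[pk] = cached[key]
            else:
                misses.append(pk)
        if misses:
            # Cache misses still require a database hit, as usual.
            for pk, obj in model._default_manager.in_bulk(misses).items():
                cache.set(keys[pk], obj)
                results[pk] = obj
        return [results[pk] for pk in pks]                    # keep query ordering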
If values() has been used in the query, cache() takes precedence and creates
the values dictionaries from the cache. If a list of fields is specified in
values(), cache() will still perform the equivalent of a SELECT *. Perhaps
another option could be added to allow retrieval of only the specified
fields, which would break any regular cached lookup for that object.
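A rough illustration of that precedence, again with a hypothetical Entry
model:

    # Both queries go through the object cache; the second still fetches whole
    # objects (the equivalent of SELECT *) before building the requested dicts.
    rows = Entry.objects.filter(is_public=True).values().cache()
    slim = Entry.objects.filter(is_public=True).values('id', 'headline').cache()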
select_related() is supported by the caching mechanism. The appropriate
joins are still performed by the database; if joins were calculated with
cached object foreign key values, cache misses could be very costly.
cache_generic(field, timeout=None, prefix='qscache:', smart=False)
field is the name of the generic foreign key field.
Without database-specific trickery it is non-trivial to perform SQL JOINs
with generic relations. Currently, a database query is required for each
generic foreign key relationship. The cache framework, while unable to
reduce the initial number of database hits, greatly alleviates load when
lists of generic objects are required. Using this method still loads
generic foreign keys lazily, but more quickly, and also uses objects cached
with cache().
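For instance, given a hypothetical Comment model whose content_object is a
generic foreign key:

    # One cache lookup replaces one database query per related object; the
    # relations are still resolved lazily, just more cheaply.
    comments = Comment.objects.filter(is_removed=False).cache_generic('content_object')
    related = [comment.content_object for comment in comments]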
To achieve as much transparency as possible, the QuerySet methods quietly
establish post_save and post_delete signal listeners the first time a model
is cached. Object deletion is trivial. On object creation or modification,
the preferred behaviour is to create or update the cached key rather than
simply deleting the key and letting the cache regenerate it; the rationale
is that the object is most likely to be viewed immediately afterwards, and
caching it at post_save is cheap. However, specific cases may not be as
accommodating. This is likely subject to debate or may need a global setting.
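A minimal sketch of such listeners, shown with the current signal-connection
API for brevity (the real code would register them lazily, per model;
make_cache_key is the hypothetical helper sketched earlier):

    from django.core.cache import cache
    from django.db.models.signals import post_save, post_delete

    def update_cached_object(sender, instance, **kwargs):
        # Preferred behaviour: write the fresh object straight into the cache.
        cache.set(make_cache_key(instance), instance)

    def remove_cached_object(sender, instance, **kwargs):
        cache.delete(make_cache_key(instance))

    # Connected the first time a given model is cached, for example:
    # post_save.connect(update_cached_object, sender=SomeModel)
    # post_delete.connect(remove_cached_object, sender=SomeModel)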
To reduce the number of cache misses, additional "smart" logic can be added.
For example, the first time a model is registered to the cache signal listener,
its model instances are expected to be uncached. In this case, rather than
fetching only primary keys, the objects are retrieved as normal (and cached).
By storing the expiration time, this can also take effect whenever the
cached objects have likely timed out. All "smart" functionality is enabled
using the smart keyword argument.
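One way the "smart" branch might decide between a PK-only query and a full
fetch, assuming a small per-model expiry marker in the cache (every name
here is illustrative):

    import time
    from django.core.cache import cache

    def should_fetch_full(model, timeout, prefix='qscache:'):
        """True when this model's cached instances are probably missing or
        expired, so whole objects should be fetched and re-cached."""
        marker = '%sexpiry:%s.%s' % (prefix, model._meta.app_label,
                                     model._meta.object_name)
        expires_at = cache.get(marker)
        if expires_at is None or expires_at < time.time():
            cache.set(marker, time.time() + timeout, timeout)
            return True   # first use, or the cached objects have likely timed out
        return False      # objects are probably still cached; PK-only is enough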
Implementation Notes
- All caching code lives in a contrib app at first. A custom QuerySet class
derives from the official class, overriding where appropriate (see the
sketch after this list). A Manager class with an overridden get_query_set()
is used for testing, and additional middleware, etc. are located in the same
folder. Near or upon completion, the new code can be merged to trunk as
Django proper. Hopefully the code will not be too invasive, but quite a few
QuerySet methods will have to be hijacked.
- If the transaction middleware is enabled, it is desirable to have the cache
update only when the transaction succeeds. This is simple in implementation
but will couple the transaction middleware to the cache if not designed
properly. An additional middleware class can be created to handle this case;
however, it will have to stipulate placement immediately after the
TransactionMiddleware in settings.py, and might be confused with the
existing CacheMiddleware.
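As referenced in the first note above, the contrib classes might take
roughly this shape, using the get_query_set() hook named in the proposal
(class names and the use of the private _clone() are placeholders for
illustration):

    from django.db import models
    from django.db.models.query import QuerySet

    class CachedQuerySet(QuerySet):
        def cache(self, timeout=None, prefix='qscache:', smart=False):
            # Record the caching options on a clone; the overridden SQL and
            # iteration methods would consult these attributes later.
            clone = self._clone()
            clone._cache_options = {'timeout': timeout, 'prefix': prefix,
                                    'smart': smart}
            return clone

    class CachedManager(models.Manager):
        def get_query_set(self):
            return CachedQuerySet(self.model)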
Timeline
First Month
- Write preliminary tests. Initial implementation of cache() for single
objects. Support almost all typical QuerySet methods.
- Devise a generic idiom for testing cache-related code. Work on aggregates;
implement the select_related(), values(), and in_bulk() cases, and the
cache_generic() method.
Second Month
- Work on signal dispatching, cache coherency. Write more tests and preliminary
documentation.
- Write "smart" cache logic. Explore other possible optimizations.
- Add transaction support. Design decision needed about extra middleware.
- Implement extra features if possible (distinct(), extra(select=...), ...)
Last Month
- Write up documentation, extensive tests, and example code. Possibly move from
contrib into the main cache module.
- Refactor, especially if the new QuerySet has been released. Continue
merging with changes to trunk and testing.
- Allow for wiggle room, QuerySet refactoring work, cleanup, etc.