''This is the original Django GSoC proposal. There have been quite a few

revisions since, but I'm posting this first for reference.''



= Abstract =



This addition to Django's ORM adds **simple drop-in caching**, compatible with

nearly all existing `QuerySet` methods. It emphasizes

performance and compatibility, and provides configuration options with sane

defaults. All that is required for basic functionality is a suitable

`CACHE_BACKEND` setting and the addition of `.cache()` to the appropriate

`QuerySet` chains. It also speeds up the lookup of related objects, and even

that of [http://www.djangoproject.com/documentation/models/generic_relations generic relations].





= Proposed Design =



The `QuerySet` class grows two new methods to add object caching:



{{{

    cache(timeout=None, prefix='qscache:', smart=False)

}}}

    `timeout` defaults to the amount specified in `CACHE_BACKEND`.

    `prefix` is in addition to `CACHE_MIDDLEWARE_KEY_PREFIX`.



    Cache keys are calculated with the content-type id and instance id, to

    accommodate generic relations.
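
A minimal sketch of how such a key could be built. The function name, the `site_prefix` argument (standing in for `CACHE_MIDDLEWARE_KEY_PREFIX`) and the exact key layout are assumptions made for illustration; the proposal only states that the content-type id and instance id are both part of the key.

```python
def make_cache_key(content_type_id, object_id, prefix="qscache:", site_prefix=""):
    """Build a cache key from the content-type id and the instance id.

    The layout "<site_prefix><prefix><content_type_id>:<object_id>" is a
    hypothetical choice for this sketch, not a documented format.
    """
    return "%s%s%d:%s" % (site_prefix, prefix, content_type_id, object_id)


# Two models that share a primary key value still get distinct keys,
# which is what lets generic relations use the same cache entries.
article_key = make_cache_key(7, 42)
comment_key = make_cache_key(9, 42)
```

Because the content-type id is part of the key, a generic foreign key that stores a (content type, object id) pair can compute the key without loading the target object first.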



    Internally, `QuerySet` grows some new attributes that affect how SQL is

    generated. When in effect, they cause the query to only retrieve primary

    keys of selected objects. `in_bulk()` uses the cache directly, although

    cache misses will still require database hits, as usual.  Methods such as

    `delete()` and `count()` are largely unaffected by `cache()`, but

    methods such as `distinct()` are a more difficult case and will require

    some design decisions. Using `extra(select=...)` is also a possibly

    unsolvable case.
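
The primary-key-only query followed by a cache lookup might look roughly like the sketch below. It runs without Django: `cache` is a plain dict, and `load_from_db` and `key_for` are hypothetical callables standing in for the database query and the key function.

```python
def fetch_through_cache(pks, cache, load_from_db, key_for):
    """Return objects for `pks` in order, querying the database only for misses.

    `cache` is any dict-like store, `load_from_db(pks)` returns a
    {pk: object} mapping, and `key_for(pk)` builds the cache key.
    All three are stand-ins for this sketch, not Django APIs.
    """
    found = {}
    missing = []
    for pk in pks:
        key = key_for(pk)
        if key in cache:
            found[pk] = cache[key]
        else:
            missing.append(pk)
    if missing:
        # One extra query covers every miss; the results repopulate the cache.
        for pk, obj in load_from_db(missing).items():
            cache[key_for(pk)] = obj
            found[pk] = obj
    # Keep the ordering chosen by the primary-key query.
    return [found[pk] for pk in pks if pk in found]


cache = {"qscache:1": "first (cached)"}
db = {1: "first", 2: "second", 3: "third"}
queries = []  # records which pks each database call asked for


def load_from_db(pks):
    queries.append(list(pks))
    return {pk: db[pk] for pk in pks if pk in db}


result = fetch_through_cache(
    [3, 1, 2], cache, load_from_db, lambda pk: "qscache:%d" % pk
)
```

Here the object with primary key 1 is served from the cache, while 3 and 2 are loaded in a single follow-up query and cached for the next lookup.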



    If `values()` has been used in the query, `cache()` takes precedence

    and creates the values dictionary from cache. If a list of fields is

    specified in `values()`, `cache()` will still perform the equivalent of a

    `SELECT *`. Perhaps another option could be added to allow retrieval

    of only the specified fields, which would break any regular cached lookup

    for that object.
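
As a rough illustration of building the `values()` dictionary from a cached object: the `CachedRow` class and the explicit `field_names` list are hypothetical stand-ins for a model instance and its field metadata.

```python
class CachedRow(object):
    """Hypothetical stand-in for a fully loaded, cached model instance."""

    def __init__(self, **field_values):
        self.__dict__.update(field_values)


def values_from_cached(obj, field_names, fields=None):
    """Build a values()-style dict from an object that was cached whole.

    The full object is always what sits in the cache (the SELECT *
    equivalent); `fields` only narrows what is copied into the dict.
    """
    wanted = field_names if fields is None else fields
    return dict((name, getattr(obj, name)) for name in wanted)


entry = CachedRow(id=1, headline="Hello", body="Some text")
all_values = values_from_cached(entry, ["id", "headline", "body"])
some_values = values_from_cached(entry, ["id", "headline", "body"], fields=["headline"])
```

Keeping the whole object in the cache means a later plain lookup for the same instance can reuse the entry, at the cost of fetching columns that `values()` did not ask for.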



    `select_related()` is supported by the caching mechanism. The appropriate

    joins are still performed by the database; if joins were calculated with

    cached object foreign key values, cache misses could be very costly.



{{{

    cache_generic(field, timeout=None, prefix='qscache:', smart=False)

}}}



    `field` is the name of the generic foreign key field.



    Without database-specific trickery it is non-trivial to perform SQL JOINs

    with generic relations. Currently, a database query is required for each

    generic foreign key relationship. The cache framework, while unable to

    reduce the initial number of database hits, greatly alleviates load when

    lists of generic objects are required. Using this method still loads

    generic foreign keys lazily, but more quickly, and also uses objects cached

    with `cache()`.
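
The lazy, cache-first resolution of a generic foreign key could be sketched as follows. `LazyGenericRef`, `generic_key` and `load_from_db` are hypothetical names, and a dict plays the role of the cache backend.

```python
def generic_key(content_type_id, object_id, prefix="qscache:"):
    # Hypothetical key layout shared with objects cached by cache().
    return "%s%d:%s" % (prefix, content_type_id, object_id)


class LazyGenericRef(object):
    """Resolve a generic foreign key on first access, trying the cache first."""

    def __init__(self, content_type_id, object_id, cache, load_from_db):
        self.content_type_id = content_type_id
        self.object_id = object_id
        self.cache = cache
        self.load_from_db = load_from_db
        self._resolved = False
        self._obj = None

    def get(self):
        if not self._resolved:
            key = generic_key(self.content_type_id, self.object_id)
            if key in self.cache:
                self._obj = self.cache[key]
            else:
                # Cache miss: one database query, then remember the result.
                self._obj = self.load_from_db(self.content_type_id, self.object_id)
                self.cache[key] = self._obj
            self._resolved = True
        return self._obj


cache = {}
db_hits = []  # records every simulated database query


def load_from_db(content_type_id, object_id):
    db_hits.append((content_type_id, object_id))
    return "object %d/%d" % (content_type_id, object_id)


# Three rows pointing at the same target, as in a list of tagged items.
tags = [LazyGenericRef(7, 42, cache, load_from_db) for _ in range(3)]
targets = [tag.get() for tag in tags]
```

Only the first reference touches the database; the other two are answered from the cache, which is where the savings for lists of generic objects would come from.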



To achieve as much transparency as possible, the `QuerySet` methods quietly

establish `post_save` and `post_delete` signal listeners the first time a

model is cached. Object deletion is trivial. On object creation or

modification, the preferred behaviour is to create or update the cached key

rather than simply deleting the key and letting the cache regenerate it;

the rationale is that the object is most likely to be viewed immediately after

and caching it at `post_save` is cheap. However, specific cases may not be

as accommodating. This is likely subject to debate or may need a global setting.
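
A sketch of the two listeners, written as plain functions so that it runs without a Django project. In practice they would be registered for the `post_save` and `post_delete` signals; the `(sender, instance, **kwargs)` signature, the dict used as a cache and the key layout are assumptions of this example.

```python
cache = {}  # stand-in for the configured cache backend


def cache_key(instance):
    # Hypothetical key layout: prefix, model name, primary key.
    return "qscache:%s:%s" % (type(instance).__name__, instance.pk)


def update_on_save(sender, instance, **kwargs):
    """Save listener: write the fresh object instead of just dropping the key."""
    cache[cache_key(instance)] = instance


def evict_on_delete(sender, instance, **kwargs):
    """Delete listener: deletion only has to drop the key."""
    cache.pop(cache_key(instance), None)


class Entry(object):
    """Plain object standing in for a model instance."""

    def __init__(self, pk, title):
        self.pk = pk
        self.title = title


entry = Entry(1, "Hello")
update_on_save(Entry, entry)    # what a save would trigger
saved = cache["qscache:Entry:1"]
evict_on_delete(Entry, entry)   # what a delete would trigger
```

Writing the object at save time trades a small amount of work on every save for one fewer cache miss on the next read.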



To reduce the number of cache misses, additional "smart" logic can be added.

For example, the first time a model is registered to the cache signal listener,

its model instances are expected to be uncached. In this case, rather than

fetching only primary keys, the objects are retrieved as normal (and cached).

By storing the expiration time, this can also take effect whenever the

cached objects have likely timed out. All "smart" functionality is enabled

using the `smart` keyword argument.
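
One way the expiration bookkeeping behind `smart` might work is sketched below. `SmartTracker` and its method names are hypothetical, and the clock is injectable only so the example can be run deterministically.

```python
import time


class SmartTracker(object):
    """Remember when each model's cached objects are expected to expire."""

    def __init__(self, timeout, clock=time.time):
        self.timeout = timeout
        self.clock = clock
        self.expires_at = {}  # model name -> estimated expiry timestamp

    def should_fetch_full_objects(self, model_name):
        """True when the cache is probably cold for this model."""
        expiry = self.expires_at.get(model_name)
        return expiry is None or self.clock() >= expiry

    def mark_cached(self, model_name):
        """Record that objects of this model were just written to the cache."""
        self.expires_at[model_name] = self.clock() + self.timeout


now = [1000.0]  # fake clock so the example does not depend on real time
tracker = SmartTracker(timeout=300, clock=lambda: now[0])

first = tracker.should_fetch_full_objects("Entry")  # never cached yet
tracker.mark_cached("Entry")
warm = tracker.should_fetch_full_objects("Entry")   # still within the timeout
now[0] += 301
stale = tracker.should_fetch_full_objects("Entry")  # entries likely timed out
```

When the tracker reports a cold cache, the query would retrieve whole objects and cache them; otherwise it would fall back to fetching primary keys only.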





= Implementation Notes =



* All caching code lives in a contrib app at first. A custom `QuerySet` class

  derives from the official class, overriding where appropriate. A `Manager`

  class with an overridden `get_query_set()` is used for testing, and

  additional middleware, etc. are located in the same folder. Near or upon

  completion, the new code can be merged to trunk as Django proper. Hopefully

  the code will not be too invasive, but quite a few `QuerySet` methods will

  have to be hijacked.



* If the transaction middleware is enabled, it is desirable to have the cache

  only update when the transaction succeeds. This is simple in implementation

  but will couple the transaction middleware to the cache if not designed

  properly. An additional middleware class can be created to handle this

  case; however, it will have to stipulate placement immediately after the

  `TransactionMiddleware` in settings.py, and might be confused with the

  existing `CacheMiddleware`.
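
The transaction-aware behaviour described above could be approximated by buffering cache writes until the outcome of the transaction is known. The class below is a hypothetical sketch; how `commit()` and `rollback()` would be wired to the transaction middleware is exactly the open design question and is not shown.

```python
_DELETE = object()  # sentinel marking a pending deletion


class DeferredCacheWrites(object):
    """Queue cache updates and apply them only if the transaction commits."""

    def __init__(self, cache):
        self.cache = cache
        self.pending = []  # (key, value) pairs in the order they were issued

    def set(self, key, value):
        self.pending.append((key, value))

    def delete(self, key):
        self.pending.append((key, _DELETE))

    def commit(self):
        """Apply every queued operation to the real cache."""
        for key, value in self.pending:
            if value is _DELETE:
                self.cache.pop(key, None)
            else:
                self.cache[key] = value
        self.pending = []

    def rollback(self):
        """Discard queued operations; the cache never sees them."""
        self.pending = []


cache = {"qscache:1": "old"}
writes = DeferredCacheWrites(cache)

writes.set("qscache:1", "new")
writes.rollback()
after_rollback = dict(cache)  # the failed transaction left the cache alone

writes.set("qscache:1", "new")
writes.delete("qscache:2")
writes.commit()
```

Keeping the buffer in its own object is one way to avoid coupling the transaction middleware directly to the cache code.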





= Timeline =



== First Month ==



* Write preliminary tests. Initial implementation of `cache()` for single

  objects. Support almost all typical `QuerySet` methods.



* Devise a generic idiom for testing cache-related code. Work on aggregates;

  implement `select_related()`, `values()`, `in_bulk()` cases, and

  `cache_generic()` method.



== Second Month ==



* Work on signal dispatching, cache coherency. Write more tests and preliminary

  documentation.



* Write "smart" cache logic. Explore other possible optimizations.



* Add transaction support. Design decision needed about extra middleware.



* Implement extra features if possible (`distinct()`, `extra(select=...)`, ...)



== Last Month ==



* Write up documentation, extensive tests, and example code. Possibly move from

  contrib into the main cache module.



* Refactor, especially if the new `QuerySet` has been released. Continue

  merging with changes to trunk and testing.



* Allow for wiggle room, `QuerySet` refactoring work, cleanup, etc.