Changes between Version 2 and Version 3 of ObjectLevelCaching


Timestamp:
May 28, 2007, 12:19:03 PM
Author:
Paul Collier <paul@…>
Comment:

Some thoughts...

  • ObjectLevelCaching

    v2 v3  
    1 ''This is the original Django GSoC proposal. There have been quite a few
    2 revisions since, but I'm posting this first for reference.''
    3 
    41= Abstract =
    52
    6 This addition to Django's ORM adds **simple drop-in caching**, compatible with
     3This addition to Django's ORM adds simple drop-in caching, compatible with
    74nearly all existing `QuerySet` methods. It emphasizes
    85performance and compatibility, and providing configuration options with sane
     
    1512= Proposed Design =
    1613
    17 The `QuerySet` class grows two new methods to add object caching:
     14The `QuerySet` class grows new methods to add object caching:
    1815
    19 == cache() ==
     16== .cache() ==
    2017{{{
    2118cache(timeout=None, prefix='qscache:', smart=False)
    2219}}}
    2320
     21    This method causes model instances found in the returned
     22    `QuerySet` to be cached individually; the cache key is
     23    calculated using the contrib.contenttypes model id and the
     24    instance's pk value. (This is all done lazily and the position
     25    of `cache()` does not matter, to be consistent with other methods.)
     26
    2427    `timeout` defaults to the amount specified in `CACHE_BACKEND`.
    2528    `prefix` is in addition to `CACHE_MIDDLEWARE_KEY_PREFIX`.
    2629
    27     Cache keys are calculated with the content-type id and instance id, to
    28     accommodate generic relations.
    29 
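A minimal sketch of the key calculation described above, assuming a plain string layout. The function name `make_cache_key`, the `site_prefix` argument (standing in for `CACHE_MIDDLEWARE_KEY_PREFIX`), and the `:` separator are illustrative assumptions, not part of the proposal.

```python
def make_cache_key(content_type_id, pk, prefix="qscache:", site_prefix=""):
    """Build a per-instance cache key from a content-type id and a pk value.

    Hypothetical helper: the proposal only says the key is derived from the
    contenttypes model id and the instance's pk; the exact layout is assumed.
    """
    return "%s%s%s:%s" % (site_prefix, prefix, content_type_id, pk)


# The method-level prefix is applied in addition to the site-wide one.
default_key = make_cache_key(12, 345)
prefixed_key = make_cache_key(12, 345, site_prefix="mysite:")
```

Because the content-type id is part of the key, two models that share a pk value never collide, which is what generic relations need.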
    3030    Internally, `QuerySet` grows some new attributes that affect how SQL is
    31     generated. When in effect, they cause the query to only retrieve primary
     31    generated. Use of `cache()` causes the query to retrieve only primary
    3232    keys of selected objects. `in_bulk()` uses the cache directly, although
    3333    cache misses will still require database hits, as usual.  Methods such as
     
    4040    and creates the values dictionary from cache. If a list of fields is
    4141    specified in `values()`, `cache()` will still perform the equivalent of a
    42     `SELECT *`. Perhaps another option could be added to allow retrieval
    43     of only the specified fields, which would break any regular cached lookup
    44     for that object.
     42    `SELECT *`.
    4543
    4644    `select_related()` is supported by the caching mechanism. The appropriate
    4745    joins are still performed by the database; if joins were calculated with
    48     cached object foreign key values, cache misses could be very costly.
     46    cached foreign key values, cache misses could become very costly.
    4947
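The `in_bulk()` behaviour mentioned earlier (read from the cache, fall back to the database only for misses) might look roughly like this. It is a sketch under stated assumptions: a dict stands in for the cache backend, and `cached_in_bulk`, `fetch_from_db`, and `key_for` are hypothetical names.

```python
def cached_in_bulk(pks, cache, fetch_from_db, key_for):
    """Return {pk: obj}, reading the cache first and querying only for misses."""
    found = {}
    missing = []
    for pk in pks:
        obj = cache.get(key_for(pk))
        if obj is None:
            missing.append(pk)
        else:
            found[pk] = obj
    if missing:
        # A single database round trip covers every cache miss.
        for pk, obj in fetch_from_db(missing).items():
            cache[key_for(pk)] = obj
            found[pk] = obj
    return found


# Stand-ins for a cache backend and a database table (both hypothetical).
cache = {"qscache:7:1": "article-1"}
table = {1: "article-1", 2: "article-2", 3: "article-3"}
queried = []  # records which pks actually reached the "database"


def fetch_from_db(pks):
    queried.append(list(pks))
    return {pk: table[pk] for pk in pks if pk in table}


def key_for(pk):
    return "qscache:7:%s" % pk


result = cached_in_bulk([1, 2, 3], cache, fetch_from_db, key_for)
```

Here only pks 2 and 3 reach the database, and both are written back to the cache for the next lookup.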
    50 == cache_generic() ==
     48== .cache_related() ==
    5149{{{
    52 cache_generic(field, timeout=None, prefix='qscache:', smart=False)
     50cache_related(fields, timeout=None, prefix='qscache:', smart=False)
    5351}}}
     52    `fields` is a name or list of foreign keys, many-to-many/one-to-one fields,
     53    reverse-lookup fields, or generic foreign keys on the model. Model instances
     54    pointed to by the given relation will be cached similarly to `cache()`.
    5455
    55     `field` is the name of the generic foreign key field.
     56    I'm not sold on the signature of this method... *args would be nice
     57    but then the other defaulted arguments would be replaced by **kwargs.
    5658
     59    Also, the special string `'*'` could be accepted to cache all relations.
     60    Either that or another method `cache_all_relations()`.
     61
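On the signature question: under Python 3, keyword-only arguments (PEP 3102) allow `*args` and explicit defaults together, without falling back to `**kwargs`. The sketch below only illustrates the signature; the body is a placeholder that returns the parsed options, and nothing here is settled design.

```python
def cache_related(*fields, timeout=None, prefix="qscache:", smart=False):
    """Signature sketch only: field names are positional, options keyword-only."""
    if not fields:
        raise TypeError("cache_related() needs at least one field name")
    # Placeholder body: a real method would return a new QuerySet.
    return {"fields": fields, "timeout": timeout, "prefix": prefix, "smart": smart}


opts = cache_related("author", "tags", timeout=300)
```

With this shape, a stray positional value can never be mistaken for `timeout`, since everything positional is treated as a field name.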
     62=== Aside ===
    5763    Without database-specific trickery it is non-trivial to perform SQL JOINs
    5864    with generic relations. Currently, a database query is required for each
     
    6369    with `cache()`.
    6470
     71== .cache_set() ==
     72{{{
     73cache_set(cache_key, timeout=None, smart=False, depth=1)
     74}}}
     75    Similar to taking the resulting QuerySet and storing it directly in the
     76    cache. Overrides `cache()`, but does not cache relations.
     77
     78    If `select_related()` is used in the same `QuerySet`, `cache_set()` will
     79    also cache the
     80    If `cache_related()` is used in the same `QuerySet`, it overrides use of
     81    `select_related()`.
     82
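A rough sketch of the core idea behind `cache_set()`, storing the evaluated results under a caller-chosen key. `SimpleCache` is a tiny in-memory stand-in for a cache backend, and the standalone `cache_set` function with its `evaluate` callable is a hypothetical simplification of the proposed method. It ignores `smart`, `depth`, and the interaction with `select_related()`.

```python
import time


class SimpleCache:
    """Tiny in-memory stand-in for a cache backend (hypothetical)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, timeout=None):
        expires = None if timeout is None else time.time() + timeout
        self._data[key] = (value, expires)

    def get(self, key, default=None):
        value, expires = self._data.get(key, (default, None))
        if expires is not None and expires <= time.time():
            del self._data[key]  # entry has expired
            return default
        return value


def cache_set(cache, cache_key, evaluate, timeout=None):
    """Return cached results for cache_key, evaluating the query on a miss."""
    results = cache.get(cache_key)
    if results is None:
        results = list(evaluate())  # force evaluation once, store the list
        cache.set(cache_key, results, timeout)
    return results


calls = []  # counts how often the "query" actually runs


def todays_articles():
    calls.append(1)
    return ["a1", "a2"]


cache = SimpleCache()
first = cache_set(cache, "todaysarticles", todays_articles, timeout=60)
second = cache_set(cache, "todaysarticles", todays_articles, timeout=60)
```

The second call is served entirely from the cache, so the query function runs only once.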
     83== Sample usage ==
     84
     85{{{
     86>>> article.comment_set.cache_related('author')
     87>>> my_city.restaurant_set.cache(smart=True)
     88>>> Article.objects.filter(created__gte=yesterday).cache_set('todaysarticles')
     89>>> tag = Tag.objects.cache_related('content_object').get(slug='news')
     90}}}
     91
    6592== Background logic ==
     93
     94The implementation class contains a registry of models that have been requested
     95to cache (directly or via a relation).
    6696
    6797To achieve as much transparency as possible, the `QuerySet` methods quietly
    6898establish `post_save` and `post_delete` signal listeners the first time a
    69 model is cached. Object deletion is trivial. On object creation or
     99model is cached. Object deletion is handled trivially. On object creation or
    70100modification, the preferred behavior is to create or update the cached key
    71101rather than simply deleting the key and letting the cache regenerate it;
    72102the rationale is that the object is most likely to be viewed immediately after
    73 and caching it at `post_save` is cheap. However, specific cases may not be
    74 as accommodating. This is likely subject to debate or may need a global setting.
     103and caching it at `post_save` is cheap. However, this may not be desirable in
     104certain cases.
    75105
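The registry and signal wiring described above could be sketched as follows. To keep the example self-contained it uses a minimal `Signal` class rather than Django's dispatcher, a dict as the cache, and the class name in place of a content-type id; all of these names are assumptions for illustration.

```python
class Signal:
    """Minimal stand-in for a dispatcher signal (hypothetical, not Django's)."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        if receiver not in self.receivers:
            self.receivers.append(receiver)

    def send(self, sender, instance):
        for receiver in self.receivers:
            receiver(sender, instance)


post_save = Signal()
post_delete = Signal()
cache = {}
registered_models = set()  # models that have been asked to cache


def key_for(instance):
    # The class name stands in for the content-type id here.
    return "qscache:%s:%s" % (type(instance).__name__, instance.pk)


def update_cached(sender, instance):
    # Preferred behaviour: refresh the key instead of deleting it.
    if sender in registered_models:
        cache[key_for(instance)] = instance


def drop_cached(sender, instance):
    if sender in registered_models:
        cache.pop(key_for(instance), None)


def register_model(model):
    """Called the first time a model is cached; hooks up the listeners."""
    if model not in registered_models:
        registered_models.add(model)
        post_save.connect(update_cached)
        post_delete.connect(drop_cached)


class Article:
    def __init__(self, pk, title):
        self.pk = pk
        self.title = title


register_model(Article)
article = Article(1, "Hello")
post_save.send(Article, article)
cached_after_save = cache.get(key_for(article)) is article
post_delete.send(Article, article)
```

Saving repopulates the key immediately, so the first read after a save is a hit; deleting removes the key outright.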
    76106To reduce the number of cache misses, additional "smart" logic can be added.
     
    84114
    85115
    86 = Implementation Notes =
     116= Notes =
    87117
    88  * All caching code lives in a contrib app at first. A custom `QuerySet` class
     118== Code layout ==
     119
     120 * All caching code lives in a separate app at first. A custom `QuerySet` class
    89121   derives from the official class, overriding where appropriate. A `Manager`
    90122   class with an overridden `get_query_set()` is used for testing, and
    91    additional middleware, etc. are located in the same folder. Near or upon
    92    completion, the new code can be merged to trunk as Django proper. Hopefully
     123   additional middleware, etc. are located in the same folder. Perhaps
     124   eventually, the new code can be merged to trunk as Django proper. Hopefully
    93125   the code will not be too invasive, but quite a few `QuerySet` methods will
    94    have to be hijacked.
     126   have to be hijacked. `QuerySet` refactoring would be an ideal merge time.
    95127
    96128 * If the transaction middleware is enabled, it is desirable to have the cache
     
    102134   existing `CacheMiddleware`.
    103135
     136 * I've been thinking quite a lot about the multitude of combinations of
     137   methods I've got here... I'm going to implement the simplest things I
     138   had in the original proposal first and branch out from there. I'll
     139   likely post some sort of map of the combinations later once I get it
     140   down on paper.
     141
     142== Interface changes ==
     143
     144 * I'm considering just making "smart" behaviour standard, or at least default.
     145
     146 * Perhaps the default cache key prefix should be specifiable in settings?
     147
     148 * Should `cache_related()` lose the `depth` argument and merely steal it
     149   from `select_related()` instead, if given?
     150
     151 * When `cache()` is used with `values()`, perhaps another option could be
     152   added to allow retrieval of only the specified fields--however, this would
     153   break any regular cached lookup for that object.
     154
    104155= Timeline =
    105156
     
    107158
    108159 * Write preliminary tests. Initial implementation of `cache()` for single
    109    objects. Support almost all typical `QuerySet` methods.
     160   objects. Support typical `QuerySet` methods.
    110161
    111  * Devise a generic idiom for testing cache-related code. Work on aggregates;
    112    implement `select_related()`, `values()`, `in_bulk()` cases, and
    113    `cache_generic()` method.
     162 * Devise a generic idiom for testing cache-related code.
     163
     164 * Later in the month, work on `cache_related()`. Work on aggregates;
     165   implement `select_related()`, `values()`, and `in_bulk()` cases.
    114166
    115167== Second Month ==
     
    122174 * Add transaction support. Design decision needed about extra middleware.
    123175
    124  * Implement extra features if possible (`distinct()`, `extra(select=...)`, ...)
     176 * Implement extra features (`distinct()`, `extra(select=...)`, ...)
     177   in conjunction with `cache_set()`.
    125178
    126179== Last Month ==
    127180
    128  * Write up documentation, extensive tests, and example code. Possibly move from
    129    contrib into the main cache module.
     181 * Write up documentation, extensive tests, and example code.
     182
     183 * Edge cases, corner cases... there are going to be quite a few!
    130184
    131185 * Refactor, especially if the new `QuerySet` has been released. Continue
     
    133187
    134188 * Allow for wiggle room, `QuerySet` refactoring work, cleanup, etc.
     189
     190= class Meta: =
     191
     192I'm definitely wide open for comments and criticisms! You can contact me at
     193[mailto:paul@paul-collier.com paul@paulcollier.com].