Changes between Version 2 and Version 3 of ObjectLevelCaching

05/28/07 10:19:03 (7 years ago)
Paul Collier <paul@…>

Some thoughts...


  • ObjectLevelCaching

    v2 v3  
    1 ''This is the original Django GSoC proposal. There have been quite a few 
    2 revisions since, but I'm posting this first for reference.'' 
    41= Abstract = 
    6 This addition to Django's ORM adds **simple drop-in caching**, compatible with 
     3This addition to Django's ORM adds simple drop-in caching, compatible with 
    74nearly all existing `QuerySet` methods. It emphasizes 
    85performance and compatibility, and providing configuration options with sane 
    1512= Proposed Design = 
    17 The `QuerySet` class grows two new methods to add object caching: 
     14The `QuerySet` class grows new methods to add object caching: 
    19 == cache() == 
     16== .cache() == 
    2118cache(timeout=None, prefix='qscache:', smart=False) 
     21    This method causes model instances found in the returned 
     22    `QuerySet` to be cached individually; the cache key is 
     23    calculated using the contrib.contenttypes model id and the 
     24    instance's pk value. (This is all done lazily and the position 
     25    of `cache()` does not matter, to be consistent with other methods.) 
    2427    `timeout` defaults to the amount specified in `CACHE_BACKEND`. 
    2528    `prefix` is in addition to `CACHE_MIDDLEWARE_KEY_PREFIX`. 
    27     Cache keys are calculated with the content-type id and instance id, to 
    28     accommodate generic relations. 
    3030    Internally, `QuerySet` grows some new attributes that affect how SQL is 
    31     generated. When in effect, they cause the query to only retrieve primary 
     31    generated. Use of `cache()` causes the query to retrieve only primary 
    3232    keys of selected objects. `in_bulk()` uses the cache directly, although 
    3333    cache misses will still require database hits, as usual.  Methods such as 
    4040    and creates the values dictionary from cache. If a list of fields is 
    4141    specified in `values()`, `cache()` will still perform the equivalent of a 
    42     `SELECT *`. Perhaps another option could be added to allow retrieval 
    43     of only the specified fields, which would break any regular cached lookup 
    44     for that object. 
     42    `SELECT *`.  
    4644    `select_related()` is supported by the caching mechanism. The appropriate 
    4745    joins are still performed by the database; if joins were calculated with 
    48     cached object foreign key values, cache misses could be very costly. 
     46    cached foreign key values, cache misses could become very costly. 
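
A minimal sketch of the lookup flow described for `cache()`, not the proposal's actual implementation. It assumes a plain dict stands in for the cache backend and a hypothetical `fetch_rows` callable stands in for the database. The key layout (`prefix`, content-type id, pk) follows the text, but the exact separator is an assumption.

```python
def make_key(prefix, content_type_id, pk):
    """Build a per-object cache key from the content-type id and the pk."""
    return "%s%s:%s" % (prefix, content_type_id, pk)


def get_many(pks, content_type_id, cache, fetch_rows, prefix="qscache:"):
    """Return {pk: obj}, reading the cache first.

    `fetch_rows` (hypothetical) takes a list of pks and returns {pk: obj};
    it is only called for the pks that missed the cache.
    """
    found, missing = {}, []
    for pk in pks:
        key = make_key(prefix, content_type_id, pk)
        if key in cache:
            found[pk] = cache[key]
        else:
            missing.append(pk)
    if missing:
        # Cache misses still require a database hit, as the text notes.
        for pk, obj in fetch_rows(missing).items():
            cache[make_key(prefix, content_type_id, pk)] = obj
            found[pk] = obj
    return found
```

Only the primary keys need to come from the initial query; the full rows are fetched solely for the misses.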
    50 == cache_generic() == 
     48== .cache_related() == 
    52 cache_generic(field, timeout=None, prefix='qscache:', smart=False) 
     50cache_related(fields, timeout=None, prefix='qscache:', smart=False) 
     52    `fields` is a name or list of foreign keys, many-to-many/one-to-one fields, 
     53    reverse-lookup fields, or generic foreign keys on the model. Model instances 
     54    pointed to by the given relation will be cached similarly to `cache()`. 
    55     `field` is the name of the generic foreign key field. 
     56    I'm not sold on the signature of this method... *args would be nice 
     57    but then the other defaulted arguments would be replaced by **kwargs. 
     59    Also, the special string `'*'` could be accepted to cache all relations. 
     60    Either that or another method `cache_all_relations()`. 
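
One possible answer to the signature question, as a sketch only: Python 3 keyword-only arguments allow `*fields` while keeping the defaulted options explicit, so `**kwargs` is not needed. This syntax was not available when the proposal was written, and the returned dict is purely illustrative (the `all_relations` flag is a hypothetical name).

```python
def cache_related(*fields, timeout=None, prefix="qscache:", smart=False):
    """Hypothetical signature: relation names positional, options keyword-only."""
    if not fields:
        raise TypeError("cache_related() needs at least one relation name")
    return {
        "fields": fields,
        # The special string '*' asks for every relation to be cached.
        "all_relations": "*" in fields,
        "timeout": timeout,
        "prefix": prefix,
        "smart": smart,
    }
```

With this shape, `cache_related('author', 'tags', timeout=60)` and `cache_related('*')` both work without a separate `cache_all_relations()` method.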
     62=== Aside === 
    5763    Without database-specific trickery it is non-trivial to perform SQL JOINs 
    5864    with generic relations. Currently, a database query is required for each 
    6369    with `cache()`. 
     71== .cache_set() == 
     73cache_set(cache_key, timeout=None, smart=False, depth=1) 
     75    Similar to taking the resulting QuerySet and storing it directly in the 
     76    cache. Overrides `cache()`, but does not cache relations.  
     78    If `select_related()` is used in the same `QuerySet`, `cache_set()` will 
     79    also cache the  
     80    If `cache_related()` is used in the same `QuerySet`, it overrides use of 
     81    `select_related()`. 
     83== Sample usage == 
     86>>> article.comment_set.cache_related('author') 
     87>>> my_city.restaurant_set.cache(smart=True) 
     88>>> Article.objects.filter(created__gte=yesterday).cache_set('todaysarticles') 
     89>>> tag = Tag.objects.cache_related('content_object').get(slug='news') 
    6592== Background logic == 
     94The implementation class contains a registry of models that have been requested 
     95to cache (directly or via a relation). 
    6797To achieve as much transparency as possible, the `QuerySet` methods quietly 
    6898establish `post_save` and `post_delete` signal listeners the first time a 
    69 model is cached. Object deletion is trivial. On object creation or 
     99model is cached. Object deletion is handled trivially. On object creation or 
    70100modification, the preferred behavior is to create or update the cached key 
    71101rather than simply deleting the key and letting the cache regenerate it; 
    72102the rationale is that the object is most likely to be viewed immediately after 
    73 and caching it at `post_save` is cheap. However, specific cases may not be 
    74 as accommodating. This is likely subject to debate or may need a global setting. 
     103and caching it at `post_save` is cheap. However, this may not be desirable in 
     104certain cases. 
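
The write-through behavior preferred above can be sketched as follows. This is an illustration under assumptions, not the proposal's code: a dict stands in for the cache backend, and the two handler methods (hypothetical names) play the role of listeners that would be connected to Django's `post_save` and `post_delete` signals.

```python
class WriteThroughCache:
    """Toy cache that is updated on save and purged on delete."""

    def __init__(self, prefix="qscache:"):
        self.prefix = prefix
        self.store = {}

    def key(self, content_type_id, pk):
        return "%s%s:%s" % (self.prefix, content_type_id, pk)

    def on_post_save(self, content_type_id, pk, instance):
        # Create or update the cached entry instead of deleting it,
        # since the object is likely to be viewed right after saving.
        self.store[self.key(content_type_id, pk)] = instance

    def on_post_delete(self, content_type_id, pk):
        # Deletion is the trivial case: drop the key if it exists.
        self.store.pop(self.key(content_type_id, pk), None)
```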
    76106To reduce the number of cache misses, additional "smart" logic can be added. 
    86 = Implementation Notes = 
     116= Notes = 
    88  * All caching code lives in a contrib app at first. A custom `QuerySet` class 
     118== Code layout == 
     120 * All caching code lives in a separate app at first. A custom `QuerySet` class 
    89121   derives from the official class, overriding where appropriate. A `Manager` 
    90122   class with an overridden `get_query_set()` is used for testing, and 
    91    additional middleware, etc. are located in the same folder. Near or upon 
    92    completion, the new code can be merged to trunk as Django proper. Hopefully 
     123   additional middleware, etc. are located in the same folder. Perhaps 
     124   eventually, the new code can be merged to trunk as Django proper. Hopefully 
    93125   the code will not be too invasive, but quite a few `QuerySet` methods will 
    94    have to be hijacked. 
     126   have to be hijacked. `QuerySet` refactoring would be an ideal merge time. 
    96128 * If the transaction middleware is enabled, it is desirable to have the cache 
    102134   existing `CacheMiddleware`. 
     136 * I've been thinking quite a lot about the multitude of combinations of 
     137   methods I've got here... I'm going to implement the simplest things I 
     138   had in the original proposal first and branch out from there. I'll 
     139   likely post some sort of map of the combinations later once I get it 
     140   down on paper. 
     142== Interface changes == 
     144 * I'm considering just making "smart" behaviour standard, or at least default. 
     146 * Perhaps the default cache key prefix should be specifiable in settings? 
     148 * Should `cache_related()` lose the `depth` argument and merely steal it  
     149   from `select_related()` instead, if given? 
     151 * When `cache()` is used with `values()`, perhaps another option could be 
     152   added to allow retrieval of only the specified fields--however, this would 
     153   break any regular cached lookup for that object. 
    104155= Timeline = 
    108159 * Write preliminary tests. Initial implementation of `cache()` for single 
    109    objects. Support almost all typical `QuerySet` methods. 
     160   objects. Support typical `QuerySet` methods. 
    111  * Devise a generic idiom for testing cache-related code. Work on aggregates; 
    112    implement `select_related()`, `values()`, `in_bulk()` cases, and 
    113    `cache_generic()` method. 
     162 * Devise a generic idiom for testing cache-related code.  
     164 * Later in the month, work on `cache_related()`. Work on aggregates; 
     165   implement `select_related()`, `values()`, and `in_bulk()` cases. 
    115167== Second Month == 
    122174 * Add transaction support. Design decision needed about extra middleware. 
    124  * Implement extra features if possible (`distinct()`, `extra(select=...)`, ...) 
     176 * Implement extra features (`distinct()`, `extra(select=...)`, ...) 
     177   in conjunction with `cache_set()`. 
    126179== Last Month == 
    128  * Write up documentation, extensive tests, and example code. Possibly move from 
    129    contrib into the main cache module. 
     181 * Write up documentation, extensive tests, and example code. 
     183 * Edge cases, corner cases... there are going to be quite a few! 
    131185 * Refactor, especially if the new `QuerySet` has been released. Continue 
    134188 * Allow for wiggle room, `QuerySet` refactoring work, cleanup, etc. 
     190= class Meta: = 
     192I'm definitely wide open for comments and criticisms! You can contact me at