
Changes between Version 2 and Version 3 of ObjectLevelCaching


Timestamp:
05/28/07 10:19:03 (7 years ago)
Author:
Paul Collier <paul@…>
Comment:

Some thoughts...

  • ObjectLevelCaching

    v2 v3  
    1 ''This is the original Django GSoC proposal. There have been quite a few 
    2 revisions since, but I'm posting this first for reference.'' 
    3  
    41= Abstract = 
    52 
    6 This addition to Django's ORM adds **simple drop-in caching**, compatible with 
     3This addition to Django's ORM adds simple drop-in caching, compatible with 
    74nearly all existing `QuerySet` methods. It emphasizes 
    85performance and compatibility, and providing configuration options with sane 
     
    1512= Proposed Design = 
    1613 
    17 The `QuerySet` class grows two new methods to add object caching: 
     14The `QuerySet` class grows new methods to add object caching: 
    1815 
    19 == cache() == 
     16== .cache() == 
    2017{{{ 
    2118cache(timeout=None, prefix='qscache:', smart=False) 
    2219}}} 
    2320 
     21    This method causes model instances found in the returned 
     22    `QuerySet` to be cached individually; the cache key is 
     23    calculated using the contrib.contenttypes model id and the 
     24    instance's pk value. (This is all done lazily and the position 
     25    of `cache()` does not matter, to be consistent with other methods.) 
     26 
    2427    `timeout` defaults to the amount specified in `CACHE_BACKEND`. 
    2528    `prefix` is in addition to `CACHE_MIDDLEWARE_KEY_PREFIX`. 
    2629 
    27     Cache keys are calculated with the content-type id and instance id, to 
    28     accommodate generic relations. 
    29  
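A minimal sketch of how such a per-object key might be assembled, in plain Python. The function name `make_cache_key`, the `global_prefix` argument (standing in for `CACHE_MIDDLEWARE_KEY_PREFIX`), and the `prefix + content type id + ":" + pk` layout are all assumptions for illustration; the proposal does not fix an exact key format.

```python
def make_cache_key(content_type_id, pk, prefix="qscache:", global_prefix=""):
    """Build a per-object cache key (hypothetical layout).

    content_type_id: the contrib.contenttypes id of the model.
    pk:              the instance's primary key value.
    prefix:          the per-call prefix passed to cache().
    global_prefix:   assumed stand-in for CACHE_MIDDLEWARE_KEY_PREFIX.
    """
    return f"{global_prefix}{prefix}{content_type_id}:{pk}"


# Two models with the same pk get different keys because the
# content-type id is part of the key.
article_key = make_cache_key(content_type_id=12, pk=345)
comment_key = make_cache_key(content_type_id=13, pk=345)
```

Keying on the content-type id rather than a model name is what lets generic relations, which store exactly that id, look up cached objects directly.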
    3030    Internally, `QuerySet` grows some new attributes that affect how SQL is 
    31     generated. When in effect, they cause the query to only retrieve primary 
     31    generated. Use of `cache()` causes the query to retrieve only primary 
    3232    keys of selected objects. `in_bulk()` uses the cache directly, although 
    3333    cache misses will still require database hits, as usual.  Methods such as 
     
    4040    and creates the values dictionary from cache. If a list of fields is 
    4141    specified in `values()`, `cache()` will still perform the equivalent of a 
    42     `SELECT *`. Perhaps another option could be added to allow retrieval 
    43     of only the specified fields, which would break any regular cached lookup 
    44     for that object. 
     42    `SELECT *`.  
    4543 
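The primary-key-first lookup described above can be sketched roughly as follows. This is not the proposed implementation: `fetch_objects`, `load_from_db`, and `key_for` are hypothetical names, the cache is a plain dict, and the database is represented by a callable that takes the missing primary keys and returns a `{pk: object}` mapping.

```python
def fetch_objects(pks, cache, load_from_db, key_for):
    """Return objects for pks, using the cache first and the DB for misses.

    pks:          primary keys produced by the pk-only query, in order.
    cache:        dict-like cache (stand-in for the real backend).
    load_from_db: callable taking a list of pks, returning {pk: object}.
    key_for:      callable mapping a pk to its cache key.
    """
    keys = {pk: key_for(pk) for pk in pks}
    found = {pk: cache[key] for pk, key in keys.items() if key in cache}

    # Only the misses go back to the database, in a single call.
    missing = [pk for pk in pks if pk not in found]
    if missing:
        for pk, obj in load_from_db(missing).items():
            cache[keys[pk]] = obj  # populate the cache for next time
            found[pk] = obj

    # Preserve the ordering of the original pk query.
    return [found[pk] for pk in pks if pk in found]
```

With a warm cache this costs one pk-only query plus cache reads; with a cold cache it degrades to the pk query plus one bulk fetch for the misses.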
    4644    `select_related()` is supported by the caching mechanism. The appropriate 
    4745    joins are still performed by the database; if joins were calculated with 
    48     cached object foreign key values, cache misses could be very costly. 
     46    cached foreign key values, cache misses could become very costly. 
    4947 
    50 == cache_generic() == 
     48== .cache_related() == 
    5149{{{ 
    52 cache_generic(field, timeout=None, prefix='qscache:', smart=False) 
     50cache_related(fields, timeout=None, prefix='qscache:', smart=False) 
    5351}}} 
     52    `fields` is a name or list of foreign keys, many-to-many/one-to-one fields, 
     53    reverse-lookup fields, or generic foreign keys on the model. Model instances 
     54    pointed to by the given relation will be cached similarly to `cache()`. 
    5455 
    55     `field` is the name of the generic foreign key field. 
     56    I'm not sold on the signature of this method... *args would be nice 
     57    but then the other defaulted arguments would be replaced by **kwargs. 
    5658 
     59    Also, the special string `'*'` could be accepted to cache all relations. 
     60    Either that or another method `cache_all_relations()`. 
     61 
     62=== Aside === 
    5763    Without database-specific trickery it is non-trivial to perform SQL JOINs 
    5864    with generic relations. Currently, a database query is required for each 
     
    6369    with `cache()`. 
    6470 
     71== .cache_set() == 
     72{{{ 
     73cache_set(cache_key, timeout=None, smart=False, depth=1) 
     74}}} 
     75    Similar to taking the resulting QuerySet and storing it directly in the 
     76    cache. Overrides `cache()`, but does not cache relations.  
     77 
     78    If `select_related()` is used in the same `QuerySet`, `cache_set()` will 
     79    also cache the  
     80    If `cache_related()` is used in the same `QuerySet`, it overrides use of 
     81    `select_related()`. 
     82 
     83== Sample usage == 
     84 
     85{{{ 
     86>>> article.comment_set.cache_related('author') 
     87>>> my_city.restaurant_set.cache(smart=True) 
     88>>> Article.objects.filter(created__gte=yesterday).cache_set('todaysarticles') 
     89>>> tag = Tag.objects.cache_related('content_object').get(slug='news') 
     90}}} 
     91 
    6592== Background logic == 
     93 
     94The implementation class contains a registry of models that have been requested 
     95to cache (directly or via a relation). 
    6696 
    6797To achieve as much transparency as possible, the `QuerySet` methods quietly 
    6898establish `post_save` and `post_delete` signal listeners the first time a 
    69 model is cached. Object deletion is trivial. On object creation or 
     99model is cached. Object deletion is handled trivially. On object creation or 
    70100modification, the preferred behavior is to create or update the cached key 
    71101rather than simply deleting the key and letting the cache regenerate it; 
    72102the rationale is that the object is most likely to be viewed immediately after 
    73 and caching it at `post_save` is cheap. However, specific cases may not be 
    74 as accommodating. This is likely subject to debate or may need a global setting. 
     103and caching it at `post_save` is cheap. However, this may not be desirable in 
     104certain cases. 
    75105 
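A rough sketch of the registry and the update-on-save behaviour described above, written in plain Python so it runs without Django. The handlers use the `(sender, instance, **kwargs)` shape of Django signal receivers, but nothing is actually connected to `post_save` or `post_delete` here; the dict cache, the `register` helper, the key layout, and the `Article` class are all assumptions for illustration.

```python
cache = {}          # stand-in for the real cache backend
registered = set()  # registry of models that have been asked to cache


def key_for(instance):
    # Hypothetical key layout; the proposal uses a content-type id instead
    # of the class name.
    return f"qscache:{type(instance).__name__}:{instance.pk}"


def register(model):
    """Add model to the registry; return True only the first time.

    The first registration is the point at which the real code would
    attach its post_save and post_delete listeners.
    """
    if model in registered:
        return False
    registered.add(model)
    return True


def on_post_save(sender, instance, **kwargs):
    # Create or update the cached copy instead of deleting the key.
    if sender in registered:
        cache[key_for(instance)] = instance


def on_post_delete(sender, instance, **kwargs):
    # Deletion only needs to drop the key.
    if sender in registered:
        cache.pop(key_for(instance), None)


class Article:
    """Minimal stand-in for a model class."""

    def __init__(self, pk, title):
        self.pk = pk
        self.title = title
```

Writing the fresh object at save time trades one extra cache write for avoiding a miss on the next read, which matches the rationale that a saved object is likely to be viewed right away.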
    76106To reduce the number of cache misses, additional "smart" logic can be added. 
     
    84114 
    85115 
    86 = Implementation Notes = 
     116= Notes = 
    87117 
    88  * All caching code lives in a contrib app at first. A custom `QuerySet` class 
     118== Code layout == 
     119 
     120 * All caching code lives in a separate app at first. A custom `QuerySet` class 
    89121   derives from the official class, overriding where appropriate. A `Manager` 
    90122   class with an overridden `get_query_set()` is used for testing, and 
    91    additional middleware, etc. are located in the same folder. Near or upon 
    92    completion, the new code can be merged to trunk as Django proper. Hopefully 
     123   additional middleware, etc. are located in the same folder. Perhaps 
     124   eventually, the new code can be merged to trunk as Django proper. Hopefully 
    93125   the code will not be too invasive, but quite a few `QuerySet` methods will 
    94    have to be hijacked. 
     126   have to be hijacked. `QuerySet` refactoring would be an ideal merge time. 
    95127 
    96128 * If the transaction middleware is enabled, it is desirable to have the cache 
     
    102134   existing `CacheMiddleware`. 
    103135 
     136 * I've been thinking quite a lot about the multitude of combinations of 
     137   methods I've got here... I'm going to implement the simplest things I 
     138   had in the original proposal first and branch out from there. I'll 
     139   likely post some sort of map of the combinations later once I get it 
     140   down on paper. 
     141 
     142== Interface changes == 
     143 
     144 * I'm considering just making "smart" behaviour standard, or at least default. 
     145 
     146 * Perhaps the default cache key prefix should be specifiable in settings? 
     147 
     148 * Should `cache_related()` lose the `depth` argument and merely steal it  
     149   from `select_related()` instead, if given? 
     150 
     151 * When `cache()` is used with `values()`, perhaps another option could be 
     152   added to allow retrieval of only the specified fields--however, this would 
     153   break any regular cached lookup for that object. 
     154 
    104155= Timeline = 
    105156 
     
    107158 
    108159 * Write preliminary tests. Initial implementation of `cache()` for single 
    109    objects. Support almost all typical `QuerySet` methods. 
     160   objects. Support typical `QuerySet` methods. 
    110161 
    111  * Devise a generic idiom for testing cache-related code. Work on aggregates; 
    112    implement `select_related()`, `values()`, `in_bulk()` cases, and 
    113    `cache_generic()` method. 
     162 * Devise a generic idiom for testing cache-related code.  
     163 
     164 * Later in the month, work on `cache_related()`. Work on aggregates; 
     165   implement `select_related()`, `values()`, and `in_bulk()` cases. 
    114166 
    115167== Second Month == 
     
    122174 * Add transaction support. Design decision needed about extra middleware. 
    123175 
    124  * Implement extra features if possible (`distinct()`, `extra(select=...)`, ...) 
     176 * Implement extra features (`distinct()`, `extra(select=...)`, ...) 
     177   in conjunction with `cache_set()`. 
    125178 
    126179== Last Month == 
    127180 
    128  * Write up documentation, extensive tests, and example code. Possibly move from 
    129    contrib into the main cache module. 
     181 * Write up documentation, extensive tests, and example code. 
     182 
     183 * Edge cases, corner cases... there are going to be quite a few! 
    130184 
    131185 * Refactor, especially if the new `QuerySet` has been released. Continue 
     
    133187 
    134188 * Allow for wiggle room, `QuerySet` refactoring work, cleanup, etc. 
     189 
     190= class Meta: = 
     191 
     192I'm definitely wide open for comments and criticisms! You can contact me at 
     193[mailto:paul@paul-collier.com paul@paulcollier.com].