Changes between Initial Version and Version 1 of ObjectLevelCaching


Timestamp: May 27, 2007, 12:39:25 PM
Author: Paul Collier
''This is the original Django GSoC proposal. There have been quite a few
revisions since, but I'm posting this first for reference.''

= Abstract =

This addition to Django's ORM adds **simple drop-in caching**, compatible with
nearly all existing `QuerySet` methods. It emphasizes performance and
compatibility, and provides configuration options with sane defaults. All that
is required for basic functionality is a suitable `CACHE_BACKEND` setting and
the addition of `.cache()` to the appropriate `QuerySet` chains. It also speeds
up the lookup of related objects, and even that of
[http://www.djangoproject.com/documentation/models/generic_relations generic relations].

= Proposed Design =

The `QuerySet` class grows two new methods to add object caching:

{{{
    cache(timeout=None, prefix='qscache:', smart=False)
}}}
    `timeout` defaults to the amount specified in `CACHE_BACKEND`.
    `prefix` is in addition to `CACHE_MIDDLEWARE_KEY_PREFIX`.

    Cache keys are calculated with the content-type id and instance id, to
    accommodate generic relations.

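A minimal sketch of the key scheme just described, assuming a colon-separated layout. The helper name `make_cache_key`, the `site_prefix` parameter (standing in for `CACHE_MIDDLEWARE_KEY_PREFIX`), and the exact key format are illustrative assumptions, not part of the proposal:

```python
def make_cache_key(content_type_id, object_id, prefix="qscache:", site_prefix=""):
    """Build a cache key from a content-type id and an instance id.

    Hypothetical helper: the name and the key layout are assumptions.
    `site_prefix` stands in for CACHE_MIDDLEWARE_KEY_PREFIX.
    """
    return "%s%s%d:%s" % (site_prefix, prefix, content_type_id, object_id)


# Two models can share a primary key value without colliding in the cache,
# because the content-type id is part of the key.
article_key = make_cache_key(7, 42)
comment_key = make_cache_key(9, 42)
```

Including the content-type id is what would let a generic foreign key, which stores exactly that pair of values, find its target in the cache without knowing the model class in advance.
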
    Internally, `QuerySet` grows some new attributes that affect how SQL is
    generated. When in effect, they cause the query to retrieve only the primary
    keys of the selected objects. `in_bulk()` uses the cache directly, although
    cache misses will still require database hits, as usual. Methods such as
    `delete()` and `count()` are largely unaffected by `cache()`, but
    methods such as `distinct()` are a more difficult case and will require
    some design decisions. Using `extra(select=...)` may be an unsolvable case.

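The lookup flow described above (a query that returns only primary keys, followed by cache reads, with database hits reserved for misses) could be sketched roughly as follows. The function `fetch_cached`, its parameters, and the dict used as a cache are all hypothetical stand-ins, not the proposal's actual internals:

```python
def fetch_cached(pks, cache, load_from_db, make_key):
    """Return objects for `pks` in order, hitting the database only for misses.

    `pks` is what the pk-only query returned; `cache` is any dict-like store;
    `load_from_db(missing_pks)` returns a {pk: object} mapping. All names here
    are hypothetical.
    """
    found = {}
    missing = []
    for pk in pks:
        key = make_key(pk)
        if key in cache:
            found[pk] = cache[key]
        else:
            missing.append(pk)

    if missing:
        # One extra query for all misses, then populate the cache.
        for pk, obj in load_from_db(missing).items():
            cache[make_key(pk)] = obj
            found[pk] = obj

    # Preserve the ordering of the pk-only query; skip rows deleted meanwhile.
    return [found[pk] for pk in pks if pk in found]


# Stand-ins for the cache backend and the database table.
cache = {"qscache:1": "obj-1"}
table = {1: "obj-1", 2: "obj-2", 3: "obj-3"}
db_calls = []


def load_from_db(missing_pks):
    db_calls.append(list(missing_pks))
    return {pk: table[pk] for pk in missing_pks if pk in table}


result = fetch_cached([3, 1, 2], cache, load_from_db, lambda pk: "qscache:%d" % pk)
```

In this sketch the misses are batched into a single follow-up query, which is one possible design; the proposal itself only states that misses still require database hits.
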
    If `values()` has been used in the query, `cache()` takes precedence
    and creates the values dictionary from cache. If a list of fields is
    specified in `values()`, `cache()` will still perform the equivalent of a
    `SELECT *`. Perhaps another option could be added to allow retrieval
    of only the specified fields, which would break any regular cached lookup
    for that object.

    `select_related()` is supported by the caching mechanism. The appropriate
    joins are still performed by the database; if joins were calculated with
    cached object foreign key values, cache misses could be very costly.

{{{
    cache_generic(field, timeout=None, prefix='qscache:', smart=False)
}}}

    `field` is the name of the generic foreign key field.

    Without database-specific trickery it is non-trivial to perform SQL JOINs
    with generic relations. Currently, a database query is required for each
    generic foreign key relationship. The cache framework, while unable to
    reduce the initial number of database hits, greatly alleviates load when
    lists of generic objects are required. Using this method still loads
    generic foreign keys lazily, but more quickly, and also uses objects cached
    with `cache()`.

To achieve as much transparency as possible, the `QuerySet` methods quietly
establish `post_save` and `post_delete` signal listeners the first time a
model is cached. Object deletion is trivial. On object creation or
modification, the preferred behaviour is to create or update the cached key
rather than simply deleting the key and letting the cache regenerate it;
the rationale is that the object is most likely to be viewed immediately after
and caching it at `post_save` is cheap. However, specific cases may not be
as accommodating. This is likely subject to debate or may need a global setting.

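The write-through behaviour preferred above might look roughly like the sketch below. It does not import Django; the handlers only mimic the `(sender, instance, **kwargs)` shape of signal receivers, and `cache_key`, `on_post_save`, `on_post_delete`, and the `Article` class are hypothetical names chosen for illustration:

```python
cache = {}  # stand-in for the configured cache backend


def cache_key(instance):
    # Hypothetical key layout; a real key would use the content-type id.
    return "qscache:%s:%s" % (type(instance).__name__, instance.pk)


def on_post_save(sender, instance, **kwargs):
    """Write-through: store the fresh object instead of deleting its key."""
    cache[cache_key(instance)] = instance


def on_post_delete(sender, instance, **kwargs):
    """Deletion is the trivial case: drop the key if it exists."""
    cache.pop(cache_key(instance), None)


class Article:
    """Plain class standing in for a model; not a Django model."""

    def __init__(self, pk, title):
        self.pk = pk
        self.title = title


# Simulate the signals being sent after save() and delete().
article = Article(1, "Draft")
on_post_save(Article, article)
article.title = "Final"
on_post_save(Article, article)
cached_title = cache["qscache:Article:1"].title
on_post_delete(Article, article)
```

The alternative raised in the text, invalidating on save, would amount to having `on_post_save` call `cache.pop(...)` instead, leaving the next read to repopulate the key.
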
To reduce the number of cache misses, additional "smart" logic can be added.
For example, the first time a model is registered to the cache signal listener,
its model instances are expected to be uncached. In this case, rather than
fetching only primary keys, the objects are retrieved as normal (and cached).
By storing the expiration time, this can also take effect whenever the
cached objects have likely timed out. All "smart" functionality is enabled
using the `smart` keyword argument.

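One way to picture the "smart" decision is a small policy object that remembers when each model's cached objects are expected to expire. This is only a sketch under assumptions: `SmartFetchPolicy`, its method names, and tracking expiry per model (rather than per object or per query) are not specified by the proposal:

```python
import time


class SmartFetchPolicy:
    """Decide between a pk-only query and a full fetch (hypothetical class)."""

    def __init__(self, timeout, clock=time.time):
        self.timeout = timeout
        self.clock = clock
        self.expires_at = {}  # model name -> time its cached objects likely expire

    def should_fetch_full_rows(self, model_name):
        # Never cached before, or probably timed out: fetch whole objects,
        # because a pk-only query would be followed by cache misses anyway.
        expiry = self.expires_at.get(model_name)
        return expiry is None or self.clock() >= expiry

    def record_cached(self, model_name):
        self.expires_at[model_name] = self.clock() + self.timeout


# A fake clock makes the behaviour deterministic.
now = [1000.0]
policy = SmartFetchPolicy(timeout=300, clock=lambda: now[0])

first = policy.should_fetch_full_rows("Article")    # nothing cached yet
policy.record_cached("Article")
second = policy.should_fetch_full_rows("Article")   # cache is warm
now[0] += 301
third = policy.should_fetch_full_rows("Article")    # entries likely expired
```
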
= Implementation Notes =

* All caching code lives in a contrib app at first. A custom `QuerySet` class
  derives from the official class, overriding where appropriate. A `Manager`
  class with an overridden `get_query_set()` is used for testing, and
  additional middleware, etc. are located in the same folder. Near or upon
  completion, the new code can be merged to trunk as Django proper. Hopefully
  the code will not be too invasive, but quite a few `QuerySet` methods will
  have to be hijacked.

* If the transaction middleware is enabled, it is desirable to have the cache
  only update when the transaction succeeds. This is simple in implementation
  but will couple the transaction middleware to the cache if not designed
  properly. An additional middleware class can be created to handle this
  case; however, it will have to stipulate placement immediately after the
  `TransactionMiddleware` in settings.py, and might be confused with the
  existing `CacheMiddleware`.

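The transaction note above could be approached by queuing cache writes and applying them only once the transaction commits. The sketch below assumes a hypothetical `DeferredCacheWriter` class with a dict standing in for the cache; how it would be wired into middleware is left open, as in the proposal:

```python
class DeferredCacheWriter:
    """Queue cache writes and apply them only on commit (hypothetical class)."""

    _DELETE = object()  # sentinel marking a pending deletion

    def __init__(self, cache):
        self.cache = cache
        self.pending = {}

    def set(self, key, value):
        self.pending[key] = value

    def delete(self, key):
        self.pending[key] = self._DELETE

    def commit(self):
        # The transaction succeeded: make the queued changes visible.
        for key, value in self.pending.items():
            if value is self._DELETE:
                self.cache.pop(key, None)
            else:
                self.cache[key] = value
        self.pending.clear()

    def rollback(self):
        # The transaction failed: the cache must not see any of the changes.
        self.pending.clear()


cache = {"qscache:7:1": "old"}
writer = DeferredCacheWriter(cache)

writer.set("qscache:7:1", "new")
writer.rollback()
after_rollback = dict(cache)

writer.set("qscache:7:1", "new")
writer.delete("qscache:7:2")
writer.commit()
after_commit = dict(cache)
```

Keeping the queue in a separate object is one way to avoid coupling the transaction middleware to the cache: the middleware would only need to call `commit()` or `rollback()` at the end of the request.
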
= Timeline =

== First Month ==

* Write preliminary tests. Initial implementation of `cache()` for single
  objects. Support almost all typical `QuerySet` methods.

* Devise a generic idiom for testing cache-related code. Work on aggregates;
  implement `select_related()`, `values()`, `in_bulk()` cases, and
  `cache_generic()` method.

== Second Month ==

* Work on signal dispatching, cache coherency. Write more tests and preliminary
  documentation.

* Write "smart" cache logic. Explore other possible optimizations.

* Add transaction support. Design decision needed about extra middleware.

* Implement extra features if possible (`distinct()`, `extra(select=...)`, ...)

== Last Month ==

* Write up documentation, extensive tests, and example code. Possibly move from
  contrib into the main cache module.

* Refactor, especially if the new `QuerySet` has been released. Continue
  merging with changes to trunk and testing.

* Allow for wiggle room, `QuerySet` refactoring work, cleanup, etc.