''This is the original Django GSoC proposal. There have been quite a few revisions since, but I'm posting this first for reference.''

= Abstract =

This addition to Django's ORM adds **simple drop-in caching**, compatible with nearly all existing `QuerySet` methods. It emphasizes performance and compatibility, and provides configuration options with sane defaults. All that is required for basic functionality is a suitable `CACHE_BACKEND` setting and the addition of `.cache()` to the appropriate `QuerySet` chains. It also speeds up the lookup of related objects, and even that of [http://www.djangoproject.com/documentation/models/generic_relations generic relations].

= Proposed Design =

The `QuerySet` class grows two new methods to add object caching:

{{{
cache(timeout=None, prefix='qscache:', smart=False)
}}}

`timeout` (in seconds) defaults to the timeout specified in `CACHE_BACKEND`. `prefix` is prepended to cache keys in addition to `CACHE_MIDDLEWARE_KEY_PREFIX`.

Cache keys are calculated from the content-type id and the instance id. This accommodates generic relations, which identify their target by exactly these two values.

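Here is a minimal sketch of the key scheme. It assumes keys are laid out as `<middleware prefix><prefix><content-type id>:<instance id>`. Both that layout and the helper name `make_cache_key` are hypothetical. Because the key depends only on the `(content type, id)` pair, a plain foreign key and a generic foreign key that point at the same object resolve to the same cache entry.

{{{#!python
def make_cache_key(content_type_id, pk, prefix="qscache:", middleware_prefix=""):
    """Build the cache key for one model instance.

    Hypothetical helper; the key layout is an assumption, not Django API.
    middleware_prefix stands in for CACHE_MIDDLEWARE_KEY_PREFIX and
    prefix is the per-call prefix passed to cache().
    """
    return f"{middleware_prefix}{prefix}{content_type_id}:{pk}"


# The same (content type, pk) pair always yields the same key.
key = make_cache_key(12, 345, middleware_prefix="mysite:")
print(key)  # mysite:qscache:12:345
}}}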
Internally, `QuerySet` grows some new attributes that affect how SQL is generated. When in effect, they cause the query to retrieve only the primary keys of the selected objects. The objects themselves are then looked up in the cache. `in_bulk()` uses the cache directly, although cache misses still require database hits, as usual. Methods such as `delete()` and `count()` are largely unaffected by `cache()`. Methods such as `distinct()` are more difficult and will require some design decisions. `extra(select=...)` is also a possibly unsolvable case, because the extra columns belong to the query rather than to the cached objects.

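The two-phase lookup can be sketched without Django. A plain dict stands in for the cache backend, and a callable stands in for the database. The names `cached_lookup` and `fetch_rows` are hypothetical. In Django, the primary keys would come from the pk-only `SELECT`, and `fetch_rows` would be a single `in_bulk()`-style `WHERE pk IN (...)` query covering only the misses.

{{{#!python
def cached_lookup(pks, cache, fetch_rows, content_type_id, prefix="qscache:"):
    """Return objects for pks in query order, hitting the database only for misses.

    Hypothetical helper. pks: primary keys from the pk-only query, in order.
    cache: dict-like stand-in for the cache backend.
    fetch_rows: callable(list_of_pks) -> {pk: obj}, a single bulk query.
    """
    keys = {pk: f"{prefix}{content_type_id}:{pk}" for pk in pks}
    found = {pk: cache[key] for pk, key in keys.items() if key in cache}
    missing = [pk for pk in keys if pk not in found]  # de-duplicated, in order
    if missing:
        fetched = fetch_rows(missing)  # one query for every cache miss
        for pk, obj in fetched.items():
            cache[keys[pk]] = obj  # populate the cache for the next request
        found.update(fetched)
    # Keep the pk-only query's ordering; skip rows deleted between the queries.
    return [found[pk] for pk in pks if pk in found]


# Usage with a fake database that records every query it receives.
queries = []

def fake_db(pks):
    queries.append(list(pks))
    return {pk: {"id": pk, "title": f"Entry {pk}"} for pk in pks}

cache = {"qscache:7:2": {"id": 2, "title": "Entry 2"}}
objs = cached_lookup([1, 2, 3], cache, fake_db, content_type_id=7)
print([o["id"] for o in objs])  # [1, 2, 3]
print(queries)                  # [[1, 3]]: only the misses reached the database
}}}

On a second call, all three objects are already cached, so no query is issued at all.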
If `values()` has been used in the query, `cache()` takes precedence and builds the value dictionaries from the cache. If a list of fields is passed to `values()`, `cache()` still performs the equivalent of a `SELECT *`, so that whole objects are cached. Perhaps another option could allow retrieving only the specified fields. However, caching such partial objects would break any regular cached lookup of those objects.

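Building `values()` output from the cache might look like the sketch below. It assumes cached objects expose their field values as a mapping; here they simply are dicts. Because whole objects are cached, a restricted field list is applied after retrieval rather than in SQL. `values_from_cache` is a hypothetical helper.

{{{#!python
def values_from_cache(objects, fields=None):
    """Emulate QuerySet.values(*fields) over fully cached objects.

    Hypothetical helper. objects: cached instances, represented as dicts
    of field name -> value. fields: optional field names; None means all.
    """
    if not fields:
        return [dict(obj) for obj in objects]
    return [{name: obj[name] for name in fields} for obj in objects]


rows = values_from_cache(
    [{"id": 1, "title": "a", "body": "x"}, {"id": 2, "title": "b", "body": "y"}],
    fields=["id", "title"],
)
print(rows)  # [{'id': 1, 'title': 'a'}, {'id': 2, 'title': 'b'}]
}}}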
`select_related()` is supported by the caching mechanism. The appropriate joins are still performed by the database. If joins were instead calculated from the foreign key values of cached objects, cache misses could be very costly.

{{{
cache_generic(field, timeout=None, prefix='qscache:', smart=False)
}}}

`field` is the name of the generic foreign key field.

Without database-specific trickery, it is non-trivial to perform SQL JOINs across generic relations. Currently, a database query is required for each generic foreign key relationship. The cache framework cannot reduce the initial number of database hits, but it greatly alleviates load when lists of generic objects are required. This method still loads generic foreign keys lazily, but more quickly, and it also reuses objects cached with `cache()`.

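One way `cache_generic()` could keep generic foreign keys lazy while still consulting the cache is a small descriptor. On first access, it looks up the `(content_type_id, object_id)` key that `cache()` would have written, and it queries the database only on a miss. The descriptor `CachedGenericFK`, the `load_object` callable and the attribute names are assumptions for illustration. They are not Django's `GenericForeignKey` API.

{{{#!python
class CachedGenericFK:
    """Hypothetical lazy generic foreign key that checks the cache first."""

    def __init__(self, ct_field, id_field, cache, load_object, prefix="qscache:"):
        self.ct_field = ct_field        # attribute holding the content-type id
        self.id_field = id_field        # attribute holding the related object id
        self.cache = cache              # dict-like stand-in for the cache backend
        self.load_object = load_object  # callable(ct_id, obj_id) -> obj, one query
        self.prefix = prefix

    def __set_name__(self, owner, name):
        self.attr = "_cached_" + name   # per-instance memo of the resolved object

    def __get__(self, instance, owner):
        if instance is None:
            return self
        if not hasattr(instance, self.attr):  # resolve lazily, on first access
            ct_id = getattr(instance, self.ct_field)
            obj_id = getattr(instance, self.id_field)
            key = f"{self.prefix}{ct_id}:{obj_id}"
            obj = self.cache.get(key)
            if obj is None:
                obj = self.load_object(ct_id, obj_id)  # cache miss: hit the database
                self.cache[key] = obj
            setattr(instance, self.attr, obj)
        return getattr(instance, self.attr)


cache = {"qscache:3:10": "cached photo 10"}
db_hits = []

def load_object(ct_id, obj_id):
    db_hits.append((ct_id, obj_id))
    return f"db object {ct_id}/{obj_id}"

class Comment:
    content_object = CachedGenericFK("content_type_id", "object_id", cache, load_object)

    def __init__(self, content_type_id, object_id):
        self.content_type_id = content_type_id
        self.object_id = object_id

comments = [Comment(3, 10), Comment(4, 20)]
print([c.content_object for c in comments])  # ['cached photo 10', 'db object 4/20']
print(db_hits)                               # [(4, 20)]
}}}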
To achieve as much transparency as possible, the `QuerySet` methods quietly establish `post_save` and `post_delete` signal listeners the first time a model is cached. Handling deletion is trivial: the cached key is removed. On object creation or modification, the preferred behaviour is to create or update the cached key, rather than simply deleting it and letting the cache regenerate it. The object is most likely to be viewed immediately afterwards, and caching it at `post_save` is cheap. However, specific cases may not be as accommodating. This is likely to be debated, or it may need a global setting.

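The listener registration can be sketched without Django by modelling each signal as a per-model list of callbacks. `register_model`, `send`, `post_save_handlers` and `post_delete_handlers` are hypothetical stand-ins. In Django, the two callbacks would be connected to the `post_save` and `post_delete` signals with the model class as sender. The handlers follow the preferred policy above: write the fresh object on save, and delete the key on delete.

{{{#!python
post_save_handlers = {}    # model class -> callbacks (stand-in for post_save)
post_delete_handlers = {}  # model class -> callbacks (stand-in for post_delete)
_registered = set()        # models whose listeners are already connected


def cache_key(obj, prefix="qscache:"):
    return f"{prefix}{obj.content_type_id}:{obj.pk}"


def register_model(model, cache):
    """Connect cache-maintenance listeners the first time a model is cached."""
    if model in _registered:
        return
    _registered.add(model)

    def on_save(instance):
        # Preferred policy: store the fresh object instead of only invalidating,
        # because it is likely to be read again right away.
        cache[cache_key(instance)] = instance

    def on_delete(instance):
        cache.pop(cache_key(instance), None)

    post_save_handlers.setdefault(model, []).append(on_save)
    post_delete_handlers.setdefault(model, []).append(on_delete)


def send(handlers, instance):
    """Emulate dispatching a signal for the instance's class."""
    for handler in handlers.get(type(instance), []):
        handler(instance)


class Entry:
    content_type_id = 7  # hypothetical content-type id for this model

    def __init__(self, pk, title):
        self.pk, self.title = pk, title


cache = {}
register_model(Entry, cache)
register_model(Entry, cache)  # no-op: listeners are connected only once
entry = Entry(1, "hello")
send(post_save_handlers, entry)
print("qscache:7:1" in cache)  # True
send(post_delete_handlers, entry)
print("qscache:7:1" in cache)  # False
}}}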
To reduce the number of cache misses, additional "smart" logic can be added. For example, the first time a model is registered with the cache signal listener, its instances can be assumed to be uncached. In this case, rather than fetching only primary keys, the objects are retrieved as normal (and cached). By storing the expiration time, the same shortcut can also be taken whenever the cached objects have likely timed out. All "smart" functionality is enabled using the `smart` keyword argument.

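The "smart" decision could reduce to a single check. This sketch assumes the cache layer records, per model, when it last bulk-cached that model's objects and with what timeout. If the model has never been cached, or that timeout has elapsed, most objects are probably absent. In that case a normal full query, which then populates the cache, is cheaper than a pk-only query followed by mostly misses. `SmartPolicy` and its methods are hypothetical.

{{{#!python
import time


class SmartPolicy:
    """Hypothetical helper: choose between a pk-only query and a full query."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.expires_at = {}  # model label -> time its cached objects likely expire

    def record_cached(self, model, timeout):
        """Call after a full query has cached the model's objects."""
        self.expires_at[model] = self.clock() + timeout

    def fetch_full_rows(self, model):
        """True if the cache is probably cold for this model."""
        expiry = self.expires_at.get(model)
        return expiry is None or self.clock() >= expiry


# Usage with a fake clock so the example is deterministic.
now = [1000.0]
policy = SmartPolicy(clock=lambda: now[0])
print(policy.fetch_full_rows("blog.entry"))  # True: never cached
policy.record_cached("blog.entry", timeout=300)
print(policy.fetch_full_rows("blog.entry"))  # False: pk-only query, then cache
now[0] += 301
print(policy.fetch_full_rows("blog.entry"))  # True: entries have likely expired
}}}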
= Implementation Notes =

 * All caching code lives in a contrib app at first. A custom `QuerySet` class derives from the official class, overriding methods where appropriate. For testing, a `Manager` class with an overridden `get_query_set()` returns the custom `QuerySet`. Additional middleware, etc. is located in the same folder. Near or upon completion, the new code can be merged into trunk as part of Django proper. Hopefully the code will not be too invasive, but quite a few `QuerySet` methods will have to be hijacked.
 * If the transaction middleware is enabled, the cache should only be updated when the transaction succeeds. This is simple to implement, but it will couple the transaction middleware to the cache if not designed properly. An additional middleware class can be created to handle this case. However, it would have to be placed immediately after `TransactionMiddleware` in `settings.py`, and it might be confused with the existing `CacheMiddleware`.

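Deferring cache writes until commit might be sketched as a buffer. While a transaction is open, the cache signal handlers write into the buffer. The middleware flushes it after a successful commit and discards it on rollback. `DeferredCacheWriter` and its method names are hypothetical. How the middleware learns about commit and rollback is left open, as it is in the note above.

{{{#!python
class DeferredCacheWriter:
    """Hypothetical helper: buffer cache sets/deletes until the transaction commits."""

    def __init__(self, cache):
        self.cache = cache
        self.pending = []  # ordered ("set", key, value) / ("delete", key, None) ops
        self.in_transaction = False

    def begin(self):
        self.in_transaction = True
        self.pending.clear()

    def set(self, key, value):
        if self.in_transaction:
            self.pending.append(("set", key, value))
        else:
            self.cache[key] = value

    def delete(self, key):
        if self.in_transaction:
            self.pending.append(("delete", key, None))
        else:
            self.cache.pop(key, None)

    def commit(self):
        # Replay in order, so a later delete wins over an earlier set.
        for op, key, value in self.pending:
            if op == "set":
                self.cache[key] = value
            else:
                self.cache.pop(key, None)
        self.pending.clear()
        self.in_transaction = False

    def rollback(self):
        # The database changes were undone, so the cache must never see them.
        self.pending.clear()
        self.in_transaction = False


cache = {}
writer = DeferredCacheWriter(cache)
writer.begin()
writer.set("qscache:7:1", "entry 1")
print(cache)  # {}: nothing visible before commit
writer.commit()
print(cache)  # {'qscache:7:1': 'entry 1'}
writer.begin()
writer.delete("qscache:7:1")
writer.rollback()
print(cache)  # {'qscache:7:1': 'entry 1'}: rollback left the cache untouched
}}}

Keeping the buffer separate from `TransactionMiddleware` itself is one way to limit the coupling that the note above warns about.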
= Timeline =

== First Month ==

 * Write preliminary tests. Write an initial implementation of `cache()` for single objects. Support almost all typical `QuerySet` methods.
 * Devise a generic idiom for testing cache-related code. Work on aggregates. Implement the `select_related()`, `values()` and `in_bulk()` cases, and the `cache_generic()` method.

== Second Month ==

 * Work on signal dispatching and cache coherency. Write more tests and preliminary documentation.
 * Write the "smart" cache logic. Explore other possible optimizations.
 * Add transaction support. A design decision is needed about the extra middleware.
 * Implement extra features if possible (`distinct()`, `extra(select=...)`, ...).

== Last Month ==

 * Write up documentation, extensive tests, and example code. Possibly move the code from contrib into the main cache module.
 * Refactor, especially if the new `QuerySet` has been released. Continue merging with changes to trunk and testing.
 * Allow for wiggle room, `QuerySet` refactoring work, cleanup, etc.