''This is the original Django GSoC proposal. There have been quite a few
revisions since, but I'm posting this first for reference.''

= Abstract =

This addition to Django's ORM adds **simple drop-in caching**, compatible with
nearly all existing `QuerySet` methods. It emphasizes performance and
compatibility, and provides configuration options with sane defaults. All that
is required for basic functionality is a suitable `CACHE_BACKEND` setting and
the addition of `.cache()` to the appropriate `QuerySet` chains. It also speeds
up the lookup of related objects, and even that of
[http://www.djangoproject.com/documentation/models/generic_relations generic relations].

= Proposed Design =

The `QuerySet` class grows two new methods to add object caching:

{{{
cache(timeout=None, prefix='qscache:', smart=False)
}}}

`timeout` defaults to the amount specified in `CACHE_BACKEND`.
`prefix` is in addition to `CACHE_MIDDLEWARE_KEY_PREFIX`.

Cache keys are calculated from the content-type id and instance id, to
accommodate generic relations.

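As a minimal sketch of that key scheme: the function name `make_cache_key`, the `site_prefix` argument (standing in for `CACHE_MIDDLEWARE_KEY_PREFIX`), and the exact key layout are assumptions for illustration, not part of the proposal.

```python
def make_cache_key(content_type_id, object_id, prefix="qscache:", site_prefix=""):
    """Build a cache key that is unique per model (content type) and instance."""
    # site_prefix stands in for CACHE_MIDDLEWARE_KEY_PREFIX (assumed layout).
    return "%s%s%s:%s" % (site_prefix, prefix, content_type_id, object_id)


# Two objects with the same primary key but different content types
# must not collide, which is what makes generic relations cacheable.
article_key = make_cache_key(7, 42)
comment_key = make_cache_key(9, 42)
```

Because the content-type id is part of the key, a generic foreign key (which stores exactly a content type and an object id) can be turned into a cache key without knowing the target model in advance.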
Internally, `QuerySet` grows some new attributes that affect how SQL is
generated. When in effect, they cause the query to retrieve only the primary
keys of the selected objects. `in_bulk()` uses the cache directly, although
cache misses will still require database hits, as usual. Methods such as
`delete()` and `count()` are largely unaffected by `cache()`, but
methods such as `distinct()` are a more difficult case and will require
some design decisions. Using `extra(select=...)` is also a possibly
unsolvable case.

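The lookup flow described above (primary keys first, then the cache, then the database for misses) might look roughly like this. It is a sketch only: a plain dictionary stands in for the cache backend, and `fetch_objects`, `load_from_db` and `make_key` are hypothetical names.

```python
def fetch_objects(pks, cache, load_from_db, make_key):
    """Return the objects for `pks` in order, reading the cache first."""
    keys = {pk: make_key(pk) for pk in pks}
    found = {pk: cache[key] for pk, key in keys.items() if key in cache}
    missing = [pk for pk in pks if pk not in found]
    if missing:
        # One database hit covers all misses; results are cached for next time.
        for pk, obj in load_from_db(missing).items():
            cache[keys[pk]] = obj
            found[pk] = obj
    return [found[pk] for pk in pks if pk in found]


# Demo with a fake table and a record of which pks reached the "database".
table = {1: "first", 2: "second", 3: "third"}
db_calls = []

def load_from_db(pks):
    db_calls.append(list(pks))
    return {pk: table[pk] for pk in pks if pk in table}

cache = {"obj:1": "first"}
first_result = fetch_objects([1, 2, 3], cache, load_from_db, lambda pk: "obj:%d" % pk)
second_result = fetch_objects([1, 2, 3], cache, load_from_db, lambda pk: "obj:%d" % pk)
```

In the demo, only primary keys 2 and 3 reach the database on the first call, and the second call is served entirely from the cache.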
If `values()` has been used in the query, `cache()` takes precedence
and creates the values dictionaries from the cache. If a list of fields is
specified in `values()`, `cache()` will still perform the equivalent of a
`SELECT *`. Perhaps another option could be added to allow retrieval
of only the specified fields, although that would break any regular cached
lookup for that object.

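To illustrate how `values()` could be served from fully cached objects, here is a small sketch. It assumes, purely for illustration, that each cached object is available as a mapping of field names to values; `values_from_cache` is a hypothetical helper.

```python
def values_from_cache(cached_rows, fields=None):
    """Build values()-style dictionaries from fully cached objects."""
    result = []
    for row in cached_rows:
        if fields is None:
            # No field list given: return every field, as a copy.
            result.append(dict(row))
        else:
            # The full object is cached; only the requested fields are exposed.
            result.append({name: row[name] for name in fields})
    return result


cached = [
    {"id": 1, "title": "Hello", "body": "long text"},
    {"id": 2, "title": "World", "body": "more text"},
]
titles = values_from_cache(cached, ["id", "title"])
```

The trade-off matches the paragraph above: the cache always holds complete objects, so narrowing the field list saves nothing on retrieval but keeps regular cached lookups valid.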
`select_related()` is supported by the caching mechanism. The appropriate
joins are still performed by the database; if joins were calculated with
cached object foreign key values, cache misses could be very costly.

{{{
cache_generic(field, timeout=None, prefix='qscache:', smart=False)
}}}

`field` is the name of the generic foreign key field.

Without database-specific trickery it is non-trivial to perform SQL JOINs
with generic relations. Currently, a database query is required for each
generic foreign key relationship. The cache framework, while unable to
reduce the initial number of database hits, greatly alleviates load when
lists of generic objects are required. Using this method still loads
generic foreign keys lazily, but more quickly, and also uses objects cached
with `cache()`.

To achieve as much transparency as possible, the `QuerySet` methods quietly
establish `post_save` and `post_delete` signal listeners the first time a
model is cached. Object deletion is trivial. On object creation or
modification, the preferred behaviour is to create or update the cached key
rather than simply deleting the key and letting the cache regenerate it;
the rationale is that the object is most likely to be viewed immediately
afterwards, and caching it at `post_save` is cheap. However, specific cases
may not be as accommodating. This is likely subject to debate or may need a
global setting.

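The listener registration described above can be sketched without Django. The `Signal` class below is a tiny stand-in for Django's signals, the cache is a plain dictionary, and `register_model`, `cache_key` and the key layout are hypothetical names chosen for the example.

```python
class Signal:
    """Tiny stand-in for a Django signal: receivers are called on send()."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        if receiver not in self.receivers:
            self.receivers.append(receiver)

    def send(self, sender, instance):
        for receiver in self.receivers:
            receiver(sender=sender, instance=instance)


post_save = Signal()
post_delete = Signal()
cache = {}
_registered = set()


def cache_key(instance):
    return "qscache:%s:%s" % (type(instance).__name__, instance.pk)


def _on_save(sender, instance):
    # Preferred behaviour: write the fresh object instead of just evicting it.
    if sender in _registered:
        cache[cache_key(instance)] = instance


def _on_delete(sender, instance):
    if sender in _registered:
        cache.pop(cache_key(instance), None)


def register_model(model):
    """Connect the listeners the first time `model` is cached."""
    if model in _registered:
        return False
    _registered.add(model)
    post_save.connect(_on_save)
    post_delete.connect(_on_delete)
    return True


class Article:
    def __init__(self, pk, title):
        self.pk = pk
        self.title = title


first_registration = register_model(Article)
article = Article(1, "Draft")
post_save.send(sender=Article, instance=article)
```

Swapping the body of `_on_save` for a plain eviction would give the alternative "delete and regenerate" behaviour that the paragraph leaves open to debate.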
To reduce the number of cache misses, additional "smart" logic can be added.
For example, the first time a model is registered to the cache signal listener,
its model instances are expected to be uncached. In this case, rather than
fetching only primary keys, the objects are retrieved as normal (and cached).
By storing the expiration time, this can also take effect whenever the
cached objects have likely timed out. All "smart" functionality is enabled
using the `smart` keyword argument.

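One possible shape for the "smart" decision is sketched below. `SmartTracker` and its methods are hypothetical, and tracking a single expiry time per model is an assumption made to keep the example short.

```python
import time


class SmartTracker:
    """Remember when each model's cached objects are expected to expire."""

    def __init__(self, timeout, clock=time.time):
        self.timeout = timeout
        self.clock = clock
        self.expires_at = {}  # model name -> expected expiry timestamp

    def should_fetch_full_objects(self, model_name):
        # Never cached, or probably timed out: skip the pk-only query.
        expiry = self.expires_at.get(model_name)
        return expiry is None or self.clock() >= expiry

    def mark_cached(self, model_name):
        self.expires_at[model_name] = self.clock() + self.timeout


# Demo with a controllable clock instead of real time.
now = [1000.0]
tracker = SmartTracker(timeout=300, clock=lambda: now[0])
before_caching = tracker.should_fetch_full_objects("Article")
tracker.mark_cached("Article")
while_fresh = tracker.should_fetch_full_objects("Article")
now[0] += 300
after_timeout = tracker.should_fetch_full_objects("Article")
```

When `should_fetch_full_objects()` returns true, the query would retrieve complete rows and cache them; otherwise it would fall back to the primary-key-only query.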
= Implementation Notes =

 * All caching code lives in a contrib app at first. A custom `QuerySet` class
   derives from the official class, overriding where appropriate. A `Manager`
   class with an overridden `get_query_set()` is used for testing, and
   additional middleware, etc. are located in the same folder. Near or upon
   completion, the new code can be merged to trunk as Django proper. Hopefully
   the code will not be too invasive, but quite a few `QuerySet` methods will
   have to be hijacked.

 * If the transaction middleware is enabled, it is desirable to have the cache
   only update when the transaction succeeds. This is simple in implementation
   but will couple the transaction middleware to the cache if not designed
   properly. An additional middleware class can be created to handle this
   case; however, it will have to stipulate placement immediately after the
   `TransactionMiddleware` in settings.py, and might be confused with the
   existing `CacheMiddleware`.

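The transaction note above amounts to buffering cache writes until commit. A minimal sketch, assuming a dictionary-like backend and a hypothetical `DeferredCache` wrapper whose `commit()` and `rollback()` would be called by whatever middleware ends up owning the transaction:

```python
class DeferredCache:
    """Buffer cache writes until the surrounding transaction commits."""

    _DELETED = object()  # marker for a pending deletion

    def __init__(self, backend):
        self.backend = backend
        self.pending = {}

    def set(self, key, value):
        self.pending[key] = value

    def delete(self, key):
        self.pending[key] = self._DELETED

    def commit(self):
        # Apply buffered writes only once the transaction has succeeded.
        for key, value in self.pending.items():
            if value is self._DELETED:
                self.backend.pop(key, None)
            else:
                self.backend[key] = value
        self.pending.clear()

    def rollback(self):
        # A failed transaction leaves the cache untouched.
        self.pending.clear()


backend = {"qscache:7:1": "old", "qscache:7:2": "stale"}
deferred = DeferredCache(backend)

deferred.set("qscache:7:1", "never stored")
deferred.rollback()
after_rollback = dict(backend)

deferred.set("qscache:7:1", "new")
deferred.delete("qscache:7:2")
deferred.commit()
```

Keeping the buffer in a separate object is one way to avoid coupling the transaction middleware to the cache: the middleware only needs to call `commit()` or `rollback()`.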
= Timeline =

== First Month ==

 * Write preliminary tests. Initial implementation of `cache()` for single
   objects. Support almost all typical `QuerySet` methods.

 * Devise a generic idiom for testing cache-related code. Work on aggregates;
   implement `select_related()`, `values()`, `in_bulk()` cases, and the
   `cache_generic()` method.

== Second Month ==

 * Work on signal dispatching, cache coherency. Write more tests and preliminary
   documentation.

 * Write "smart" cache logic. Explore other possible optimizations.

 * Add transaction support. Design decision needed about extra middleware.

 * Implement extra features if possible (`distinct()`, `extra(select=...)`, ...)

== Last Month ==

 * Write up documentation, extensive tests, and example code. Possibly move from
   contrib into the main cache module.

 * Refactor, especially if the new `QuerySet` has been released. Continue
   merging with changes to trunk and testing.

 * Allow for wiggle room, `QuerySet` refactoring work, cleanup, etc.