#15201 closed Cleanup/optimization (fixed)
CACHE_MIDDLEWARE_ANONYMOUS_ONLY is ugly, misleading, and unnecessary, and should be deprecated
Reported by: | Carl Meyer | Owned by: | Aymeric Augustin |
---|---|---|---|
Component: | Core (Cache system) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | raymond.penners@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
From https://github.com/django/django/pull/4:
There are three supposed benefits of using the cache middleware with CACHE_MIDDLEWARE_ANONYMOUS_ONLY set to True: 1) Don't show user-specific data to the wrong users. 2) Avoid wasting memory on caching user-specific pages. 3) Improve performance by caching most pages.
(1) is unnecessary: Django already sets Vary: Cookie headers when the session is accessed, which will prevent users from getting another user's cached pages. In fact one danger of the current documentation of CACHE_MIDDLEWARE_ANONYMOUS_ONLY is that it does not mention Vary: Cookie and may mislead users into thinking that CACHE_MIDDLEWARE_ANONYMOUS_ONLY is necessary to avoid users being shown wrong cached pages.
CACHE_MIDDLEWARE_ANONYMOUS_ONLY is only partially effective at (2), because it will still cache session-specific pages with Vary: Cookie if a user is not logged in, which can blow up caching memory just as much or more as authenticated requests, if sessions are used with anonymous users.
And on the downside, CACHE_MIDDLEWARE_ANONYMOUS_ONLY is far from simple: it requires contrib.auth to be installed before it can be used, and is tricky to implement correctly (until the fix for #13283 was committed, the incorrect implementation in place for years meant that turning CACHE_MIDDLEWARE_ANONYMOUS_ONLY on immediately destroyed caching efficiency for your entire site, including anonymous pages).
An alternative hypothetical setting, CACHE_MIDDLEWARE_SKIP_VARY_COOKIE, would give users a more accurate model of how caching actually works, would remove both the complexity of implementation and the needless dependence on contrib.auth, and would be more effective at improving caching efficiency (because it would also avoid caching session-dependent-but-anonymous requests).
I think this behavior change should be made in any case; it's a clear improvement with no downside that I can see. It would be possible to make the behavior change and leave the current (somewhat misleading) setting name, to avoid deprecation churn. I think I lean towards going ahead and changing the name of the setting too; based on the lack of outcry around #13283 I get the impression few users are using CACHE_MIDDLEWARE_ANONYMOUS_ONLY anyway, and it's a painless setting name switch.
Thanks to natrius in the above pull request and the #13283 comments for these observations.
Change History (17)
comment:1 by , 14 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 13 years ago
Severity: | → Normal |
---|---|
Type: | → Cleanup/optimization |
comment:3 by , 13 years ago
comment:4 by , 13 years ago
Easy pickings: | unset |
---|---|
UI/UX: | unset |
The use case I described above that would justify keeping CACHE_MIDDLEWARE_ANONYMOUS_ONLY is bogus - by sending out anonymous responses without Vary: Cookie you'd allow upstream caches to serve that anonymous response to anyone, including logged-in staff users. Really there is no circumstance where you can justify checking the user's logged-in status without adding Vary: Cookie to the response.
comment:5 by , 12 years ago
I hit this bug too. I only noticed because I have some randomized content and it changed on every refresh instead of staying cached for some time.
In my case, the problem wasn't reproducible outside of the browser (eg. with unit tests or with curl). The culprit was Google Analytics. It sets __utcma
, __utcmb
, __utcmc
and __utcmz
cookies. Since Django's cache engine honors Vary: Cookie
, and my templates hide some parts for anonymous users, this results in per-user cache instead of global cache. This is a variant of point 2) in the report above.
I'm not familiar with tracking technology, but I believe cookies is the most straightforward and compatible implementation. I expect this problem to occur on any website that uses Google Analytics or a similar service. In other words, CACHE_MIDDLEWARE_ANONYMOUS_ONLY
most certainly malfunctions silently for every non-trivial website, even if its behavior is checked by unit tests!
The feature we want here is "cache content for anonymous users, regardless of Vary: Cookie". Obviously, it clashes with the current APIs, because it means triggering HTTP-level caching based on application-level information. And it works only under the assumption that, when a user is anonymous, cookies aren't used for anything that affects the output (besides determining that the user is anonymous).
This assumption cannot be checked automatically. It's something the developer declares by turning CACHE_MIDDLEWARE_ANONYMOUS_ONLY
on. For instance, {% if user.is_authenticated %}Welcome, {{ user }}!{% endif %}
is enough to turn "Vary: Cookie" on, but the developer knows that it results in the same output for all anonymous users. Often this holds true for an entire site, and that's why CACHE_MIDDLEWARE_ANONYMOUS_ONLY
exists.
To sum up, the expected behavior is "ignore Vary: Cookie when caching pages for anonymous users" (both for update and fetch).
I think that's a useful behavior, more useful that "cache pages that don't have Vary: Cookie", because it's almost impossible not to have Vary: Cookie.
If it cannot be implemented satisfactorily, then we should deprecate CACHE_MIDDLEWARE_ANONYMOUS_ONLY
.
Naive idea that I haven't tested:
- if the setting is on, the user is anonymous, and the Vary header contains "Cookie"
- then remove "Cookie" from Vary, do the caching operations, and restore "Cookie" in Vary.
comment:6 by , 12 years ago
My idea isn't going to play well with CSRF. {% csrf_token %}
really depends on the cookie. And it's very easy to forget it does.
Under the Don't Give Users Guns Aimed At Feet policy, I'm retracting it.
This leaves us with Carl's idea of "cache anything that doesn't have Vary: Cookie" — but is it even useful?
1) Very few websites will match this requirement. Most websites indicate when a user is logged in. Many show a login form on every page for anonymous users, making anonymous pages depend on the cookie (because of the CSRF token).
2) Since it's purely HTTP-level, it could be done by a caching reverse proxy. For instance, Varnish seems way more capable of dealing with this than Django.
comment:8 by , 12 years ago
Cc: | added |
---|
comment:9 by , 12 years ago
Another argument for nuking CACHE_MIDDLEWARE_ANONYMOUS_ONLY
is that it creates a dependency of core on contrib (namely, django.contrib.auth). See #8713.
comment:10 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:11 by , 11 years ago
Owner: | changed from | to
---|
comment:12 by , 11 years ago
After a quick discussion with ambv at the DjangoCon sprints, I'm also leaning towards deprecating this setting and not providing a replacement.
comment:13 by , 11 years ago
Has patch: | set |
---|---|
Patch needs improvement: | set |
PR: https://github.com/django/django/pull/1114
This patch is on the right track. Two tests for CACHE_MIDDLEWARE_ANONYMOUS_ONLY trigger warnings (tests % PYTHONPATH=.. python2.6 -Wd runtests.py --settings=test_sqlite cache
). These warnings should be silenced.
comment:15 by , 11 years ago
Owner: | changed from | to
---|---|
Patch needs improvement: | unset |
I'm putting this on my review list. I may commit it only after the sprints.
comment:16 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
There is one use-case for CACHE_MIDDLEWARE_ANONYMOUS_ONLY that wouldn't be covered by a hypothetical CACHE_MIDDLEWARE_SKIP_VARY_COOKIE. That is the case where you want to check for an authenticated user (or maybe a staff user) and provide them with some extra info of some kind in the response (debugging info, maybe, or view of hidden content), and you want to never cache these authenticated responses, but you want to cache the anonymous response _without_ triggering a Vary: Cookie by your access of request.user (since Vary: Cookie with sessions dramatically reduces the effectiveness of caching the anonymous response).
This use case hasn't worked for cache-middleware users for a long time (#13283), and only accidentally worked for users of the cache_page decorator due to interaction with another bug (#15855). In 1.3, because of the fix I committed for #13283, this use case is no longer addressed at all: cache-middleware users will get the results of the anonymous requests cached, but with Vary: Cookie because the session was accessed. Cache-page users will also get the results of the anonymous requests cached, but without Vary: Cookie (due to #15855), and this cached response might then be wrongly served to an authenticated user (because the fix for #13283 required that FetchFromCacheMiddleware no longer check request.user.is_authenticated()).
In order to really cater to this use-case, presuming we fix #15855, we would have to provide some special API to "check request.user but please don't set Vary: Cookie on the response, I know what I'm doing." This API would probably need to check that CACHE_MIDDLEWARE_ANONYMOUS_ONLY is True, and refuse to operate otherwise (or else it would make it too easy to shoot yourself in the foot by sending authenticated responses without Vary: Cookie).
The use case sounds reasonable, but I'm not at all convinced that this is actually something we want in core. I think I still come down on the side of removing CACHE_MIDDLEWARE_ANONYMOUS_ONLY.