Opened 14 years ago

Closed 13 years ago

Last modified 13 years ago

#13283 closed (fixed)

CACHE_MIDDLEWARE_ANONYMOUS_ONLY kills anonymous caching efficiency

Reported by: Carl Meyer Owned by: nobody
Component: Core (Cache system) Version: dev
Severity: Keywords: session accessed vary cookie
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

The Django documention recommends CACHE_MIDDLEWARE_ANONYMOUS_ONLY as "a simple and effective way of disabling caching for any user-specific pages (include Django's admin interface)." While this is strictly true, it glosses over a significant problem with using CACHE_MIDDLEWARE_ANONYMOUS_ONLY: it causes FetchFromCacheMiddleware to check request.user, which accesses the session, which causes SessionMiddleware to set Vary: Cookie, which means that anonymous pages are cached with a session-specific key, which pretty much destroys the efficiency of caching those pages. (Bugs causing similar behavior have been fixed previously: #3586 and #6552).

Attachments (2)

13283_r12936.diff (3.6 KB ) - added by Carl Meyer 14 years ago.
fix via patching request.session.accessed
13283_github_pull_4.diff (9.1 KB ) - added by Niran Babalola 13 years ago.
Fix by caching all non-user-variable requests

Download all attachments as: .zip

Change History (26)

comment:1 by Carl Meyer, 14 years ago

Obviously, it's not possible to check whether the user is logged-in without accessing the session. However, since FetchFromCacheMiddleware knows that its access of request.user doesn't actually change the response content, it could reset request.session.accessed to False if it was previously False. This approach seems to work well in my tests; I'll prepare a patch with it.

by Carl Meyer, 14 years ago

Attachment: 13283_r12936.diff added

fix via patching request.session.accessed

comment:2 by Carl Meyer, 14 years ago

Patch attached with test, fixes the problem, full test suite passes.

This patch slightly changes the semantics of session.accessed; if somebody had their own middleware code checking session.accessed for some entirely different reason, the cache middleware's access would not show up. OTOH, session.accessed is not documented API and I can't think of a real use case this would break. Most of the use cases for checking it, this version provides more useful information (you probably want to know if it was accessed for some page-specific purpose, not that its accessed every request by the cache middleware). Django only checks it in SessionMiddleware for purposes of setting Vary: Cookie.

comment:3 by Carl Meyer, 14 years ago

Has patch: set

comment:4 by Gabriel Hurley, 14 years ago

Keywords: session accessed vary cookie added

This is a similar type of report to #13217 dealing with the session.accessed/vary: cookie problem. There's no easy solution to the overarching issues, but it looks like this may be an area that needs design review at some point in the future.

comment:5 by Carl Meyer, 14 years ago

This is fundamentally different from #13217.

In that case, the result of reading the session actually does change the response content, because a language preference might be stored in the session. Therefore Django's behavior (adding Vary: Cookie) is correct; doing anything else would break caching.

In this case, the result of reading the session is only used to determine whether the response should be cached or not; it does not change the response itself in any way. If the determination is that the result should be cached, Django should not add Vary: Cookie in this case.

comment:6 by Gabriel Hurley, 14 years ago

Didn't mean to imply the issues were the same. Only noting that there have been other reports of the session.accessed mechanism having been at issue lately, which often means there may be a deeper issue lurking. Your report makes sense to me, but session-middleware interactions aren't my area of expertise.

comment:7 by Russell Keith-Magee, 14 years ago

Patch needs improvement: set
Triage Stage: UnreviewedAccepted

I'll accept the use case, but I'm not convinced this is necessarily the solution.

  • If we're going to go down this path, I'd rather do it through an API (e.g., a 'peek' call that doesn't affect the accessed flag) rather than messing with caching internals outside the cache layer.
  • I'd rather not go down this path at all. Adding an API to work around the 'accessed' flag strikes me as as very bad idea - no matter how well we document such an API, it *will* get abused.

However, I don't have any brilliant alternatives. My only thought is to see if we can change the way we determine if a user is Anonymous, thereby avoiding the session hit, or performing a different operation on the session so that we either don't modify the accessed flag, or pose the operation in such a way that it doesn't pose the same abuse risk.

comment:8 by Russell Keith-Magee, 14 years ago

#13285 describes a closely related presentation with populate_xheaders.

comment:9 by Russell Keith-Magee, 14 years ago

Scratch that - #13285 is a different problem.

comment:10 by Carl Meyer, 14 years ago

We talked briefly on IRC, I'll record my thoughts here for posterity.

A) There's no way to determine anything about the user without looking in the session, as contrib.auth is built atop contrib.session. We need to check the session for CACHE_MIDDLEWARE_ANONYMOUS_ONLY to work at all.

B) In pretty much all cases, checking the user/session should result in Vary: Cookie, because why are you checking it if not to modify the response accordingly in some way? (The only other thing you might do is mutate server-side state, but you shouldn't do that on a GET anyway, and POSTs are not cached regardless.) So documenting an API for "check the session but don't tell anyone I did" is almost certainly wrong.

So if we accept that in this one, strange, edge case, we need to know if a user is logged in, but not to modify the response, only to know whether to cache it, it seems like the choices are either to poke directly at request.session.accessed, as my patch does, or provide some other internal, undocumented API for, essentially, poking at request.session.accessed. The latter would be a more invasive change (especially since the route by which the session gets accessed here is quite indirect, via the lazy request.user), and I'm not sure I see the advantage. But I'm open to working on such a patch if it would be preferred.

(The clean option here would be to kill the feature: the core problem is that CACHE_MIDDLEWARE_ANONYMOUS_ONLY is an odd mongrel sitting midway between contrib.sessions/contrib.auth and the caching framework. IMO the ugliness of this patch is just a reflection of that ugliness).

comment:11 by Silvio J. Gutierrez, 14 years ago

As far as I can tell (not scientifically tested), if you place django.middleware.cache.UpdateCacheMiddleware *under* the session middleware (doing what the documentation tells you *not* to do), then the cache middleware will behave correctly in the basic form. That is, anonymous users see a cached page, and logged in users don't. If you turn off CACHE_MIDDLEWARE_ANONYMOUS_ONLY, then both types of users see a cached page.

However, this solution probably breaks some more refined controls using Vary. As I don't use these though, this is a good temporary solution till the problem is fixed.

comment:12 by Niran Babalola, 13 years ago

I've written a patch that only avoids caching the response when it has the "Vary: Cookie" header already set.

Here's the pull request with more explanation: https://github.com/django/django/pull/4

in reply to:  12 comment:13 by Carl Meyer, 13 years ago

Replying to natrius:

I've written a patch that only avoids caching the response when it has the "Vary: Cookie" header already set.

Very cool - that's an ingenious solution, taking advantage of the fact that for any authenticated request where request.user was accessed, the Vary: Cookie header will already have been set; so if it hasn't been set, we can go ahead and cache the response. (It's possible for a request from an authenticated user to be cached under this patch, but only if request.user was never accessed; which means for caching purposes the request may as well have been unauthenticated; the response is not user-variable.)

I'd like to see a test case added covering the scenario in question: demonstrating that the cache middleware no longer adds Vary: Cookie to an unauthenticated request with CACHE_MIDDLEWARE_ANONYMOUS_ONLY set to True.

comment:14 by Niran Babalola, 13 years ago

Carl, I added a modified version of your test from the patch, which caught an issue I hadn't seen. It's in the pull request.

comment:15 by Chris Tandiono, 13 years ago

I have to say that, while changing the behavior of CACHE_MIDDLEWARE_ANONYMOUS_ONLY might seem like the route to take (and I'd certainly like it that way), I'm not sure it is: SessionMiddleware uses a different session per anonymous user because there are some sites that might actually require different sessions per anonymous user. Instead, I think there should be a new setting that causes SessionMiddleware to consider all anonymous users to be part of the same session, or not create sessions for anonymous users. Something like "if ANONYMOUS_SHARE_SESSION, don't set Vary: Cookie".

CACHE_MIDDLEWARE_ANONYMOUS_ONLY doesn't kill anonymous caching efficiency; it's that we're trying to use it in a way it's not necessarily meant to be used...

comment:16 by Chris Tandiono, 13 years ago

Er, make that "if ANONYMOUS_SHARE_SESSION and request.user.is_anonymouse(), don't set Vary: Cookie".

in reply to:  15 comment:17 by Carl Meyer, 13 years ago

Replying to christandiono:

I have to say that, while changing the behavior of CACHE_MIDDLEWARE_ANONYMOUS_ONLY might seem like the route to take (and I'd certainly like it that way), I'm not sure it is: SessionMiddleware uses a different session per anonymous user because there are some sites that might actually require different sessions per anonymous user.

Sure. If the session data is actually accessed during a request, then Vary: Cookie should be set on that response, and the cache middleware should respect it. Nothing here is questioning that, or the usefulness of anonymous sessions in general.

But when nothing in a given request cycle has ever accessed the session, that particular response is not session-dependent and should be cached once for all users; for CACHE_MIDDLEWARE_ANONYMOUS_ONLY _itself_ to trigger the addition of Vary: Cookie is pointless and self-defeating.

The thing you're missing, I think, is that the addition of Vary: Cookie should not be dependent on whether a session exists that is specific to a given browser session. If you have the session middleware enabled, that will always be true, and should be so. Regardless, a particular request that does not access the session data should not set Vary: Cookie.

Sharing anonymous sessions, as you propose, seems to me to make anonymous sessions useless. Avoiding the creation of a session until login time might be interesting on its own merits as a feature request, but is a totally separate issue from this bug.

comment:18 by Niran Babalola, 13 years ago

This seems like a problem that should be ameliorated somehow for Django 1.3. The setting seems to have been added before CacheMiddleware respected Vary headers in order to prevent user-specific pages from being cached and shown to other users. That is no longer possible, so the setting should be removed. At the minimum, the setting should be marked as deprecated in the documentation instead of touted as a useful feature so people don't keep using a broken, unnecessary setting. If we want users to be able to save memory by not caching user-specific pages, I have a patch to do that with a separate setting, but CACHE_MIDDLEWARE_ANONYMOUS_ONLY must die soon. My patch provides a reasonable transition behavior for existing users of the setting before it goes away completely.

by Niran Babalola, 13 years ago

Attachment: 13283_github_pull_4.diff added

Fix by caching all non-user-variable requests

in reply to:  18 comment:19 by Carl Meyer, 13 years ago

Replying to natrius:

This seems like a problem that should be ameliorated somehow for Django 1.3.

I'm planning to get back to your pull request and commit some form of it before Django 1.3 -- just haven't gotten there yet.

If we want users to be able to save memory by not caching user-specific pages, I have a patch to do that with a separate setting, but CACHE_MIDDLEWARE_ANONYMOUS_ONLY must die soon. My patch provides a reasonable transition behavior for existing users of the setting before it goes away completely.

What do you see as inadequate about your pull request mentioned above, as a way to preserve CACHE_MIDDLEWARE_ANONYMOUS_ONLY? Why do you think it needs to go away completely, and/or be replaced by another setting?

I understand that simply caching all requests is often fine, since the caching will vary by cookie anyway and won't result in the wrong content being shown to a user. But in some cases it's still useful to be able to easily turn off caching for all user-variable pages, since for some sites they may be the pages with the most dynamic content (and caching with Vary: Cookie eats up cache memory a lot faster).

comment:20 by Niran Babalola, 13 years ago

My patch still caches responses that vary by cookie for anonymous users, which is probably undesirable for users who would want to set this in the first place. A setting that skips any response that varies by cookie is a cleaner and more effective way to do it. (My local patch makes altering the caching conditions in UpdateCacheMiddleware subclasses much easier and provides a SkipVaryCookieUpdateCacheMiddleware instead of a setting, which is a more extensible solution.)

Having CACHE_MIDDLEWARE_ANONYMOUS_ONLY at all will give users an incorrect mental model of how caching in general works. Anonymity is irrelevant. Vary headers are the right place for users to be looking, so an appropriately named option that performs the correct behavior will have positive educational side-effects in addition to the small performance benefits.

in reply to:  20 comment:21 by Carl Meyer, 13 years ago

Replying to natrius:

Having CACHE_MIDDLEWARE_ANONYMOUS_ONLY at all will give users an incorrect mental model of how caching in general works. Anonymity is irrelevant. Vary headers are the right place for users to be looking, so an appropriately named option that performs the correct behavior will have positive educational side-effects in addition to the small performance benefits.

I agree; the question is just how much deprecation churn that benefit is worth. I may start a django-developers thread to discuss this at some point, or if you want to it'd be helpful to get other developers' thoughts.

Making UpdateCacheMiddleware more easily extensible is potentially interesting; I'm not sure I want to start proliferating subclasses of it in Django, though, so I'm still more inclined towards a setting (CACHE_MIDDLEWARE_SKIP_VARY_COOKIE?). I think it's also possible that we could just make the behavior change you're suggesting (cache non-Vary-Cookie only, regardless of anonymity), and leave the slightly-misleading setting name, with a documentation note about the name, to reduce deprecation churn.

In any case, this needs to be split into two changes at this point. Pre-1.3, I'd like to fix the most pressing issue, which is adding Vary: Cookie when it shouldn't be there. That's a bugfix. The rest of this is more in feature territory; now that we're post-beta for 1.3 I don't think we can deprecate the current setting, add a new one, or make additional behavior changes, so that's a discussion for the 1.4 timeframe.

comment:22 by Niran Babalola, 13 years ago

I agree with each of the possible solutions you've suggested as well as the timeline.

comment:23 by Carl Meyer, 13 years ago

Resolution: fixed
Status: newclosed

In [15381]:

Fixed #13283 -- Corrected CACHE_MIDDLEWARE_ANONYMOUS_ONLY's bad habit of setting Vary: Cookie on all responses and destroying cache efficiency. Thanks to natrius for the fix.

comment:24 by Carl Meyer, 13 years ago

In [15382]:

[1.2.X] Fixed #13283 -- Corrected CACHE_MIDDLEWARE_ANONYMOUS_ONLY's bad habit of setting Vary: Cookie on all responses and destroying cache efficiency. Thanks to natrius for the fix.

Backport of r15381 from trunk.

Note: See TracTickets for help on using tickets.
Back to Top