Opened 14 years ago

Last modified 22 months ago

#15855 new Bug

cache_page decorator bypasses any Vary headers set in middleware

Reported by: Carl Meyer Owned by:
Component: Core (Cache system) Version:
Severity: Normal Keywords:
Cc: django@…, inactivist@…, denis.cornehl@…, someuniquename@…, Fernando Gutiérrez Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

A number of common response middlewares in Django (gzip, sessions, locale, csrf) add to the Vary header on responses, which is checked both by Django's caching system and upstream HTTP caches to determine what requests can safely be served that cached response. Getting the Vary header correct can be quite important, as failure to include it can mean upstream caches show session-private content to users who should not see it.

Since view decorators run on the outgoing response first, before response middleware, the cache_page decorator caches the response before any of the mentioned response middlewares have a chance to add their Vary headers. This means two things: 1) the cache key used won't include the headers the response ought to vary on, and Django may later serve that response to users who really shouldn't get it, and 2) when that cached response is later served to a user, it still won't include the Vary header that it should have, and thus may also be cached wrongly by an upstream HTTP cache.

I can't see a reasonable way of fixing this that maintains the current semantics of the cache_page decorator. The only option I've come up with is to require all users of full-response caching to always include the UpdateCacheMiddleware in their MIDDLEWARE_CLASSES (in its recommended spot at the top of the list, thus last in response processing), and only do the actual caching there. We'd then have to provide some way to say "don't cache pages unless they are marked" (a setting? ugh), and then the cache_page decorator would just mark the response for later caching (which would override the global don't-cache flag).

Attachments (1)

15855_CSRF_per_view_caching_docs.diff (1.8 KB ) - added by Idan Gazit 14 years ago.

Download all attachments as: .zip

Change History (25)

comment:1 by Carl Meyer, 14 years ago

Triage Stage: UnreviewedDesign decision needed

by Idan Gazit, 14 years ago

comment:2 by Idan Gazit, 14 years ago

Easy pickings: unset
Has patch: set

I just got bitten by this; the docs about mixing CSRF and per-view caching are misleading. The vary_on_cookie decorator doesn't actually solve the fact that te CSRF token which is placed in the template of a cached view.

Having sat with Russ, Alex, and Jannis on the matter, the conclusion is that the docs are simply wrong, and bear updating. To that end, see attached docs patch.

in reply to:  2 comment:3 by Carl Meyer, 14 years ago

UI/UX: unset

Replying to idangazit:

I just got bitten by this; the docs about mixing CSRF and per-view caching are misleading. The vary_on_cookie decorator doesn't actually solve the fact that te CSRF token which is placed in the template of a cached view.

Having sat with Russ, Alex, and Jannis on the matter, the conclusion is that the docs are simply wrong, and bear updating. To that end, see attached docs patch.

I'm not clear on the issue here - if vary_on_cookie doesn't solve the problem, then how is it solved by using the cache middleware rather than per-page caching? What, exactly, is the issue?

In any case, if vary_on_cookie doesn't solve this problem, then I think this problem deserves its own separate ticket and doesn't belong in this ticket, which is specifically about Vary header issues with the cache_page decorator. Even if the attached doc patch is correct, which is not yet clear to me, applying it would not be a fix for this ticket.

comment:4 by Russell Keith-Magee, 14 years ago

There's two problems here:

  1. The subject of this ticket -- that cache_page has problems because it caches content too early in the process. This is the root problem.
  2. The fact that the docs currently suggest that @vary_on_header and @cache_page can be used on CSRF pages.

The second problem will be resolved when/if we fix the first problem, but in the interim, the existence of the suggestion is pretty misleading. Idan's patch looks good for that purpose.

comment:5 by Luke Plant, 14 years ago

It still hasn't been explained why @vary_on_cookie and @cache_page don't work with CSRF pages. Idan had a sentence that looked like it was about to explain it and then stopped. I'm guessing it is do with the cookie being set by the middleware after the page has been cached. Would the documentation be fixed by adding @csrf_protect into the stack of decorators?

in reply to:  5 comment:6 by Carl Meyer, 14 years ago

Replying to lukeplant:

It still hasn't been explained why @vary_on_cookie and @cache_page don't work with CSRF pages. Idan had a sentence that looked like it was about to explain it and then stopped. I'm guessing it is do with the cookie being set by the middleware after the page has been cached. Would the documentation be fixed by adding @csrf_protect into the stack of decorators?

Which is why a new ticket should be created, so the actual problem can be clearly identified and dealt with appropriately, without muddying the waters of this ticket. As far as this ticket goes, the current workaround documented in the CSRF docs is correct, and clearly so: vary_on_cookie does in fact add the Vary: Cookie header, which unequivocally fixes the problem of the response not having the Vary: Cookie header. And that's the only problem this ticket is concerned about.

If there is a different problem with the current CSRF docs that needs fixing, then it is a CSRF-specific problem unrelated to this ticket (even if it seems superficially related because it involves the cache_page header).

comment:7 by Russell Keith-Magee, 14 years ago

The reason why @vary_on_cookie and @cache_page doesn't work with CSRF pages is exactly what this ticket describes. The new-style CSRF page puts a CSRF token in the HTML. The page needs to be marked @vary_on_cookie because the CSRF token, and thus the cached page content, varies on a per-user basis. However, the @cache_page decorator caches the page before the CSRF middleware has a chance to set the CSRF cookie. As a result, @vary_on_cookie doesn't actually cause anything to vary -- there isn't a cookie to vary on at the time the @cache_page decorator determines the content to be cached.

This wasn't an issue with the old-style CSRF code because the page content was modified by the CSRF middleware. The current docs were accurate for the old CSRF approach, but not for the new, explicitly inserted {% csrf_token %}.

comment:8 by Luke Plant, 14 years ago

Thanks Russell for the clarification. I'm pretty sure it can be fixed by use of @csrf_protect:

@cache_page(60 * 15) 
@vary_on_cookie
@csrf_protect
def my_view(request): 
    pass

But however the docs are fixed on this issue, it doesn't technically address this bug, just one symptom of it.

comment:9 by Carl Meyer, 14 years ago

Verified that Luke's fix works on both 1.3.X and trunk. In fact, vary_on_cookie isn't even needed, since csrf_protect patches the vary header -- the docs can just replace vary_on_cookie with csrf_protect.

Went a little ways down the path of an automated test to verify that this works, but since we have tests for all the component pieces already, the test ends up just being a verification that Python does indeed apply decorators in the expected order. Which is kind of a stupid thing to test.

Applying the doc fix to 1.3.X and trunk.

comment:10 by Carl Meyer, 14 years ago

In [16361]:

Refs #15855 -- Recommended the csrf_protect decorator rather than vary_on_cookie as workaround for cache_page caching the response before it gets to middleware.

comment:11 by Carl Meyer, 14 years ago

In [16362]:

[1.3.X] Refs #15855 -- Recommended the csrf_protect decorator rather than vary_on_cookie as workaround for cache_page caching the response before it gets to middleware.

Backport of r16361 from trunk.

comment:12 by Carl Meyer, 14 years ago

Has patch: unset

comment:13 by jedie, 13 years ago

Cc: django@… added

comment:14 by Michael Curry, 13 years ago

Cc: inactivist@… added

comment:15 by Carl Meyer, 12 years ago

Accepting this, since we're trying to eliminate the DDN state. There are still significant design decisions needed in terms of how this should be fixed, but I don't think there's any question it should be fixed (or else the cache_page decorator should be removed). There's no scenario where you want full-response caching that omits key Vary headers.

comment:16 by Carl Meyer, 12 years ago

Triage Stage: Design decision neededAccepted

comment:17 by Carl Meyer, 12 years ago

Just ran into Pyramid's response callbacks concept, and I think we could consider introducing something like that as a fix for this bug. It would be a new method on HttpResponse (in Pyramid it's on the request, but that's only because you often don't have access to the actual response yet; in Django you generally do) to register a callback to be executed right before the response is sent out (i.e. after all middleware). So the cache_page decorator would just register a response callback to do the actual caching, thus the response that would be cached would be the final one, after it's been processed by all middleware. There may be other uses for this feature, but initially it could just be a private API until those uses become clearer.

Presuming cache_page continued to be generated via decorator_from_middleware from the cache middlewares, that would imply UpdateCacheMiddleware would also register a response callback rather than doing the caching immediately. This would be a change in behavior for anyone who does not have UpdateCacheMiddleware first in their middlewares (as recommended in the docs).

Actually, this would also obviate the need for the UpdateCacheMiddleware/FetchFromCacheMiddleware split; we could just go back to recommending the use of a single unified CacheMiddleware placed last in MIDDLEWARE_CLASSES, and the actual cached response would still always be the final about-to-be-sent-out response.

Thoughts?

comment:18 by Denis Cornehl, 10 years ago

Cc: denis.cornehl@… added

comment:19 by Rinat Khabibiev, 8 years ago

Owner: changed from nobody to Rinat Khabibiev
Status: newassigned

comment:20 by Simon Charette, 8 years ago

Has patch: set

comment:21 by Alexandre Hajjar, 8 years ago

Needs documentation: set

Reviewed PR which seems fine: Vary headers from cache_page decorator & other middlewares are taken into account by cachemiddleware, reg tests ok, new tests added.

Added comments for minor improvements in PR. Needs update in docs IMHO.

comment:22 by Evstifeev Roman, 8 years ago

Cc: someuniquename@… added

add someuniquename@… to CC list

comment:23 by Fernando Gutiérrez, 8 years ago

Cc: Fernando Gutiérrez added

comment:24 by Mariusz Felisiak, 22 months ago

Owner: Rinat Khabibiev removed
Status: assignednew
Note: See TracTickets for help on using tickets.
Back to Top