Code

Opened 3 years ago

Last modified 13 months ago

#15855 new Bug

cache_page decorator bypasses any Vary headers set in middleware

Reported by: carljm Owned by: nobody
Component: Core (Cache system) Version:
Severity: Normal Keywords:
Cc: django@…, inactivist@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

A number of common response middlewares in Django (gzip, sessions, locale, csrf) add to the Vary header on responses, which is checked both by Django's caching system and upstream HTTP caches to determine what requests can safely be served that cached response. Getting the Vary header correct can be quite important, as failure to include it can mean upstream caches show session-private content to users who should not see it.

Since view decorators run on the outgoing response first, before response middleware, the cache_page decorator caches the response before any of the mentioned response middlewares have a chance to add their Vary headers. This means two things: 1) the cache key used won't include the headers the response ought to vary on, and Django may later serve that response to users who really shouldn't get it, and 2) when that cached response is later served to a user, it still won't include the Vary header that it should have, and thus may also be cached wrongly by an upstream HTTP cache.

I can't see a reasonable way of fixing this that maintains the current semantics of the cache_page decorator. The only option I've come up with is to require all users of full-response caching to always include the UpdateCacheMiddleware in their MIDDLEWARE_CLASSES (in its recommended spot at the top of the list, thus last in response processing), and only do the actual caching there. We'd then have to provide some way to say "don't cache pages unless they are marked" (a setting? ugh), and then the cache_page decorator would just mark the response for later caching (which would override the global don't-cache flag).

Attachments (1)

15855_CSRF_per_view_caching_docs.diff (1.8 KB) - added by idangazit 3 years ago.

Download all attachments as: .zip

Change History (18)

comment:1 Changed 3 years ago by carljm

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Design decision needed

Changed 3 years ago by idangazit

comment:2 follow-up: Changed 3 years ago by idangazit

  • Easy pickings unset
  • Has patch set

I just got bitten by this; the docs about mixing CSRF and per-view caching are misleading. The vary_on_cookie decorator doesn't actually solve the fact that te CSRF token which is placed in the template of a cached view.

Having sat with Russ, Alex, and Jannis on the matter, the conclusion is that the docs are simply wrong, and bear updating. To that end, see attached docs patch.

comment:3 in reply to: ↑ 2 Changed 3 years ago by carljm

  • UI/UX unset

Replying to idangazit:

I just got bitten by this; the docs about mixing CSRF and per-view caching are misleading. The vary_on_cookie decorator doesn't actually solve the fact that te CSRF token which is placed in the template of a cached view.

Having sat with Russ, Alex, and Jannis on the matter, the conclusion is that the docs are simply wrong, and bear updating. To that end, see attached docs patch.

I'm not clear on the issue here - if vary_on_cookie doesn't solve the problem, then how is it solved by using the cache middleware rather than per-page caching? What, exactly, is the issue?

In any case, if vary_on_cookie doesn't solve this problem, then I think this problem deserves its own separate ticket and doesn't belong in this ticket, which is specifically about Vary header issues with the cache_page decorator. Even if the attached doc patch is correct, which is not yet clear to me, applying it would not be a fix for this ticket.

comment:4 Changed 3 years ago by russellm

There's two problems here:

  1. The subject of this ticket -- that cache_page has problems because it caches content too early in the process. This is the root problem.
  2. The fact that the docs currently suggest that @vary_on_header and @cache_page can be used on CSRF pages.

The second problem will be resolved when/if we fix the first problem, but in the interim, the existence of the suggestion is pretty misleading. Idan's patch looks good for that purpose.

comment:5 follow-up: Changed 3 years ago by lukeplant

It still hasn't been explained why @vary_on_cookie and @cache_page don't work with CSRF pages. Idan had a sentence that looked like it was about to explain it and then stopped. I'm guessing it is do with the cookie being set by the middleware after the page has been cached. Would the documentation be fixed by adding @csrf_protect into the stack of decorators?

comment:6 in reply to: ↑ 5 Changed 3 years ago by carljm

Replying to lukeplant:

It still hasn't been explained why @vary_on_cookie and @cache_page don't work with CSRF pages. Idan had a sentence that looked like it was about to explain it and then stopped. I'm guessing it is do with the cookie being set by the middleware after the page has been cached. Would the documentation be fixed by adding @csrf_protect into the stack of decorators?

Which is why a new ticket should be created, so the actual problem can be clearly identified and dealt with appropriately, without muddying the waters of this ticket. As far as this ticket goes, the current workaround documented in the CSRF docs is correct, and clearly so: vary_on_cookie does in fact add the Vary: Cookie header, which unequivocally fixes the problem of the response not having the Vary: Cookie header. And that's the only problem this ticket is concerned about.

If there is a different problem with the current CSRF docs that needs fixing, then it is a CSRF-specific problem unrelated to this ticket (even if it seems superficially related because it involves the cache_page header).

comment:7 Changed 3 years ago by russellm

The reason why @vary_on_cookie and @cache_page doesn't work with CSRF pages is exactly what this ticket describes. The new-style CSRF page puts a CSRF token in the HTML. The page needs to be marked @vary_on_cookie because the CSRF token, and thus the cached page content, varies on a per-user basis. However, the @cache_page decorator caches the page before the CSRF middleware has a chance to set the CSRF cookie. As a result, @vary_on_cookie doesn't actually cause anything to vary -- there isn't a cookie to vary on at the time the @cache_page decorator determines the content to be cached.

This wasn't an issue with the old-style CSRF code because the page content was modified by the CSRF middleware. The current docs were accurate for the old CSRF approach, but not for the new, explicitly inserted {% csrf_token %}.

comment:8 Changed 3 years ago by lukeplant

Thanks Russell for the clarification. I'm pretty sure it can be fixed by use of @csrf_protect:

@cache_page(60 * 15) 
@vary_on_cookie
@csrf_protect
def my_view(request): 
    pass

But however the docs are fixed on this issue, it doesn't technically address this bug, just one symptom of it.

comment:9 Changed 3 years ago by carljm

Verified that Luke's fix works on both 1.3.X and trunk. In fact, vary_on_cookie isn't even needed, since csrf_protect patches the vary header -- the docs can just replace vary_on_cookie with csrf_protect.

Went a little ways down the path of an automated test to verify that this works, but since we have tests for all the component pieces already, the test ends up just being a verification that Python does indeed apply decorators in the expected order. Which is kind of a stupid thing to test.

Applying the doc fix to 1.3.X and trunk.

comment:10 Changed 3 years ago by carljm

In [16361]:

Refs #15855 -- Recommended the csrf_protect decorator rather than vary_on_cookie as workaround for cache_page caching the response before it gets to middleware.

comment:11 Changed 3 years ago by carljm

In [16362]:

[1.3.X] Refs #15855 -- Recommended the csrf_protect decorator rather than vary_on_cookie as workaround for cache_page caching the response before it gets to middleware.

Backport of r16361 from trunk.

comment:12 Changed 3 years ago by carljm

  • Has patch unset

comment:13 Changed 2 years ago by jedie

  • Cc django@… added

comment:14 Changed 23 months ago by inactivist

  • Cc inactivist@… added

comment:15 Changed 13 months ago by carljm

Accepting this, since we're trying to eliminate the DDN state. There are still significant design decisions needed in terms of how this should be fixed, but I don't think there's any question it should be fixed (or else the cache_page decorator should be removed). There's no scenario where you want full-response caching that omits key Vary headers.

comment:16 Changed 13 months ago by carljm

  • Triage Stage changed from Design decision needed to Accepted

comment:17 Changed 13 months ago by carljm

Just ran into Pyramid's response callbacks concept, and I think we could consider introducing something like that as a fix for this bug. It would be a new method on HttpResponse (in Pyramid it's on the request, but that's only because you often don't have access to the actual response yet; in Django you generally do) to register a callback to be executed right before the response is sent out (i.e. after all middleware). So the cache_page decorator would just register a response callback to do the actual caching, thus the response that would be cached would be the final one, after it's been processed by all middleware. There may be other uses for this feature, but initially it could just be a private API until those uses become clearer.

Presuming cache_page continued to be generated via decorator_from_middleware from the cache middlewares, that would imply UpdateCacheMiddleware would also register a response callback rather than doing the caching immediately. This would be a change in behavior for anyone who does not have UpdateCacheMiddleware first in their middlewares (as recommended in the docs).

Actually, this would also obviate the need for the UpdateCacheMiddleware/FetchFromCacheMiddleware split; we could just go back to recommending the use of a single unified CacheMiddleware placed last in MIDDLEWARE_CLASSES, and the actual cached response would still always be the final about-to-be-sent-out response.

Thoughts?

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as new
The owner will be changed from nobody to anonymous. Next status will be 'assigned'
as The resolution will be set. Next status will be 'closed'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.