Code

Opened 6 years ago

Last modified 6 months ago

#9249 assigned New feature

Google Analytics' Cookies break CacheMiddleware when SessionMiddleware turns on Vary: Cookie

Reported by: pixelcort Owned by: ambv
Component: HTTP handling Version: 1.0
Severity: Normal Keywords: cache cookies
Cc: raymond.penners@…, django@…, harm.verhagen@…, trbs@… Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by aaugustin)

When using Google Analytics on a Django project with CacheMiddleware and SessionMiddleware turned on, the Cookies that Google Analytics apparently change on each reload, invalidating the Vary: Cookie parameter that SessionMiddleware is setting.

There should be a way to define cookie prefixes, such as 'utm', to ignore for cookie variation for caching.

Attachments (1)

cache_ignore_cookie.patch (5.4 KB) - added by vvd 5 years ago.

Download all attachments as: .zip

Change History (19)

comment:1 Changed 5 years ago by jacob

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

comment:2 Changed 5 years ago by thejaswi_puthraya

  • Component changed from Uncategorized to HTTP handling

comment:3 Changed 5 years ago by vvd

  • Owner changed from nobody to vvd

Changed 5 years ago by vvd

comment:4 Changed 5 years ago by vvd

  • Has patch set
  • Keywords cache cookies added
  • Status changed from new to assigned

I've created a patch which introduce CACHE_MIDDLEWARE_IGNORE_COOKIES setting, to specify cookies not to be considered in building cache key.

comment:5 Changed 3 years ago by carljm

  • Patch needs improvement set
  • Severity set to Normal
  • Type set to New feature

If we add a workaround for this, we need to make it clear in the docs that this is a workaround that will help Django's internal cache, but upstream HTTP caches will still be broken unless they also special-case Google Analytics cookies. See for instance https://groups.google.com/group/analytics-help-integrations/browse_thread/thread/6c9d4a0fea1cc1d2

It's really Google Analytics that's breaking HTTP caching here (more specifically, any use of Vary: Cookie, which is part of HTTP caching). We can provide a Django-specific workaround, but that doesn't fix the core problem, which isn't in Django.

Categorizing as a "new feature," since what's under discussion here is not a bug in Django, but a new feature to ease working around a problem with Google Analytics.

The latest patch here looks pretty reasonable. The setting should be just a list of plain regex strings, though, to simplify using it - the compilation can be done by the middleware once at startup time.

comment:6 Changed 3 years ago by lukeplant

  • Easy pickings unset
  • UI/UX unset

Regarding regexes - I would favour using compiled regexes - this is consistent with other settings that are regexes. (Only the URL conf appears to be different here).

comment:7 Changed 3 years ago by anonymous

Instead of listing individual cookies explicitly, I feel it would be better to have Django keep a record on whether or not a cookie was been accessed. This can be done in similar fashion as to how Django currently checks whether or not the session was accessed.

Benefits:

  • This would work without any configuration. Any additional cookies set by whatever frontend Javascript code that are not used by Django views would automatically be ignored.
  • No new setting & accompanying documentation

comment:8 Changed 3 years ago by raymond.penners@…

  • Cc raymond.penners@… added

comment:9 Changed 3 years ago by jedie

  • Cc django@… added

comment:10 Changed 3 years ago by harm.verhagen@…

  • Cc harm.verhagen@… added

comment:11 Changed 2 years ago by harm

The suggestion mentioned in comment:7: "automatically only taking into account the actual used cookies in the cache key" that would work also in the following case:

  • csrf middleware is enabled (so user sends csrftoken cookie every request.
  • Some view A depends on cookie foobar_enabled (2 values), but that specific view does NOT use any csrf token. In the current situation caching view A does not work between clients (as different csrf tokens cooies, cause different cache keys in the view that doesn't use this cookie)

With suggestion comment:6 this would automatically work.

Last edited 6 months ago by harm (previous) (diff)

comment:12 Changed 2 years ago by KyleMac

It was easy to monkey patch django.http.parse_cookie to use a custom dictionary that logs gets and sets but I had to take into account some things.

  1. This CSRF middleware accesses CSRF_COOKIE_NAME on every request so ignore that and check request.META.get['CSRF_COOKIE_USED'] instead.
  2. The session middleware accesses SESSION_COOKIE_NAME on every request so ignore that and check request.session.accessed instead.
  3. There may be other contrib middleware I'm not using and the best solution would be for everything to not access their cookies until needed.

However, it turns out that IE and Firefox have the same issue but client side. The Google Analytics cookies will cause them to completely invalidate their cache and will not even send If-Modified-Since and so any site that uses Vary: Cookie and Google Analytics effectively has no client side caching for the majority of their users.

In Chrome requests go something like the following (Opera and Safari seem similar but I've tested them less).

  1. Chrome requests a page and gets the following:
    Cache-Control: public, max-age=600
    Vary: Cookie
  1. If you navigate away from the page and come back to it (the actual reload button sometimes behaves differently in Chrome to other browsers) and the page hasn't expired (i.e. max-age) then SOMETIMES it will just serve that from the cache. I don't really understand this and it's probably some kind of heuristics.
  2. Otherwise request a new page with the If-Modified-Since and the contents of our cookies. Due to my monkey patch and custom decorator it gets a 304 response and all is good.

In IE (I tested 9 and 10) and Firefox (currently 14) it's more like the following.

  1. Receive a page with the same headers as above.
  2. Once the page is loaded, Google Analytics updates it cookies and now the page is immediately completely invalidated.
  3. Nothing is ever served from the local cache and so a new request is sent. However since the page was invalidated in step 2 not even an If-Modified-Since header is sent and so you get a full 200 response every single time.

I might now go in the opposite direction and strip out Vary: Cookie on every response and raise an exception if any cached view tries to access cookies it wasn't meant to.

comment:13 Changed 2 years ago by KyleMac

I've now managed to get this to work in Internet Explorer and Firefox.

The fix for IE is quite simple and is due to it sending a non standard If-Modified-Since header that ConditionalGetMiddleware fails to parse. I've opened ticket #18648 for that.

While Firefox won't send If-Modified-Since with Vary: Cookie it will send back the Etag. So to get Firefox to work as expected all you need to do is use a hash of Last-Modified as the Etag (assuming there isn't already a proper Etag for the response).

comment:14 Changed 21 months ago by trbs

  • Cc trbs@… added

comment:15 Changed 18 months ago by anonymous

  • Needs documentation set
  • Type changed from New feature to Bug
  • Version changed from 1.0 to 1.4-rc-2

comment:16 Changed 18 months ago by aaugustin

  • Description modified (diff)
  • Type changed from Bug to New feature
  • Version changed from 1.4-rc-2 to 1.0

If you look at the discussion, this isn't a bug; it's really a new feature.

The version field tracks the version the bug was reported in.

Related: #15201. Caching is hard.

comment:17 Changed 14 months ago by anonymous

  • Owner changed from vvd to anonymous

comment:18 Changed 14 months ago by ambv

  • Owner changed from anonymous to ambv

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as assigned
The owner will be changed from ambv to anonymous. Next status will be 'assigned'
The ticket will be disowned. Next status will be 'new'
as The resolution will be set. Next status will be 'closed'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.