Opened 16 years ago

Last modified 2 years ago

#9249 closed New feature

Google Analytics' Cookies break CacheMiddleware when SessionMiddleware turns on Vary: Cookie — at Version 16

Reported by: pixelcort Owned by: Vladimir Dmitriev
Component: HTTP handling Version: 1.0
Severity: Normal Keywords: cache cookies
Cc: raymond.penners@…, django@…, harm.verhagen@…, trbs@… Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Aymeric Augustin)

When using Google Analytics on a Django project with CacheMiddleware and SessionMiddleware turned on, the Cookies that Google Analytics apparently change on each reload, invalidating the Vary: Cookie parameter that SessionMiddleware is setting.

There should be a way to define cookie prefixes, such as 'utm', to ignore for cookie variation for caching.

Change History (17)

comment:1 by Jacob, 16 years ago

Triage Stage: UnreviewedAccepted

comment:2 by Thejaswi Puthraya, 16 years ago

Component: UncategorizedHTTP handling

comment:3 by Vladimir Dmitriev, 16 years ago

Owner: changed from nobody to Vladimir Dmitriev

by Vladimir Dmitriev, 16 years ago

Attachment: cache_ignore_cookie.patch added

comment:4 by Vladimir Dmitriev, 16 years ago

Has patch: set
Keywords: cache cookies added
Status: newassigned

I've created a patch which introduce CACHE_MIDDLEWARE_IGNORE_COOKIES setting, to specify cookies not to be considered in building cache key.

comment:5 by Carl Meyer, 14 years ago

Patch needs improvement: set
Severity: Normal
Type: New feature

If we add a workaround for this, we need to make it clear in the docs that this is a workaround that will help Django's internal cache, but upstream HTTP caches will still be broken unless they also special-case Google Analytics cookies. See for instance https://groups.google.com/group/analytics-help-integrations/browse_thread/thread/6c9d4a0fea1cc1d2

It's really Google Analytics that's breaking HTTP caching here (more specifically, any use of Vary: Cookie, which is part of HTTP caching). We can provide a Django-specific workaround, but that doesn't fix the core problem, which isn't in Django.

Categorizing as a "new feature," since what's under discussion here is not a bug in Django, but a new feature to ease working around a problem with Google Analytics.

The latest patch here looks pretty reasonable. The setting should be just a list of plain regex strings, though, to simplify using it - the compilation can be done by the middleware once at startup time.

comment:6 by Luke Plant, 13 years ago

Easy pickings: unset
UI/UX: unset

Regarding regexes - I would favour using compiled regexes - this is consistent with other settings that are regexes. (Only the URL conf appears to be different here).

comment:7 by anonymous, 13 years ago

Instead of listing individual cookies explicitly, I feel it would be better to have Django keep a record on whether or not a cookie was been accessed. This can be done in similar fashion as to how Django currently checks whether or not the session was accessed.

Benefits:

  • This would work without any configuration. Any additional cookies set by whatever frontend Javascript code that are not used by Django views would automatically be ignored.
  • No new setting & accompanying documentation

comment:8 by raymond.penners@…, 13 years ago

Cc: raymond.penners@… added

comment:9 by jedie, 13 years ago

Cc: django@… added

comment:10 by harm.verhagen@…, 13 years ago

Cc: harm.verhagen@… added

comment:11 by harm, 13 years ago

The suggestion mentioned in comment:7: "automatically only taking into account the actual used cookies in the cache key" that would work also in the following case:

  • csrf middleware is enabled (so user sends csrftoken cookie every request.
  • Some view A depends on cookie foobar_enabled (2 values), but that specific view does NOT use any csrf token. In the current situation caching view A does not work between clients (as different csrf tokens cooies, cause different cache keys in the view that doesn't use this cookie)

With suggestion comment:6 this would automatically work.

Last edited 11 years ago by harm (previous) (diff)

comment:12 by KyleMac, 12 years ago

It was easy to monkey patch django.http.parse_cookie to use a custom dictionary that logs gets and sets but I had to take into account some things.

  1. This CSRF middleware accesses CSRF_COOKIE_NAME on every request so ignore that and check request.META.get['CSRF_COOKIE_USED'] instead.
  2. The session middleware accesses SESSION_COOKIE_NAME on every request so ignore that and check request.session.accessed instead.
  3. There may be other contrib middleware I'm not using and the best solution would be for everything to not access their cookies until needed.

However, it turns out that IE and Firefox have the same issue but client side. The Google Analytics cookies will cause them to completely invalidate their cache and will not even send If-Modified-Since and so any site that uses Vary: Cookie and Google Analytics effectively has no client side caching for the majority of their users.

In Chrome requests go something like the following (Opera and Safari seem similar but I've tested them less).

  1. Chrome requests a page and gets the following:
    Cache-Control: public, max-age=600
    Vary: Cookie
  1. If you navigate away from the page and come back to it (the actual reload button sometimes behaves differently in Chrome to other browsers) and the page hasn't expired (i.e. max-age) then SOMETIMES it will just serve that from the cache. I don't really understand this and it's probably some kind of heuristics.
  2. Otherwise request a new page with the If-Modified-Since and the contents of our cookies. Due to my monkey patch and custom decorator it gets a 304 response and all is good.

In IE (I tested 9 and 10) and Firefox (currently 14) it's more like the following.

  1. Receive a page with the same headers as above.
  2. Once the page is loaded, Google Analytics updates it cookies and now the page is immediately completely invalidated.
  3. Nothing is ever served from the local cache and so a new request is sent. However since the page was invalidated in step 2 not even an If-Modified-Since header is sent and so you get a full 200 response every single time.

I might now go in the opposite direction and strip out Vary: Cookie on every response and raise an exception if any cached view tries to access cookies it wasn't meant to.

comment:13 by KyleMac, 12 years ago

I've now managed to get this to work in Internet Explorer and Firefox.

The fix for IE is quite simple and is due to it sending a non standard If-Modified-Since header that ConditionalGetMiddleware fails to parse. I've opened ticket #18648 for that.

While Firefox won't send If-Modified-Since with Vary: Cookie it will send back the Etag. So to get Firefox to work as expected all you need to do is use a hash of Last-Modified as the Etag (assuming there isn't already a proper Etag for the response).

comment:14 by trbs, 12 years ago

Cc: trbs@… added

comment:15 by anonymous, 12 years ago

Needs documentation: set
Type: New featureBug
Version: 1.01.4-rc-2

comment:16 by Aymeric Augustin, 12 years ago

Description: modified (diff)
Type: BugNew feature
Version: 1.4-rc-21.0

If you look at the discussion, this isn't a bug; it's really a new feature.

The version field tracks the version the bug was reported in.

Related: #15201. Caching is hard.

Note: See TracTickets for help on using tickets.
Back to Top