Opened 3 years ago

Last modified 2 years ago

#17834 new Bug

ETag generated from empty content can break http caching

Reported by: paulegan Owned by: nobody
Component: HTTP handling Version: 1.3
Severity: Normal Keywords:
Cc: rene.puls@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The [source:/django/trunk/django/utils/cache.py?rev=17286#L100 patch_response_headers] function will set an ETag header using an md5 hash of the response content. Many responses like redirects can have an empty body and this results in the same ETag for each such response. The response headers may vary but an intermediate cache can serve an incorrect response because the ETag is not unique.

A simple solution is to not set ETag if the content is empty - see attached patch.

>>> from django.http import HttpResponseRedirect
>>> from django.utils.cache import patch_response_headers
>>> r1 = HttpResponseRedirect('/u1')
>>> patch_response_headers(r1)
>>> r2 = HttpResponseRedirect('/u2')
>>> patch_response_headers(r2)
>>> r1['ETag'] == r2['ETag']
True

Attachments (1)

etag.patch (1.6 KB) - added by paulegan 3 years ago.
Patch to not set etag if response content is empty - with test

Download all attachments as: .zip

Change History (13)

comment:1 Changed 3 years ago by claudep

  • Has patch set
  • Needs documentation unset
  • Needs tests set
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

Changed 3 years ago by paulegan

Patch to not set etag if response content is empty - with test

comment:2 Changed 3 years ago by claudep

  • Resolution set to wontfix
  • Status changed from new to closed

I digged a little more in the code. In Django code, patch_response_headers is never called for responses which code is not in the 200 range. If you want to call this function from your own code, I think it's your responsability to not call it on redirect responses.

comment:3 Changed 3 years ago by paulegan

True, internally it's currently used only in UpdateCacheMiddleware, but even there a response with status 200 can still have an empty body.

comment:4 Changed 3 years ago by claudep

Yes, but being empty or not, two identical responses get the same ETag. Is it a problem for you?

comment:5 follow-up: Changed 3 years ago by paulegan

The problem is that the responses are not necessarily identical. The response headers are often as important as the content itself. The http spec is clear about this - http://www.w3.org/Protocols/rfc2616/rfc2616-sec7.html - and goes on to say "in order to be legal, a strong entity tag MUST change whenever the associated entity value changes in any way".

Ideally Django ought to compute the ETag from the entity body and entity headers but in practice the body alone is usually sufficient for uniquely identifying the entity. Of course that's not the case when the body is empty. If explicitly coding an ETag for such a response, the values from the appropriate entity headers would be included when generating the tag. For a function like patch_response_headers, where the addition of the ETag header is implicit, I think it is simpler & safer to not compute a tag when the body is empty (and the headers dominate).

I should note that this ticket isn't simple nit-picking but based on a real-world issue found in a production environment.

comment:6 in reply to: ↑ 5 Changed 3 years ago by claudep

I guess that generating the ETag based on the content will match 98% of use cases. Now for the 2% remaining, you should be able to take advantage of view decorators to customize the generation of ETags (https://docs.djangoproject.com/en/dev/topics/conditional-view-processing/). Did you try to use them?

Replying to paulegan:

I should note that this ticket isn't simple nit-picking but based on a real-world issue found in a production environment.

Sure, I don't blame you for discussing it :-) But as the use case of the HttpResponseRedirect in the description is not entirely valid IMHO, maybe you could provide us with more details about your real use case.

comment:7 Changed 3 years ago by paulegan

The specific issue I came across was with a redirect response and wasn't using the cache middleware. I don't entirely agree that this immediately invalidates the bug however since many projects use patch_response_headers directly (a google search throws up quite a few examples and there are sure to be many more private projects, as in my case).

Stepping aside that argument, my main point is that a user of patch_response_headers may unexpectedly create responses that don't meet the http requirements and run into caching issues. It's certainly an unusual case that probably won't affect too many projects and can definitely be worked-around with other methods (an explicit ETag or the conditional decorators). However when the entity body is empty it seems to me to be safer to avoid the potential for ETag collision.

I guess it comes down to a balance between those users of patch_response_headers who expect an ETag even on empty responses and those who expect the function to do the right thing (not break caching). :-)

comment:8 Changed 3 years ago by claudep

  • Resolution wontfix deleted
  • Status changed from closed to reopened
  • Triage Stage changed from Accepted to Design decision needed

At this point, I see 3 ways to solve this:

  1. Consider it is a corner case, and let the programmers handle patch_response_headers carefully for responses without content.
  2. Change ETag computing to always include headers (generates digest on str(response) instead of str(response.content)).
  3. Special case ETag computing to use headers only if response.content is empty (OP proposal).

Note also that I have refactored ETag computing in a patch on #14722. Marking as DDN to get a core committer opinion.

comment:9 Changed 3 years ago by rene.puls@…

  • Cc rene.puls@… added

I just stumbled over the same problem and would like to add another possible solution.

The RFC says this about redirect responses (e.g. section 10.3.2): "Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s)." While this is not a requirement, it would help in this scenario, because the response body (and thus the ETag) would then depend on the redirect URL.

comment:10 Changed 2 years ago by aaugustin

  • Status changed from reopened to new

comment:11 Changed 2 years ago by aaugustin

  • Triage Stage changed from Design decision needed to Accepted

It seems to me that the headers could be included in the Etag calculation without any harm.

comment:12 Changed 2 years ago by aaugustin

#12789 was a duplicate.

Note: See TracTickets for help on using tickets.
Back to Top