Opened 12 years ago

Last modified 7 years ago

#17834 new Bug

ETag generated from empty content can break http caching

Reported by: Paul Egan Owned by: nobody
Component: HTTP handling Version: 1.3
Severity: Normal Keywords:
Cc: rene.puls@…, k@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The [source:/django/trunk/django/utils/cache.py?rev=17286#L100 patch_response_headers] function will set an ETag header using an md5 hash of the response content. Many responses like redirects can have an empty body and this results in the same ETag for each such response. The response headers may vary but an intermediate cache can serve an incorrect response because the ETag is not unique.

A simple solution is to not set ETag if the content is empty - see attached patch.

>>> from django.http import HttpResponseRedirect
>>> from django.utils.cache import patch_response_headers
>>> r1 = HttpResponseRedirect('/u1')
>>> patch_response_headers(r1)
>>> r2 = HttpResponseRedirect('/u2')
>>> patch_response_headers(r2)
>>> r1['ETag'] == r2['ETag']
True

Attachments (1)

etag.patch (1.6 KB ) - added by Paul Egan 12 years ago.
Patch to not set etag if response content is empty - with test

Download all attachments as: .zip

Change History (14)

comment:1 by Claude Paroz, 12 years ago

Has patch: set
Needs tests: set
Triage Stage: UnreviewedAccepted

by Paul Egan, 12 years ago

Attachment: etag.patch added

Patch to not set etag if response content is empty - with test

comment:2 by Claude Paroz, 12 years ago

Resolution: wontfix
Status: newclosed

I digged a little more in the code. In Django code, patch_response_headers is never called for responses which code is not in the 200 range. If you want to call this function from your own code, I think it's your responsability to not call it on redirect responses.

comment:3 by Paul Egan, 12 years ago

True, internally it's currently used only in UpdateCacheMiddleware, but even there a response with status 200 can still have an empty body.

comment:4 by Claude Paroz, 12 years ago

Yes, but being empty or not, two identical responses get the same ETag. Is it a problem for you?

comment:5 by Paul Egan, 12 years ago

The problem is that the responses are not necessarily identical. The response headers are often as important as the content itself. The http spec is clear about this - http://www.w3.org/Protocols/rfc2616/rfc2616-sec7.html - and goes on to say "in order to be legal, a strong entity tag MUST change whenever the associated entity value changes in any way".

Ideally Django ought to compute the ETag from the entity body and entity headers but in practice the body alone is usually sufficient for uniquely identifying the entity. Of course that's not the case when the body is empty. If explicitly coding an ETag for such a response, the values from the appropriate entity headers would be included when generating the tag. For a function like patch_response_headers, where the addition of the ETag header is implicit, I think it is simpler & safer to not compute a tag when the body is empty (and the headers dominate).

I should note that this ticket isn't simple nit-picking but based on a real-world issue found in a production environment.

in reply to:  5 comment:6 by Claude Paroz, 12 years ago

I guess that generating the ETag based on the content will match 98% of use cases. Now for the 2% remaining, you should be able to take advantage of view decorators to customize the generation of ETags (https://docs.djangoproject.com/en/dev/topics/conditional-view-processing/). Did you try to use them?

Replying to paulegan:

I should note that this ticket isn't simple nit-picking but based on a real-world issue found in a production environment.

Sure, I don't blame you for discussing it :-) But as the use case of the HttpResponseRedirect in the description is not entirely valid IMHO, maybe you could provide us with more details about your real use case.

comment:7 by Paul Egan, 12 years ago

The specific issue I came across was with a redirect response and wasn't using the cache middleware. I don't entirely agree that this immediately invalidates the bug however since many projects use patch_response_headers directly (a google search throws up quite a few examples and there are sure to be many more private projects, as in my case).

Stepping aside that argument, my main point is that a user of patch_response_headers may unexpectedly create responses that don't meet the http requirements and run into caching issues. It's certainly an unusual case that probably won't affect too many projects and can definitely be worked-around with other methods (an explicit ETag or the conditional decorators). However when the entity body is empty it seems to me to be safer to avoid the potential for ETag collision.

I guess it comes down to a balance between those users of patch_response_headers who expect an ETag even on empty responses and those who expect the function to do the right thing (not break caching). :-)

comment:8 by Claude Paroz, 12 years ago

Resolution: wontfix
Status: closedreopened
Triage Stage: AcceptedDesign decision needed

At this point, I see 3 ways to solve this:

  1. Consider it is a corner case, and let the programmers handle patch_response_headers carefully for responses without content.
  2. Change ETag computing to always include headers (generates digest on str(response) instead of str(response.content)).
  3. Special case ETag computing to use headers only if response.content is empty (OP proposal).

Note also that I have refactored ETag computing in a patch on #14722. Marking as DDN to get a core committer opinion.

comment:9 by rene.puls@…, 12 years ago

Cc: rene.puls@… added

I just stumbled over the same problem and would like to add another possible solution.

The RFC says this about redirect responses (e.g. section 10.3.2): "Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s)." While this is not a requirement, it would help in this scenario, because the response body (and thus the ETag) would then depend on the redirect URL.

comment:10 by Aymeric Augustin, 11 years ago

Status: reopenednew

comment:11 by Aymeric Augustin, 11 years ago

Triage Stage: Design decision neededAccepted

It seems to me that the headers could be included in the Etag calculation without any harm.

comment:12 by Aymeric Augustin, 11 years ago

#12789 was a duplicate.

comment:14 by Kevin Christopher Henry, 7 years ago

Cc: k@… added

For reference, here's what the updated RFC 7232 specification has to say on the matter:

A "strong validator" [like the ETags we compute] is representation metadata that changes value whenever a change occurs to the representation data that would be observable in the payload body of a 200 (OK) response to GET. A strong validator might change for reasons other than a change to the representation data, such as when a semantically significant part of the representation metadata is changed (e.g., Content-Type), but it is in the best interests of the origin server to only change the value when it is necessary to invalidate the stored responses held by remote caches and authoring tools.

So, the specification does not require the ETag to change unless the response body itself changes. It could change, if we know that a change to the headers invalidates the response. Since the framework can't, in general, know that, and since the specification cautions against invalidating the response when it's unnecessary, my opinion is that the status quo of basing the ETag only on the response body is fine.

(If we were to try and take the headers into account, though, it's worth noting which headers they should be. RFC 7231 defines the representation metadata as the Content-Type, Content-Encoding, Content-Language, and Content-Location headers.)

Note: See TracTickets for help on using tickets.
Back to Top