Opened 13 years ago
Last modified 8 years ago
#17834 new Bug
ETag generated from empty content can break http caching
Reported by: | Paul Egan | Owned by: | nobody |
---|---|---|---|
Component: | HTTP handling | Version: | 1.3 |
Severity: | Normal | Keywords: | |
Cc: | rene.puls@…, k@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | yes | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The [source:/django/trunk/django/utils/cache.py?rev=17286#L100 patch_response_headers] function will set an ETag header using an md5 hash of the response content. Many responses like redirects can have an empty body and this results in the same ETag for each such response. The response headers may vary but an intermediate cache can serve an incorrect response because the ETag is not unique.
A simple solution is to not set ETag if the content is empty - see attached patch.
>>> from django.http import HttpResponseRedirect >>> from django.utils.cache import patch_response_headers >>> r1 = HttpResponseRedirect('/u1') >>> patch_response_headers(r1) >>> r2 = HttpResponseRedirect('/u2') >>> patch_response_headers(r2) >>> r1['ETag'] == r2['ETag'] True
Attachments (1)
Change History (14)
comment:1 by , 13 years ago
Has patch: | set |
---|---|
Needs tests: | set |
Triage Stage: | Unreviewed → Accepted |
by , 13 years ago
Attachment: | etag.patch added |
---|
comment:2 by , 13 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
I digged a little more in the code. In Django code, patch_response_headers is never called for responses which code is not in the 200 range. If you want to call this function from your own code, I think it's your responsability to not call it on redirect responses.
comment:3 by , 13 years ago
True, internally it's currently used only in UpdateCacheMiddleware, but even there a response with status 200 can still have an empty body.
comment:4 by , 13 years ago
Yes, but being empty or not, two identical responses get the same ETag. Is it a problem for you?
follow-up: 6 comment:5 by , 13 years ago
The problem is that the responses are not necessarily identical. The response headers are often as important as the content itself. The http spec is clear about this - http://www.w3.org/Protocols/rfc2616/rfc2616-sec7.html - and goes on to say "in order to be legal, a strong entity tag MUST change whenever the associated entity value changes in any way".
Ideally Django ought to compute the ETag from the entity body and entity headers but in practice the body alone is usually sufficient for uniquely identifying the entity. Of course that's not the case when the body is empty. If explicitly coding an ETag for such a response, the values from the appropriate entity headers would be included when generating the tag. For a function like patch_response_headers, where the addition of the ETag header is implicit, I think it is simpler & safer to not compute a tag when the body is empty (and the headers dominate).
I should note that this ticket isn't simple nit-picking but based on a real-world issue found in a production environment.
comment:6 by , 13 years ago
I guess that generating the ETag based on the content will match 98% of use cases. Now for the 2% remaining, you should be able to take advantage of view decorators to customize the generation of ETags (https://docs.djangoproject.com/en/dev/topics/conditional-view-processing/). Did you try to use them?
Replying to paulegan:
I should note that this ticket isn't simple nit-picking but based on a real-world issue found in a production environment.
Sure, I don't blame you for discussing it :-) But as the use case of the HttpResponseRedirect in the description is not entirely valid IMHO, maybe you could provide us with more details about your real use case.
comment:7 by , 13 years ago
The specific issue I came across was with a redirect response and wasn't using the cache middleware. I don't entirely agree that this immediately invalidates the bug however since many projects use patch_response_headers directly (a google search throws up quite a few examples and there are sure to be many more private projects, as in my case).
Stepping aside that argument, my main point is that a user of patch_response_headers may unexpectedly create responses that don't meet the http requirements and run into caching issues. It's certainly an unusual case that probably won't affect too many projects and can definitely be worked-around with other methods (an explicit ETag or the conditional decorators). However when the entity body is empty it seems to me to be safer to avoid the potential for ETag collision.
I guess it comes down to a balance between those users of patch_response_headers who expect an ETag even on empty responses and those who expect the function to do the right thing (not break caching). :-)
comment:8 by , 13 years ago
Resolution: | wontfix |
---|---|
Status: | closed → reopened |
Triage Stage: | Accepted → Design decision needed |
At this point, I see 3 ways to solve this:
- Consider it is a corner case, and let the programmers handle patch_response_headers carefully for responses without content.
- Change ETag computing to always include headers (generates digest on str(response) instead of str(response.content)).
- Special case ETag computing to use headers only if response.content is empty (OP proposal).
Note also that I have refactored ETag computing in a patch on #14722. Marking as DDN to get a core committer opinion.
comment:9 by , 12 years ago
Cc: | added |
---|
I just stumbled over the same problem and would like to add another possible solution.
The RFC says this about redirect responses (e.g. section 10.3.2): "Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s)." While this is not a requirement, it would help in this scenario, because the response body (and thus the ETag) would then depend on the redirect URL.
comment:10 by , 12 years ago
Status: | reopened → new |
---|
comment:11 by , 12 years ago
Triage Stage: | Design decision needed → Accepted |
---|
It seems to me that the headers could be included in the Etag calculation without any harm.
comment:14 by , 8 years ago
Cc: | added |
---|
For reference, here's what the updated RFC 7232 specification has to say on the matter:
A "strong validator" [like the ETags we compute] is representation metadata that changes value whenever a change occurs to the representation data that would be observable in the payload body of a 200 (OK) response to GET. A strong validator might change for reasons other than a change to the representation data, such as when a semantically significant part of the representation metadata is changed (e.g., Content-Type), but it is in the best interests of the origin server to only change the value when it is necessary to invalidate the stored responses held by remote caches and authoring tools.
So, the specification does not require the ETag
to change unless the response body itself changes. It could change, if we know that a change to the headers invalidates the response. Since the framework can't, in general, know that, and since the specification cautions against invalidating the response when it's unnecessary, my opinion is that the status quo of basing the ETag
only on the response body is fine.
(If we were to try and take the headers into account, though, it's worth noting which headers they should be. RFC 7231 defines the representation metadata as the Content-Type
, Content-Encoding
, Content-Language
, and Content-Location
headers.)
Patch to not set etag if response content is empty - with test