Opened 13 years ago
Closed 12 years ago
#16035 closed Bug (fixed)
GZipMiddleware doesn't change an ETag
Reported by: | anonymous | Owned by: | Jakub Wiśniowski |
---|---|---|---|
Component: | HTTP handling | Version: | 1.4 |
Severity: | Normal | Keywords: | |
Cc: | Maciej Wiśniowski | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | yes | UI/UX: | no |
Description
The GZipMiddleware does not alter an ETag, if one is present. This means that e.g. if USE_ETAGS is True and CommonMiddleware is used, or if UpdateCacheMiddleware is used, that the same ETag is returned for both gzipped and non-gzipped responses. This is against spec (RFC2616 section 13.3.3 says a strong validator should change if any entity header or the body changes - when gzipped, the body has changed and Content-Encoding is an entity header). This is the second pitfall listed here: http://www.mnot.net/blog/2007/08/07/etags
A patch to gzip.py would, I think, simply see if an ETag is set, and if so add ";gzip" to the end (inside the quote, of course) or similar.
Attachments (3)
Change History (20)
by , 13 years ago
Attachment: | gzip.py.patch added |
---|
comment:1 by , 13 years ago
Has patch: | set |
---|
comment:2 by , 13 years ago
Needs tests: | set |
---|---|
Triage Stage: | Unreviewed → Accepted |
by , 13 years ago
Attachment: | gzip.patch.tests.diff added |
---|
Patch from right directory, and some tests
comment:3 by , 13 years ago
(I submitted the bug not logged in, sorry.) I've attached the same patch run from the right directory with svn diff this time, and also added some tests of the GZipMiddleware - one that tests short things aren't gzipped, and one to test the ETag does indeed differ for gzipped and non-gzipped versions. Hope that's helpful.
comment:4 by , 13 years ago
Needs tests: | unset |
---|---|
Patch needs improvement: | set |
Your patch should use the new way to alter settings during tests: https://docs.djangoproject.com/en/dev/topics/testing/#overriding-settings
comment:5 by , 13 years ago
Easy pickings: | set |
---|---|
UI/UX: | unset |
comment:6 by , 13 years ago
Owner: | changed from | to
---|---|
Patch needs improvement: | unset |
Triage Stage: | Accepted → Ready for checkin |
comment:7 by , 13 years ago
Cc: | added |
---|
comment:9 by , 13 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
The path does not work since the gzip middeware does strip the ;gzip etag in process_request. CommonMiddleware generates an etag for the response (this excludes the ";gzip") and compares this to HTTP_IF_NONE_MATCH (this includes the ";gzip").
comment:10 by , 12 years ago
Triage Stage: | Ready for checkin → Unreviewed |
---|
comment:12 by , 12 years ago
I request an endpoint with
Accept-Encoding: gzip
I get the following headers back
Status Code: 200 Date: Tue, 22 May 2012 08:04:13 GMT Content-Encoding: gzip Content-Length: 192 Server: WSGIServer/0.1 Python/2.7.1 ETag: "ea57c4c505a7d588dd03d7251a33f846;gzip" Vary: Accept-Encoding Content-Type: application/json; charset=utf-8
Now, GETing the same endpoint again, this time passing the ETag
Accept-Encoding: gzip If-None-Match: "0dbb5568198250a9cb983ad6c201e02a;gzip"
still gives me back a 200
Status Code: 200 Date: Tue, 22 May 2012 08:04:13 GMT Content-Encoding: gzip Content-Length: 192 Server: WSGIServer/0.1 Python/2.7.1 ETag: "ea57c4c505a7d588dd03d7251a33f846;gzip" Vary: Accept-Encoding Content-Type: application/json; charset=utf-8
Passing the ETag without the ";gzip"-part, I get a 304
Accept-Encoding: gzip If-None-Match: "ea57c4c505a7d588dd03d7251a33f846"
response
Status Code: 304 Date: Tue, 22 May 2012 08:07:53 GMT Content-Length: 0 Server: WSGIServer/0.1 Python/2.7.1 Content-Type: text/html; charset=utf-8
The problem is that the ";gzip" part is not stripped when processing a new request as mentioned above.
comment:13 by , 12 years ago
Triage Stage: | Unreviewed → Accepted |
---|
I didn't reproduce the sequence above myself, but it's a totally plausible bug.
comment:15 by , 12 years ago
This is definitely a bug, and it depends on the position of GZipMiddleware
relative to CommonMiddleware
(and probably other middleware).
CommonMiddleware
generates an etag based on response.content
, and returns an HttpResponseNotModified
when the If-None-Match
header is the same as the generated etag.
GZipMiddleware
alters an existing etag (adding ";gzip"), but it never checks If-None-Match
and never returns an HttpResponseNotModified
.
If GZipMiddleware
comes before CommonMiddleware
in the MIDDLEWARE_CLASSES
setting (GZipMiddleware.process_response
runs after CommonMiddleware.process_response
), the browser will supply a ";gzip" etag for the If-None-Match
header in subsequent requests, that ContentMiddleware
will see as different to the etag that it generates, and will never return an HttpResponseNotModified
.
If GZipMiddleware
comes after CommonMiddleware
, and the user hasn't explicitly set an etag on their response (which is usually the case -- people expect CommonMiddleware
to do this for them), then things get crazy. You will likely get a different etag on every request, even though the content is the same.
The reason is because GZipMiddleware
will not an etag when there is none. CommonMiddleware
will then try to generate an etag for the compressed content. But all gzip streams must include a timestamp, and GzipFile
will use the current timestamp if none is supplied. So the etag that CommonMiddleware
generates will be different for the compressed version of the same content, if the requests are a few seconds apart.
We can resolve this side-issue by adding an arbitrary timestamp to GzipFile
. The GzipFile
docs say that the module ignores the timestamp when decompressing, but some programs such as gunzip
make use of it. I'm not sure if this is the right thing to do here.
Back to the main issue. GZipMiddleware
, and any other middleware that alters the etag header, should also generate an etag when there is none. And GZipMiddleware
needs to do it *before* it compresses the content (unless we fudge the timestamp).
And generally speaking, if we're supposed to generate a different etag if content OR headers change, then shouldn't we be generating etags based on response.serialize()
, not response.content
?
But I'm not convinced that GZipMiddleware
(or any middleware) should alter existing etags at all. If we allow any middleware to alter an etag (or encourage by example), that middleware must also set etags when there are none and compare the new or updated etag against the If-None-Match
header and return HttpResponseNotModified
when they match (including moving cookies from the old response to the HttpResponseNotModified
). This is repetitive code (basically half of CommonMiddleware.process_response
) that shouldn't need to be repeated.
Perhaps the creation of etags should be moved out of middleware and be taken care of in the response.content
setter, and have it take into account the full HTTP message including headers? We would need to make sure that compresesd content is deterministic in this case (by fudging the timestamp), but middleware would no longer need to worry about altering or setting etags. If middleware (or view code) assigns to response.content
, the etag would be updated (if etags are enabled).
This would leave the issue of when to return an HttpResponseNotModified
. I would argue that this should happen after all middleware is applied, and should not happen inside any middleware classes. This way, the etag of the final response (after having passed through all response middleware) is the one that is compared to the If-None-Match
header, and there is no need for repetitive code in middleware to make this comparison and return HttpResponseNotModified
.
comment:16 by , 12 years ago
I've added a branch on GitHub with a fix and tests.
https://github.com/thirstydigital/django/tree/tickets/16035-gzip-middleware-etag
The test has a 1 second sleep in it to demonstrate the mtime
issue with compress_string()
. I'm not sure if there's a better way to test that.
comment:17 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
This is a different problem, and it will be solved by #19705. For the sake of clarity let's continue the discussion in that ticket.
The comment and the patch above are too complicated and ignore ConditionalGetMiddleware
. I'm -1 on triplicating ETag handling in gzip middleware.
I think the problem can be solved by removing some code and adding some docs.
Patch to gzip.py to update ETag upon gzipping