Code

Opened 3 years ago

Closed 17 months ago

#16035 closed Bug (fixed)

GZipMiddleware doesn't change an ETag

Reported by: anonymous Owned by: ext
Component: HTTP handling Version: 1.4
Severity: Normal Keywords:
Cc: pigletto Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: no

Description

The GZipMiddleware does not alter an ETag, if one is present. This means that e.g. if USE_ETAGS is True and CommonMiddleware is used, or if UpdateCacheMiddleware is used, that the same ETag is returned for both gzipped and non-gzipped responses. This is against spec (RFC2616 section 13.3.3 says a strong validator should change if any entity header or the body changes - when gzipped, the body has changed and Content-Encoding is an entity header). This is the second pitfall listed here: http://www.mnot.net/blog/2007/08/07/etags

A patch to gzip.py would, I think, simply see if an ETag is set, and if so add ";gzip" to the end (inside the quote, of course) or similar.

Attachments (3)

gzip.py.patch (484 bytes) - added by anonymous 3 years ago.
Patch to gzip.py to update ETag upon gzipping
gzip.patch.tests.diff (3.5 KB) - added by dracos2 3 years ago.
Patch from right directory, and some tests
patch_for_16035.diff (2.8 KB) - added by ext 2 years ago.
Uses new way to override settings

Download all attachments as: .zip

Change History (20)

Changed 3 years ago by anonymous

Patch to gzip.py to update ETag upon gzipping

comment:1 Changed 3 years ago by anonymous

  • Has patch set
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 3 years ago by aaugustin

  • Needs tests set
  • Triage Stage changed from Unreviewed to Accepted

Changed 3 years ago by dracos2

Patch from right directory, and some tests

comment:3 Changed 3 years ago by dracos2

(I submitted the bug not logged in, sorry.) I've attached the same patch run from the right directory with svn diff this time, and also added some tests of the GZipMiddleware - one that tests short things aren't gzipped, and one to test the ETag does indeed differ for gzipped and non-gzipped versions. Hope that's helpful.

comment:4 Changed 3 years ago by aaugustin

  • Needs tests unset
  • Patch needs improvement set

Your patch should use the new way to alter settings during tests: https://docs.djangoproject.com/en/dev/topics/testing/#overriding-settings

comment:5 Changed 3 years ago by anonymous

  • Easy pickings set
  • UI/UX unset

Changed 2 years ago by ext

Uses new way to override settings

comment:6 Changed 2 years ago by pigletto

  • Owner changed from nobody to ext
  • Patch needs improvement unset
  • Triage Stage changed from Accepted to Ready for checkin

comment:7 Changed 2 years ago by pigletto

  • Cc pigletto added

comment:8 Changed 2 years ago by jezdez

  • Resolution set to fixed
  • Status changed from new to closed

In [17471]:

Fixed #16035 -- Appended the Etag response header if the GZipMiddleware is in use to follow RFC2616 better. Thanks, ext and dracos2.

comment:9 Changed 2 years ago by anonymous

  • Resolution fixed deleted
  • Status changed from closed to reopened

The path does not work since the gzip middeware does strip the ;gzip etag in process_request. CommonMiddleware generates an etag for the response (this excludes the ";gzip") and compares this to HTTP_IF_NONE_MATCH (this includes the ";gzip").

comment:10 Changed 2 years ago by aaugustin

  • Triage Stage changed from Ready for checkin to Unreviewed

comment:11 Changed 2 years ago by jezdez

Feel free to provide a test case, I can't completely follow.

comment:12 Changed 2 years ago by anonymous

I request an endpoint with

Accept-Encoding: gzip 

I get the following headers back

Status Code: 200
Date: Tue, 22 May 2012 08:04:13 GMT
Content-Encoding: gzip
Content-Length: 192
Server: WSGIServer/0.1 Python/2.7.1
ETag: "ea57c4c505a7d588dd03d7251a33f846;gzip"
Vary: Accept-Encoding
Content-Type: application/json; charset=utf-8

Now, GETing the same endpoint again, this time passing the ETag

Accept-Encoding: gzip 
If-None-Match: "0dbb5568198250a9cb983ad6c201e02a;gzip"

still gives me back a 200

Status Code: 200
Date: Tue, 22 May 2012 08:04:13 GMT
Content-Encoding: gzip
Content-Length: 192
Server: WSGIServer/0.1 Python/2.7.1
ETag: "ea57c4c505a7d588dd03d7251a33f846;gzip"
Vary: Accept-Encoding
Content-Type: application/json; charset=utf-8

Passing the ETag without the ";gzip"-part, I get a 304

Accept-Encoding: gzip 
If-None-Match: "ea57c4c505a7d588dd03d7251a33f846"

response

Status Code: 304
Date: Tue, 22 May 2012 08:07:53 GMT
Content-Length: 0
Server: WSGIServer/0.1 Python/2.7.1
Content-Type: text/html; charset=utf-8

The problem is that the ";gzip" part is not stripped when processing a new request as mentioned above.

comment:13 Changed 2 years ago by aaugustin

  • Triage Stage changed from Unreviewed to Accepted

I didn't reproduce the sequence above myself, but it's a totally plausible bug.

comment:14 Changed 2 years ago by anonymous

  • Version changed from 1.3 to 1.4

This bug is in 1.4

comment:15 Changed 22 months ago by mrmachine

This is definitely a bug, and it depends on the position of GZipMiddleware relative to CommonMiddleware (and probably other middleware).

CommonMiddleware generates an etag based on response.content, and returns an HttpResponseNotModified when the If-None-Match header is the same as the generated etag.

GZipMiddleware alters an existing etag (adding ";gzip"), but it never checks If-None-Match and never returns an HttpResponseNotModified.

If GZipMiddleware comes before CommonMiddleware in the MIDDLEWARE_CLASSES setting (GZipMiddleware.process_response runs after CommonMiddleware.process_response), the browser will supply a ";gzip" etag for the If-None-Match header in subsequent requests, that ContentMiddleware will see as different to the etag that it generates, and will never return an HttpResponseNotModified.

If GZipMiddleware comes after CommonMiddleware, and the user hasn't explicitly set an etag on their response (which is usually the case -- people expect CommonMiddleware to do this for them), then things get crazy. You will likely get a different etag on every request, even though the content is the same.

The reason is because GZipMiddleware will not an etag when there is none. CommonMiddleware will then try to generate an etag for the compressed content. But all gzip streams must include a timestamp, and GzipFile will use the current timestamp if none is supplied. So the etag that CommonMiddleware generates will be different for the compressed version of the same content, if the requests are a few seconds apart.

We can resolve this side-issue by adding an arbitrary timestamp to GzipFile. The GzipFile docs say that the module ignores the timestamp when decompressing, but some programs such as gunzip make use of it. I'm not sure if this is the right thing to do here.

Back to the main issue. GZipMiddleware, and any other middleware that alters the etag header, should also generate an etag when there is none. And GZipMiddleware needs to do it *before* it compresses the content (unless we fudge the timestamp).

And generally speaking, if we're supposed to generate a different etag if content OR headers change, then shouldn't we be generating etags based on response.serialize(), not response.content?

But I'm not convinced that GZipMiddleware (or any middleware) should alter existing etags at all. If we allow any middleware to alter an etag (or encourage by example), that middleware must also set etags when there are none and compare the new or updated etag against the If-None-Match header and return HttpResponseNotModified when they match (including moving cookies from the old response to the HttpResponseNotModified). This is repetitive code (basically half of CommonMiddleware.process_response) that shouldn't need to be repeated.

Perhaps the creation of etags should be moved out of middleware and be taken care of in the response.content setter, and have it take into account the full HTTP message including headers? We would need to make sure that compresesd content is deterministic in this case (by fudging the timestamp), but middleware would no longer need to worry about altering or setting etags. If middleware (or view code) assigns to response.content, the etag would be updated (if etags are enabled).

This would leave the issue of when to return an HttpResponseNotModified. I would argue that this should happen after all middleware is applied, and should not happen inside any middleware classes. This way, the etag of the final response (after having passed through all response middleware) is the one that is compared to the If-None-Match header, and there is no need for repetitive code in middleware to make this comparison and return HttpResponseNotModified.

comment:16 Changed 22 months ago by mrmachine

I've added a branch on GitHub with a fix and tests.

https://github.com/thirstydigital/django/tree/tickets/16035-gzip-middleware-etag

The test has a 1 second sleep in it to demonstrate the mtime issue with compress_string(). I'm not sure if there's a better way to test that.

comment:17 Changed 17 months ago by aaugustin

  • Resolution set to fixed
  • Status changed from reopened to closed

This is a different problem, and it will be solved by #19705. For the sake of clarity let's continue the discussion in that ticket.

The comment and the patch above are too complicated and ignore ConditionalGetMiddleware. I'm -1 on triplicating ETag handling in gzip middleware.

I think the problem can be solved by removing some code and adding some docs.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.