Opened 4 years ago

Last modified 6 weeks ago

#19705 assigned Bug

CommonMiddleware handles If-None-Match incorrectly

Reported by: Aymeric Augustin Owned by: Denis Cornehl
Component: HTTP handling Version: master
Severity: Normal Keywords:
Cc: hirokiky@…, real.human@…, k@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by Aymeric Augustin)

Two middleware check ETags for unmodified responses: CommonMiddleware and ConditionalGetMiddleware and they do it inconsistently.

If the response's ETag matches the request's If-None-Match:

  • ConditionalGetMiddleware changes the response code to 304, preserving all headers; the content gets removed later on
  • CommonMiddleware creates a new HttpResponseNotModified without content and simply restores the cookies.

As a consequence, CommonMiddleware returns a response without ETag, which is wrong. I detected this with RedBot on a Django site I run. Any site with USE_ETAGS = True has this problem.

In general, wiping headers sounds like a bad idea. A 304 is supposed to have the same headers as the 200. (Well, the RFC is more complicated, but I think it's the general idea. Future versions of HTTP will likely require the Content-Length not to be 0.)

I believe that CommonMiddleware should simply generate the ETag and not handle conditional content removal; that's the job of ConditionalGetMiddleware.

For example, if one is using GzipMiddleware, the correct response chain is:

  • CommonMiddleware computes the ETag,
  • GzipMiddleware compresses the content and modifies the ETag,
  • ConditionalGetMiddleware uses the modified ETag to decide if the response was modified or not.

This is a good reason to keep "ETag generation" and "Etag checking" concerns separate. The same argument applies to any middleware that sets or modifies ETags.

Unfortunately, CommonMiddleware is documented to "take care of sending Not Modified responses, if appropriate", so this would be a backwards incompatible change.

Change History (15)

comment:1 Changed 4 years ago by Aymeric Augustin

Description: modified (diff)
Summary: CommonMiddleware handles If-Modified-Since incorrectlyCommonMiddleware handles If-None-Match incorrectly
Triage Stage: UnreviewedAccepted

comment:2 Changed 4 years ago by Aymeric Augustin

The discussion below was closed as a duplicate of this ticket.

comment:3 Changed 4 years ago by Hiroki Kiyohara

Cc: hirokiky@… added
Has patch: set
Owner: changed from nobody to Hiroki Kiyohara
Status: newassigned

I send a pull-request to solve this problem

comment:4 Changed 4 years ago by Tai Lee

This still leaves a dependency on the order of middleware for correct operation. I don't think Etags should be set or checked in middleware at all. We already have a USE_ETAGS setting. Why not just return the HttpResponseNotModified after all middleware has executed, when USE_ETAGS is True?

We don't need middleware to create Etags, either. We could hook into content and header assignment for HttpResponse, so that Etags will always be set or updated whenever content or headers change, when USE_ETAGS is True.

Another benefit would be that we remove the repetitive requirement for ALL middleware that alters content or headers having to check the USE_ETAGS setting and then conditionally recalculate and update the Etag header. It's currently very easy for a middleware author to alter content without realising that they must do this, which means we could be serving responses with stale Etag headers.

comment:5 Changed 3 years ago by Tai Lee

Cc: real.human@… added

comment:6 Changed 3 years ago by Tim Graham

Patch needs improvement: set

Setting as "patch needs improvement" given the previous comment and the fact that the PR no longer merges cleanly.

comment:7 Changed 3 years ago by Hiroki Kiyohara

Owner: Hiroki Kiyohara deleted
Status: assignednew

I de-assign once time. because I won't try to fix this problem these days. sorry.

comment:8 Changed 18 months ago by Denis Cornehl

Since I stumbled on it while working with Django, I want to take care (if possible) of this issue.

What seems to be missing (apart from implementation) is this major design decision (of course there could be ways in between):

Way 1: Cleanup middlewares

  • as aaugustin proposed
  • change middleware behavior (backwards incompatible)
  • CommonMiddleware only calculates the ETag, if setting is set
  • GZipMiddleware changes it probably
  • ConditionalGetMiddleware takes care about the response code
  • perhaps move http.conditional_content_removal(), since it then removes the content based on the response code
  • document usage and needed order of the middlewares

Way 2: remove ETag handling in middlewares

  • as mrmachine proposed
  • also backwards incompatible
  • no ETag / Conditional Handling in middleware
  • ETags are always calculated in the request (when the setting is set)
  • conditional view processing is also moved, and always done (open on this, what about last-modified handling?)
  • perhaps easier to use for users, since it hooks into the response-content
  • open question would be, how the view decorators @etag and @condition would work then

comment:9 Changed 18 months ago by Denis Cornehl

Owner: set to Denis Cornehl
Status: newassigned

comment:10 Changed 18 months ago by Denis Cornehl

after talking to apollo13 and mjtamlyn we lean towards way 1, for a first draft.

I'll look into the old PR and improve / finish it up.

comment:11 Changed 18 months ago by Denis Cornehl

This will get funny.

There is also some duplication between:

  • CommonMiddleware
  • ConditionalGetMiddleware
  • the @condition decorator

also bugs fixed in only one of the places.

comment:12 Changed 18 months ago by Tai Lee

Was there any comment from those who prefer option 1 about all middleware that is ever written in the future having to check the value of the USE_ETAGS setting if the middleware alters content/headers, and recalculate the etag? This seems very likely to not happen in a lot of cases. Not all middleware authors will know or care about etags, and if they write a middleware class that alters content/headers it will become buggy when deployed by other developers who have enabled this setting.

I think that option 1 is just patching the symptom of coupled middleware and repetitive etag handling in Django's own middleware, but doesn't do much for the wider middleware ecosystem or preventing developers from making the same mistakes in their own code.

comment:13 Changed 6 months ago by Denis Cornehl

Just rediscovered this issue.

  • We refactored the conditional-get handling in 7a40fef17ab7918cbb1ddc3ba080f42b420f7a48. There we merged all the different implementations of conditional-get handling into one place.
  • there, based on the original code, a new response is generated as not-modified. Only cookies are kept if they are there.
  • I just started working on a new ticket that will deprecate / remove {{USE_ETAGS}} setting, and get the conditional-get handling out of {{CommonMiddleware}}. Then the {{ConditionalGetMiddleware}} will generate a {{ETag}} if there is none. By this middleware just being the last one, many of the edge-cases will be fixed.

Is there any knowledge on if we should keep all the headers in a 304 response? Before the refactor we had places that handled it differently.

At the moment we're returning a new 304 response all the time.

comment:14 Changed 3 months ago by Kevin Christopher Henry

Cc: k@… added

At the moment there are several interrelated problems with the way Django handles ETags. Briefly:

  • As stated in the original report above, we're not returning the correct headers with our 304s.
  • Gzipped responses are not getting the benefits of 304s since their ETags are not compared properly. See tickets #16035 and #26771.
  • Our processing of conditional requests does not follow the specification's rewritten precedence rules from 2014.
  • The condition(), vary*(), cache_control(), and gzip_page() decorators work properly in some undocumented orders but not in others.

Here are the changes I would suggest to address these problems:

  • Move ETag handling out of CommonMiddleware and into ConditionalGetMiddleware. This change has already been accepted (#26447) and partially implemented (PR 6393). (Though I'm less confident about the deprecation of USE_ETAGS in that ticket.)
  • Change GZipMiddleware to remove the ;gzip token from the incoming ETags in process_request(). That will allow comparisons to work properly. Document this.
  • Change the 304 headers to match the specification.
  • Change our conditional view logic to match the (simpler) specified precedence rules.
  • Change our use of GzipFile to specify a modification time of 0. That will make our gzip output dependent only on the response body, which means that we can usefully compare ETags on gzipped content, which means that we don't have to care about the order of the gzip_page() decorator.
  • Document that the condition() decorator should be below vary() and cache_control() so that those headers can be set properly.

In more detail:

mrmachine's comment raises a fair point. The fundamental issue here is that Django's architecture strives to be layered and decoupled, but ETags are by definition highly coupled to the response, so it's difficult to isolate them to one layer (whether that be middleware or a decorator).

That said, the specification only requires that the ETag change when the response body changes, and I don't think that's a very common middleware behavior. Only GZipMiddleware meets that definition among the middleware in core, for example. And if the body is changed, the ETag can be changed in a manner similar to what I'm suggesting for GZipMiddleware. I do think this needs to be documented, though.

Regarding which headers to send back with 304s, the specification says this:

The server generating a 304 response MUST generate any of the following header fields that would have been sent in a 200 (OK) response to the same request: Cache-Control, Content-Location, Date, ETag, Expires, and Vary. Since the goal of a 304 response is to minimize information transfer when the recipient already has one or more cached representations, a sender SHOULD NOT generate representation metadata other than the above listed fields unless said metadata exists for the purpose of guiding cache updates (e.g., Last-Modified might be useful if the response does not have an ETag field).

So I think we should return those headers from the response, along with the cookies (since that is what Django already does, and since it's apparently very common despite not being mandated by the standard).

I'm able to work on this, but the next step depends on the fate of PR 6393 , since that accomplishes some of the reorganization mentioned above.

comment:15 Changed 6 weeks ago by Kevin Christopher Henry

Another solution to the gzip problem (that is, that we never match a gzipped ETag because we modify it on the way out but not on the way in) would be to have GZipMiddleware change the ETag to a weak ETag. That would be simple, consistent with the specification, and well-precedented:

Note: See TracTickets for help on using tickets.
Back to Top