Opened 7 years ago

Closed 7 years ago

#5898 closed (fixed)

HEAD processing should be compulsory

Reported by: Scott Barr <scott@…> Owned by: nobody
Component: HTTP handling Version: master
Severity: Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

Here is a patch to django.middleware.http.py that (1) add's a "Content-Length" header to the HTTP response if the "Content-Length" header hasn't already been supplied, and (2) removes the content from the response, for HEAD requests where the response code is 200.

This makes the HTTP response useful for HEAD requests, which shouldn't have content in the response. The Content-Length header is desirable in the response headers.

Attachments (5)

middleware.http.patch (938 bytes) - added by Scott Barr <scott@…> 7 years ago.
Patch
middleware.http.2.patch (940 bytes) - added by Scott Barr <scott@…> 7 years ago.
remove_unallowed_content.diff (2.1 KB) - added by arien <regexbot@…> 7 years ago.
RemoveUnallowedResponseContent as a response middleware that would automagically alway be run after all other reponse middleware.
http_requirements.diff (4.2 KB) - added by arien <regexbot@…> 7 years ago.
Try to ensure responses meet basic HTTP requirements.
middleware-docs.diff (637 bytes) - added by arien <regexbot@…> 7 years ago.

Download all attachments as: .zip

Change History (21)

Changed 7 years ago by Scott Barr <scott@…>

Patch

comment:1 Changed 7 years ago by mtredinnick

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Both this ticket and #5897 duplicate what the conditional get middleware already does. There's no real disadvantage to including that middleware, so it might not be worth moving this functionality around, due to the disruption it will cause to existing setups.

Could probably be handled with a documentation improvement instead.

comment:2 Changed 7 years ago by Scott Barr <scott@…>

I did consider the duplicate functionality when submitting this patch, and it felt a bit wrong, so fair comment.

The latest patch addresses the removal of the duplicated functionality, and adds some extra documentation.

Changed 7 years ago by Scott Barr <scott@…>

comment:3 Changed 7 years ago by mtredinnick

The latest patch still adds nothing new, nor does it add anything to the public documentation. The ConditionalGetMiddleware already handles HEAD requests. Have a look at line 32 of http.py.

comment:4 Changed 7 years ago by Scott Barr <scott@…>

  • Resolution set to invalid
  • Status changed from new to closed

Apologies, I didn't see that in there. Marking this ticket as invalid.

comment:5 Changed 7 years ago by arien <regexbot@…>

  • Resolution invalid deleted
  • Status changed from closed to reopened

I think the point is that the conditional get middleware should not do this.

HEAD requests must not return content the same way that Location headers must use absolute URIs, which is what django.core.handlers.base.fix_location_header tries to ensure. When I tried to come up with a better patch for this issue, I found out fix_location_header fails, because fix_location_header fails is called (in the handler's get_response) before the response middleware is run.

The problem here is similar: this code should run after all response middleware, but there's not a really good place for it to hook into. If there were such a place both these issues could be easily resolved.

Since, there is no such hook, I'll attach a patch to show what this would mean for this responses to HEAD requests, written as a response middleware that would always run after all other response middleware.

Changed 7 years ago by arien <regexbot@…>

RemoveUnallowedResponseContent as a response middleware that would automagically alway be run after all other reponse middleware.

Changed 7 years ago by arien <regexbot@…>

Try to ensure responses meet basic HTTP requirements.

comment:6 Changed 7 years ago by arien <regexbot@…>

This new patch changes the handlers so that they run two fix-ups (for absolute URIs in redirects and removing content from responses to HEAD requests or with status codes 1xx, 204, and 304) after any response middleware. These fix-ups will not run if converting the incoming request to the handler's request_class fails; doing this would IMHO needlessly complicate the code.

This change is backwards compatible.

Although I think the idea is sound, I'm not submitting patches for docs or tests because there may be a nicer way to implement this.

comment:7 Changed 7 years ago by mtredinnick

What is the actual problem you are trying to fix, arien? (Let's set aside co-opting an existing ticket for a different purpose). HEAD requests already don't return any content. So there's no problem there. Don't make hand-wavy analogies to absolute URLs, they have nothing to do with HEAD requests. What is an example of something that you think should work that does not work with the current code?

We already set content length to 0 for 304 response codes and Django itself never returns a 204: if somebody does return a response like that, it's up to them to also not return any content, we aren't going to do introspection on a response that is generated by the user.

comment:8 Changed 7 years ago by arien <regexbot@…>

The problem I was going to fix is that HEAD request do return content whenever the conditional get middleware isn't used. Using the conditional get middleware is optional (and not even the default), not returning a message body on HEAD requests is mandatory.

In doing this, I found that there was another piece of "look into the user generated response and make sure we're HTTP compliant in this respect" code in there called fix_location_header. The problem was that fix_location_header was called before response middleware, while this new piece of code obviously needed to go after. So I submitted a half-hearted attempt at a patch (remove_unallowed_content.diff) that deferred any design decision, waffling about automagical middlewares.

Later, I decided that the correct fix would be for both fix_location_headers and this new piece of code to hook into the request processing after the response middleware. (The way fix_location_header was used didn't correspond to its stated reason for existence anyway, and making this one change would solve that as well.) There might even be more fix-ups in the future if/when Django gets more ambitious about this sort of thing. Hence http_requirements.diff.

(The mischaracterizations are getting rather tiresome, mtredinnick.)

comment:9 Changed 7 years ago by Scott Barr <scott@…>

Thats the issue I attempted to address with this ticket. HEAD should not return a content body, and should have a Content-Length header showing the size of the body that would have been returned.

comment:10 follow-up: Changed 7 years ago by mtredinnick

  • Summary changed from HeadMiddleware - Modify response for HEAD requests to HEAD processing should be compulsory
  • Triage Stage changed from Unreviewed to Accepted

Changed title to reflect what is probably the real issue here.

Sorry if you feel I'm mischaracterising what you're saying, arien. That's obviously not my intention, but we do need to establish what the real problem is, now that it's veered away from Scott's original wish to add functionality we already have. It is true that HEAD handling really does have nothing to with with fix_location_header (which is being used as it is intended and documented: it fixed the Location header after everything else is done).

I agree that HEAD handling should probably be compulsory since people aren't in the habit of throwing an error on methods they don't understand. We can certainly fix that. If you have an example of something else that doesn't work that you believe should, again, please post it. It's difficult to ensure that I've understood all the points you are trying to raise. As I understand the code at the moment, and thinking through the possibilities, if you have the ConditionalGetMiddleware installed, there should be no HTTP violations occurring. Do you know of a case where that isn't true? If not, we'll make the HEAD stuff happen by default and that should fix the main problem.

comment:11 Changed 7 years ago by mtredinnick

Aah... now I see what you're trying to do with the patches. These two patches are meant to go together and neither works in the absence of the other. Please, please, please, use one patch per change in the future. Attempts to confuse the people reviewing the patches wors far more frequently than you might intend and there's no benefit (and quite a downside) to multiple patches that are co-dependent.

Now I have enough information to fix this.

comment:12 Changed 7 years ago by arien <regexbot@…>

mtredinnick, the patches are independent.

What I'm aiming for is in the last patch. It makes responses to HEAD requests correct independent of the middlewares in use. It goes beyond this in that it also removes content for responses that can't have any content (because of their status code), (re)setting the Content-Length header to match.

The patch also created one single point for response fix-ups and moves the call to fix_location_header there, because both it fix_location_header and remove_unallowed_response_content make sure Django always plays by some basic HTTP rules in what it emits.

comment:13 in reply to: ↑ 10 Changed 7 years ago by arien <regexbot@…>

Replying to mtredinnick:

The functionality is there, but is should be compulsory; you shouldn't have to turn on ConditionalGetMiddleware for that.

The reason I mention fix_location_header isn't because it has to do with handling of HEAD requests (it doesn't), it's because fix_location_header could also be in some optional middleware, yet it isn't. Why is that? The same reasons could be given for this issue IMHO.

The concrete bugs, if that helps:

  1. HEAD requests can return content unless ConditionalGetMiddleware is installed and is run after all other middleware.
  1. Response middleware can still happily generate invalid Location headers, because fix_location_header runs too early to catch that.

The last patch fixes both issues and also makes sure that responses with status codes 1xx, 204 and 304 will never return content. (Well, when conversion of the "raw" request to the handler's request_class fails you are stuck with a "raw" request, so there I punted.)

comment:14 Changed 7 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from reopened to closed

(In [6662]) Fixed #5898 -- Changed a few response processing paths to make things harder to get wrong and easier to get right. Previous behaviour wasn't buggy, but it was harder to use than necessary.

We now have automatic HEAD processing always (previously required ConditionalGetMiddleware), middleware benefits from the Location header rewrite, so they can use relative URLs as well, and responses with response codes 1xx, 204 or 304 will always have their content removed, in accordance with the HTTP spec (so it's much harder to indavertently deliver invalid responses).

Based on a patch and diagnosis from regexbot@….

Changed 7 years ago by arien <regexbot@…>

comment:15 Changed 7 years ago by arien <regexbot@…>

  • Needs documentation set
  • Resolution fixed deleted
  • Status changed from closed to reopened

Thank you for checking this in and writing the tests, mtredinnick. Could you also patch the middleware docs to remove the comment about ConditionalGetMiddleware's handling of HEAD requests? (I'll reopen so this doesn't get lost.)

comment:16 Changed 7 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from reopened to closed

(In [6678]) Fixed #5898 -- Updated docs for r6662, as pointed out by arien.

Note: See TracTickets for help on using tickets.
Back to Top