Opened 17 years ago
Closed 12 years ago
#5611 closed Cleanup/optimization (fixed)
HTTPRequest should check content-type before parsing post body
Reported by: | Paul Egan | Owned by: | |
---|---|---|---|
Component: | HTTP handling | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | jesse.lovelace@…, slacy@… | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The modpython handler currently tries to parse the request body as application/x-www-form-urlencoded
regardless of the actual content type. This leads to an exception being thrown if the content is not parseable. See attached patch for suggested strict checking of request content-type.
Example
echo -e '\0' | curl -v --data-binary @- http://localhost/
Throws:
Traceback (most recent call last): File "/usr/lib64/python2.5/site-packages/mod_python/importer.py", line 1537, in HandlerDispatch default=default_handler, arg=req, silent=hlist.silent) File "/usr/lib64/python2.5/site-packages/mod_python/importer.py", line 1229, in _process_target result = _execute_target(config, req, object, arg) File "/usr/lib64/python2.5/site-packages/mod_python/importer.py", line 1128, in _execute_target result = object(arg) File "/usr/lib/python2.5/site-packages/django/core/handlers/modpython.py", line 181, in handler return ModPythonHandler()(req) File "/usr/lib/python2.5/site-packages/django/core/handlers/modpython.py", line 154, in __call__ response = self.get_response(request) File "/usr/lib/python2.5/site-packages/django/core/handlers/base.py", line 53, in get_response response = self._real_get_response(request) File "/usr/lib/python2.5/site-packages/django/core/handlers/base.py", line 63, in _real_get_response response = middleware_method(request) File "/home/paul/work/site/trunk/bydesign/middleware/hotshot_profiler.py", line 13, in process_request if settings.DEBUG and request.has_key('prof'): File "/usr/lib/python2.5/site-packages/django/http/__init__.py", line 43, in has_key return key in self.GET or key in self.POST File "/usr/lib/python2.5/site-packages/django/core/handlers/modpython.py", line 73, in _get_post self._load_post_and_files() File "/usr/lib/python2.5/site-packages/django/core/handlers/modpython.py", line 56, in _load_post_and_files self._post, self._files = http.QueryDict(self.raw_post_data, encoding=self._encoding), datastructures.MultiValueDict() File "/usr/lib/python2.5/site-packages/django/http/__init__.py", line 148, in __init__ for key, value in parse_qsl((query_string or ''), True): # keep_blank_values=True TypeError: argument 1 must be string without null bytes, not str
Attachments (6)
Change History (30)
comment:1 by , 17 years ago
comment:2 by , 17 years ago
Hm, is it fixed on a branch? On the trunk (r6844) the QueryDict
constructor still throws an exception. A view which expects non url-encoded data will most likely use request.raw_post_data
but any intervening code (e.g. middleware) that uses request.POST
, request.REQUEST
, request.has_key()
, etc, will trigger this exception.
Just as it makes sense to check if the request content type is multipart
for "file uploads", it seems sensible to check that the type is application/x-www-form-urlencoded
before parsing the data as such. The only reason I can think of against such a check would be if there are user-agents out there which don't correctly set the content type for forms - but I don't believe that's the case.
comment:3 by , 17 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
I can't reproduce on trunk [7168]:
[1] >>> from django.http import QueryDict [2] >>> QueryDict("\0") [2] : <QueryDict: {u'\x00': [u'']}>
As you can see you get a nonsense querydict, but you don't get an exception.
comment:4 by , 17 years ago
Resolution: | invalid |
---|---|
Status: | closed → reopened |
As per ticket title & patch, this is an issue when running django with mod_python. django.http first tries to load mod_python's C implementation of parse_qsl (source:django/trunk/django/http/__init__.py@7168#L13), this will fail outside the mod_python context and so the test above is not valid.
Even if an exception wasn't thrown, I can't think of any good reason for an objection to checking the content type before parsing. Not only is it the right thing to do, it is what other frameworks do. For example:
comment:5 by , 17 years ago
Triage Stage: | Unreviewed → Accepted |
---|
Ah, I'd forgotten that we use Apache's parse_qsl. I wonder how much faster that really is... seems to confuse things a bit.
comment:6 by , 16 years ago
Cc: | added |
---|
I can verify this bug in r8921--wrapping parse_qsl in a try...except is a simple workaround we are using.
comment:7 by , 16 years ago
Wrapping the parser in a try block hides the error, but it is still terribly inefficient to attempt to parse potentially megabytes of data (perhaps POSTing image/jpeg
) when you know beforehand from the content type that it is not application/x-www-form-urlencoded
.
by , 16 years ago
Attachment: | modpython.diff added |
---|
Patch to check content-type before parsing post body - updated to r9747
comment:9 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → Cleanup/optimization |
comment:10 by , 14 years ago
Easy pickings: | unset |
---|---|
Patch needs improvement: | set |
modpython.diff fails to apply cleanly on to trunk
comment:12 by , 13 years ago
Keywords: | modpython removed |
---|---|
Needs tests: | set |
Patch needs improvement: | unset |
Summary: | modpython handler should check content-type before parsing post body → HTTPRequest should check content-type before parsing post body |
No special opinion on the patch, just cleaned patch and description.
comment:13 by , 13 years ago
Cc: | added |
---|
comment:14 by , 12 years ago
Needs tests: | unset |
---|---|
Owner: | removed |
Status: | reopened → new |
comment:15 by , 12 years ago
#19124 was a duplicate with an interesting use case. Let's assume the possible backwards incompatibilities and do the right thing™!
comment:16 by , 12 years ago
So I've given this some review - and it seems absolutely the right thing to do.
It seems fair to make a backwards incompatible change for anyone who is specifically abusing request.POST for non form data in this case.
Draft release note:
HttpRequest.POST will no longer include data posted via HTTP requests with non form-specific content-types in the header. In prior versions, data posted with content-types other than multipart/form-data or application/x-www-form-urlencoded would still end up represented in the request.POST attribute. Developers wishing to access the raw POST data for these cases, should use the request.body attribute instead.
We will also need a version changed note here
https://docs.djangoproject.com/en/dev/ref/request-response/#django.http.HttpRequest.POST
Clearly our current body attribute docs do a better job of explaining the situation, and can be borrowed from for an update to the POST attribute docs.
https://docs.djangoproject.com/en/dev/ref/request-response/#django.http.HttpRequest.body
comment:17 by , 12 years ago
Thanks Preston for the review. I uploaded a new patch with documentation.
comment:18 by , 12 years ago
I'm realizing that currently, we are defaulting no content-type to 'text/plain' for wsgi requests (see django/core/servers/basehttp.py, WSGIRequestHandler.get_environ). So there are two issues here:
- what content-type to default to when the request doesn't have one (see #17474 for example). The spec seems to indicate 'application/octet-stream' and we are doing 'text/plain'
- how do we treat non form data content types. This is the main issue of this ticket.
If we stick to the current patch (without changing the default content-type), I will have to remove the paragraph about default 'application/octet-stream'.
follow-up: 20 comment:19 by , 12 years ago
The default to 'text/plain' you only added recently - so there is no BC commitment there: https://github.com/django/django/commit/91727c76cda3f8bf286ee69652bc3625f150f6e7
I think we should be defaulting to ''
and retain essentially the absence of content-type - and continue to make sure the key exists but without burdening ourselves with a None check.
A default of ''
seems consistent from what I can tell from https://code.djangoproject.com/ticket/17474#comment:2
It will be up to anyone reading the request to follow the spec and assume octet-stream - I don't think we need to materialize that assumption, otherwise its impossible for end user code to distinguish between an implicit and explicit content-type=application/octet-stream.
As far as the reference to octet-stream in the release note - I don't think that is a big deal either way. Currently the patch behavior does *conform* to that expectation - but that doesn't mean much in this case, as we don't process the post data for octet stream. I'd say its more accurate to say that we only process the post data when the content-type indicates form or multipart - otherwise we just leave it in the body attribute - which is what the test on the patch verifies.
comment:20 by , 12 years ago
Replying to ptone:
The default to 'text/plain' you only added recently - so there is no BC commitment there: https://github.com/django/django/commit/91727c76cda3f8bf286ee69652bc3625f150f6e7
Wow, didn't even realized *I* committed this :-/
I think we should be defaulting to
''
and retain essentially the absence of content-type - and continue to make sure the key exists but without burdening ourselves with a None check.
Completely agree (fixed in new uploaded patch).
comment:21 by , 12 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
A couple doc suggestions - but then RFC IMO
https://code.djangoproject.com/attachment/ticket/5611/5611-6.diff#L62
"did also contain" replace with "contained"
Take the note starting with "Now..." and rephrase under the permanent section for attr POST ie:
...request contains form data. See the :class:QueryDict
documentation below. If you need to access raw or non-form data posted in the request, access this through the :attr:HttpRequest.body
attribute instead.
The point being that the version changed note is transient - but the point should be made concretely in the POST docs.
comment:22 by , 12 years ago
Now I understand why we added 'text-plain' in [91727c76cda3f8]. It appears to be the default behaviour of original Python WSGIRequestHandler (which calls self.headers.get_content_type(), which itself conforms to http://tools.ietf.org/html/rfc2045.html#section-5.2).
http://hg.python.org/cpython/file/2d2f206d040e/Lib/wsgiref/simple_server.py#l92
Should we diverge from Python? Should we report a bug? Or maybe this doesn't matter so much, because with the new behaviour, the content will still be left as is in the body attribute.
comment:23 by , 12 years ago
After [681550ca6d], my latest question is solved. We will use Python default behaviour, that is defaulting to 'text/plain'.
comment:24 by , 12 years ago
Owner: | set to |
---|---|
Resolution: | → fixed |
Status: | new → closed |
This ticket it no longer an issue. Accessing request.POST will still try to parse the content using QueryDict, but QueryDict doesn't throw an exception for non-form-encoded data. You might get a bogus POST, but to access the raw content, you should use request.raw_post_data