Code

Opened 7 years ago

Closed 21 months ago

#5611 closed Cleanup/optimization (fixed)

HTTPRequest should check content-type before parsing post body

Reported by: paulegan Owned by: Claude Paroz <claude@…>
Component: HTTP handling Version: master
Severity: Normal Keywords:
Cc: jesse.lovelace@…, slacy@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The modpython handler currently tries to parse the request body as application/x-www-form-urlencoded regardless of the actual content type. This leads to an exception being thrown if the content is not parseable. See attached patch for suggested strict checking of request content-type.

Example

echo -e '\0' | curl -v --data-binary @- http://localhost/

Throws:

Traceback (most recent call last):

  File "/usr/lib64/python2.5/site-packages/mod_python/importer.py", line 1537, in HandlerDispatch
    default=default_handler, arg=req, silent=hlist.silent)

  File "/usr/lib64/python2.5/site-packages/mod_python/importer.py", line 1229, in _process_target
    result = _execute_target(config, req, object, arg)

  File "/usr/lib64/python2.5/site-packages/mod_python/importer.py", line 1128, in _execute_target
    result = object(arg)

  File "/usr/lib/python2.5/site-packages/django/core/handlers/modpython.py", line 181, in handler
    return ModPythonHandler()(req)

  File "/usr/lib/python2.5/site-packages/django/core/handlers/modpython.py", line 154, in __call__
    response = self.get_response(request)

  File "/usr/lib/python2.5/site-packages/django/core/handlers/base.py", line 53, in get_response
    response = self._real_get_response(request)

  File "/usr/lib/python2.5/site-packages/django/core/handlers/base.py", line 63, in _real_get_response
    response = middleware_method(request)

  File "/home/paul/work/site/trunk/bydesign/middleware/hotshot_profiler.py", line 13, in process_request
    if settings.DEBUG and request.has_key('prof'):

  File "/usr/lib/python2.5/site-packages/django/http/__init__.py", line 43, in has_key
    return key in self.GET or key in self.POST

  File "/usr/lib/python2.5/site-packages/django/core/handlers/modpython.py", line 73, in _get_post
    self._load_post_and_files()

  File "/usr/lib/python2.5/site-packages/django/core/handlers/modpython.py", line 56, in _load_post_and_files
    self._post, self._files = http.QueryDict(self.raw_post_data, encoding=self._encoding), datastructures.MultiValueDict()

  File "/usr/lib/python2.5/site-packages/django/http/__init__.py", line 148, in __init__
    for key, value in parse_qsl((query_string or ''), True): # keep_blank_values=True

TypeError: argument 1 must be string without null bytes, not str

Attachments (6)

parsesql.diff (1.4 KB) - added by Martin Kanerva 6 years ago.
Another workaround
modpython.diff (1.6 KB) - added by paulegan 5 years ago.
Patch to check content-type before parsing post body - updated to r9747
5611.diff (1.4 KB) - added by claudep 2 years ago.
Patch updated to current trunk
5611-4.diff (5.2 KB) - added by claudep 22 months ago.
Patch including test
5611-5.diff (7.5 KB) - added by claudep 21 months ago.
Patch including docs
5611-6.diff (8.4 KB) - added by claudep 21 months ago.
CONTENT_TYPE defaulting to the empty string

Download all attachments as: .zip

Change History (30)

comment:1 Changed 7 years ago by tdterry

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

This ticket it no longer an issue. Accessing request.POST will still try to parse the content using QueryDict, but QueryDict doesn't throw an exception for non-form-encoded data. You might get a bogus POST, but to access the raw content, you should use request.raw_post_data

comment:2 Changed 7 years ago by paulegan

Hm, is it fixed on a branch? On the trunk (r6844) the QueryDict constructor still throws an exception. A view which expects non url-encoded data will most likely use request.raw_post_data but any intervening code (e.g. middleware) that uses request.POST, request.REQUEST, request.has_key(), etc, will trigger this exception.

Just as it makes sense to check if the request content type is multipart for "file uploads", it seems sensible to check that the type is application/x-www-form-urlencoded before parsing the data as such. The only reason I can think of against such a check would be if there are user-agents out there which don't correctly set the content type for forms - but I don't believe that's the case.

comment:3 Changed 6 years ago by jacob

  • Resolution set to invalid
  • Status changed from new to closed

I can't reproduce on trunk [7168]:

[1] >>> from django.http import QueryDict

[2] >>> QueryDict("\0")
[2]   : <QueryDict: {u'\x00': [u'']}>

As you can see you get a nonsense querydict, but you don't get an exception.

comment:4 Changed 6 years ago by paulegan

  • Resolution invalid deleted
  • Status changed from closed to reopened

As per ticket title & patch, this is an issue when running django with mod_python. django.http first tries to load mod_python's C implementation of parse_qsl (source:django/trunk/django/http/__init__.py@7168#L13), this will fail outside the mod_python context and so the test above is not valid.

Even if an exception wasn't thrown, I can't think of any good reason for an objection to checking the content type before parsing. Not only is it the right thing to do, it is what other frameworks do. For example:

comment:5 Changed 6 years ago by jacob

  • Triage Stage changed from Unreviewed to Accepted

Ah, I'd forgotten that we use Apache's parse_qsl. I wonder how much faster that really is... seems to confuse things a bit.

comment:6 Changed 6 years ago by mentat

  • Cc jesse.lovelace@… added

I can verify this bug in r8921--wrapping parse_qsl in a try...except is a simple workaround we are using.

comment:7 Changed 6 years ago by paulegan

Wrapping the parser in a try block hides the error, but it is still terribly inefficient to attempt to parse potentially megabytes of data (perhaps POSTing image/jpeg) when you know beforehand from the content type that it is not application/x-www-form-urlencoded.

Changed 6 years ago by Martin Kanerva

Another workaround

comment:8 Changed 6 years ago by Martin Kanerva

Above workaround should apply cleanly to 1.0.

Changed 5 years ago by paulegan

Patch to check content-type before parsing post body - updated to r9747

comment:9 Changed 3 years ago by gabrielhurley

  • Severity set to Normal
  • Type set to Cleanup/optimization

comment:10 Changed 3 years ago by patchhammer

  • Easy pickings unset
  • Patch needs improvement set

modpython.diff fails to apply cleanly on to trunk

comment:11 Changed 2 years ago by aaugustin

  • UI/UX unset

Change UI/UX from NULL to False.

Changed 2 years ago by claudep

Patch updated to current trunk

comment:12 Changed 2 years ago by claudep

  • Keywords modpython removed
  • Needs tests set
  • Patch needs improvement unset
  • Summary changed from modpython handler should check content-type before parsing post body to HTTPRequest should check content-type before parsing post body

No special opinion on the patch, just cleaned patch and description.

comment:13 Changed 2 years ago by slacy

  • Cc slacy@… added

Changed 22 months ago by claudep

Patch including test

comment:14 Changed 22 months ago by claudep

  • Needs tests unset
  • Owner nobody deleted
  • Status changed from reopened to new

comment:15 Changed 21 months ago by claudep

#19124 was a duplicate with an interesting use case. Let's assume the possible backwards incompatibilities and do the right thing™!

comment:16 Changed 21 months ago by ptone

So I've given this some review - and it seems absolutely the right thing to do.

It seems fair to make a backwards incompatible change for anyone who is specifically abusing request.POST for non form data in this case.

Draft release note:

HttpRequest.POST will no longer include data posted via HTTP requests with non form-specific content-types in the header. In prior versions, data posted with content-types other than multipart/form-data or application/x-www-form-urlencoded would still end up represented in the request.POST attribute. Developers wishing to access the raw POST data for these cases, should use the request.body attribute instead.

We will also need a version changed note here

https://docs.djangoproject.com/en/dev/ref/request-response/#django.http.HttpRequest.POST

Clearly our current body attribute docs do a better job of explaining the situation, and can be borrowed from for an update to the POST attribute docs.

https://docs.djangoproject.com/en/dev/ref/request-response/#django.http.HttpRequest.body

Changed 21 months ago by claudep

Patch including docs

comment:17 Changed 21 months ago by claudep

Thanks Preston for the review. I uploaded a new patch with documentation.

comment:18 Changed 21 months ago by claudep

I'm realizing that currently, we are defaulting no content-type to 'text/plain' for wsgi requests (see django/core/servers/basehttp.py, WSGIRequestHandler.get_environ). So there are two issues here:

  • what content-type to default to when the request doesn't have one (see #17474 for example). The spec seems to indicate 'application/octet-stream' and we are doing 'text/plain'
  • how do we treat non form data content types. This is the main issue of this ticket.

If we stick to the current patch (without changing the default content-type), I will have to remove the paragraph about default 'application/octet-stream'.

comment:19 follow-up: Changed 21 months ago by ptone

The default to 'text/plain' you only added recently - so there is no BC commitment there: https://github.com/django/django/commit/91727c76cda3f8bf286ee69652bc3625f150f6e7

I think we should be defaulting to '' and retain essentially the absence of content-type - and continue to make sure the key exists but without burdening ourselves with a None check.

A default of '' seems consistent from what I can tell from https://code.djangoproject.com/ticket/17474#comment:2

It will be up to anyone reading the request to follow the spec and assume octet-stream - I don't think we need to materialize that assumption, otherwise its impossible for end user code to distinguish between an implicit and explicit content-type=application/octet-stream.

As far as the reference to octet-stream in the release note - I don't think that is a big deal either way. Currently the patch behavior does *conform* to that expectation - but that doesn't mean much in this case, as we don't process the post data for octet stream. I'd say its more accurate to say that we only process the post data when the content-type indicates form or multipart - otherwise we just leave it in the body attribute - which is what the test on the patch verifies.

Changed 21 months ago by claudep

CONTENT_TYPE defaulting to the empty string

comment:20 in reply to: ↑ 19 Changed 21 months ago by claudep

Replying to ptone:

The default to 'text/plain' you only added recently - so there is no BC commitment there: https://github.com/django/django/commit/91727c76cda3f8bf286ee69652bc3625f150f6e7

Wow, didn't even realized *I* committed this :-/

I think we should be defaulting to '' and retain essentially the absence of content-type - and continue to make sure the key exists but without burdening ourselves with a None check.

Completely agree (fixed in new uploaded patch).

comment:21 Changed 21 months ago by ptone

  • Triage Stage changed from Accepted to Ready for checkin

A couple doc suggestions - but then RFC IMO

https://code.djangoproject.com/attachment/ticket/5611/5611-6.diff#L62
"did also contain" replace with "contained"

Take the note starting with "Now..." and rephrase under the permanent section for attr POST ie:

...request contains form data. See the :class:QueryDict documentation below. If you need to access raw or non-form data posted in the request, access this through the :attr:HttpRequest.body attribute instead.

The point being that the version changed note is transient - but the point should be made concretely in the POST docs.

Last edited 21 months ago by ptone (previous) (diff)

comment:22 Changed 21 months ago by claudep

Now I understand why we added 'text-plain' in [91727c76cda3f8]. It appears to be the default behaviour of original Python WSGIRequestHandler (which calls self.headers.get_content_type(), which itself conforms to http://tools.ietf.org/html/rfc2045.html#section-5.2).
http://hg.python.org/cpython/file/2d2f206d040e/Lib/wsgiref/simple_server.py#l92

Should we diverge from Python? Should we report a bug? Or maybe this doesn't matter so much, because with the new behaviour, the content will still be left as is in the body attribute.

comment:23 Changed 21 months ago by claudep

After [681550ca6d], my latest question is solved. We will use Python default behaviour, that is defaulting to 'text-plain'.

Version 0, edited 21 months ago by claudep (next)

comment:24 Changed 21 months ago by Claude Paroz <claude@…>

  • Owner set to Claude Paroz <claude@…>
  • Resolution set to fixed
  • Status changed from new to closed

In dfd4a7175119ddb422d8426dcc15902265d5a428:

Fixed #5611 -- Restricted accepted content types in parsing POST data

Thanks paulegan for the report and Preston Holmes for the review.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.