Opened 20 months ago

Last modified 6 hours ago

#21231 assigned New feature

Limiting the number of variables and files that a POST request can contain

Reported by: epandurski@… Owned by: timo
Component: HTTP handling Version: master
Severity: Normal Keywords:
Cc: andre.cruz@…, apollo13 Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The Problem

When deploying a django application one should be able to asses what
will be *the absolute maximum* of memory consumed by each serving
process/thread. Failing to make this assessment correctly may leave the
application open to denial-of-service attacks. For example, when
deploying with apache, an excessive number of threads should be
available in order protect yourself from slowloris-type attacks. Also,
in order to limit the maximum memory and CPU usage under attack
"LimitRequestBody" should be set to a reasonably small value (1MB for
example). So, here are the naive math: "100 parallel requests" X "1MB
per request"
= "100MB maximum memory consumed"

The problem is that the naive math is wrong. A single 1MB POST request
may require *much* more that 1MB to process. 1MB of
"application/x-www-form-urlencoded" type may contain 200,000 short
name-value pairs, which when fetched into the interpreter consume tens of
megabytes (depending on the platform, probably 10-50MB). So the
correct math would be: "100 parallel requests" X "50MB per request" =
"5000MB maximum memory consumed". Probably too much for many systems.

As far as I know, django does not offer an easy way to protect against
such edge cases (lots of simultaneous malicious POST request that consume
unexpected amounts of memory).

The proposal

Introduce two new settings, limiting the maximum number of entities in
a POST request:

FORM_DATA_URLENCODED_MAX_VARS: the maximum number of name-value
pairs in the case of "application/x-www-form-urlencoded". (no limit by
default)

FORM_DATA_MULTIPART_MAX_PARTS: the maximum number of parts in the
case of "multipart/form-data". (no limit by default)

In case those limits are exceeded, the request must be discarded
without being fetched/parsed into memory. (In our example, it would
consume only 1MB instead of 50.)

Change History (48)

comment:1 Changed 20 months ago by aaugustin

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Resolution set to wontfix
  • Status changed from new to closed

one should be able to asses what will be *the absolute maximum* of memory consumed by each serving process/thread.

I'm sorry, but I fail to see how this could be achieved in Python. Memory use will grow until the GC kicks in, and that isn't under your control.

A single 1MB POST request may require *much* more that 1MB to process

POST data is a rather small part of the memory footprint of a standard Django application. It isn't unusal to allocate 100MB or 200MB of RAM per process in a Django application server.

Your proposal doesn't appear to be predicated on a very good understanding of memory management in Python and memory use in Django applications; I'm not convinced it has enough value to warrant introducing two settings.

Besides, Django doesn't provide a production-grade application server, and doesn't plan to (see the docs for runserver). The common way to avoid excessive memory usage is to have the application server manage it, generally by killing processes that exceed a given threshold.

If you want to discuss possible ways to restrict memory use in Django, I suggest starting a discussion on the django-developers mailing-list. Thank you.

comment:2 Changed 20 months ago by epandurski@…

  • Resolution wontfix deleted
  • Status changed from closed to new

Obviously, I have not been sufficiently clear in explaining the problem.

I agree that as you say "It isn't unusual to allocate 100MB or 200MB of RAM per process in a Django application server", but it is not usual to allocate 50MB of RAM *for processing a single request*. For example, you may have 1 process (200MB) and 100 threads (say additional 100MB combined), and that's OK, but when you get a blast of requests, continuing malicious POST-data, each and every thread will have to create a huge django request-object (100 X 50 = 5000MB combined).

Memory is a complex problem, and it might very well be the case that I did not get it right. The last reply, though, doesn't appear to be predicated on a very thorough understanding of the proposal.

comment:3 Changed 20 months ago by aaugustin

I guess so.

comment:4 Changed 20 months ago by timo

I'm also not convinced this is something that Django needs to provide. Your best bet to get this accepted is to start a mailing list thread as suggested above.

comment:5 Changed 20 months ago by epandurski@…

Well, I do not see how a good a and configurable validation of the HTTP input is not something that Django needs to provide. The proposal also will help with the PYTHONHASHSEED-problem (maliciously crafted POST that results in O(n*n) dict-behaviour).

In any case I will not spend more energy to persuade anyone to the merits of my proposal (Honestly I do not care that much about Django). I will rather patch django when I deploy a critical web-app.

comment:6 Changed 20 months ago by mjtamlyn

If you have a patch which you believe works and solves this problem, then we will gladly consider it for inclusion upstream. That of course assumes that we can demonstrate that the changes work correctly and don't introduce any other significant issues. If these do provide a suitable security benefit against DDOS style attacks as then it may be worth turning this on by default.

comment:7 Changed 20 months ago by epandurski@…

I do not have a patch, and probably will not have soon, if ever. I discovered this problem while I was analysing the possible weaknesses of a web-app of mine that *may in the future* be installed as a business critical web-app (http://sourceforge.net/projects/cmb/).

So I decided to report my findings.

comment:8 Changed 20 months ago by mjtamlyn

  • Resolution set to needsinfo
  • Status changed from new to closed

Ok, I'm going to close this for now, largely as I agree with Aymeric that this will likely be exceedingly difficult (if possible) to implement in python, and I personally don't see it as a major weakness as it can be managed fairly easily by nginx or a similar web server.

If you do at some point write a patch which manages this directly in Django, please feel free to reopen this issue with a concrete patch and we'll see what happens.

comment:9 Changed 20 months ago by epandurski@…

  • Resolution needsinfo deleted
  • Status changed from closed to new

I think I have a nicely working patch for this issue. Here it is:

https://github.com/epandurski/django/commit/9ae456e4280843e87a24e508131dfd9919b20db7

What do you think?

comment:10 Changed 19 months ago by akaariai

I have a feeling this doesn't belong into Django. My hunch is that this will be solved only partially, and there are ways to circumvent the protections Django has. In addition, it is likely that Apache or nginx have already solved this problem for us.

Not closing this as I don't understand HTTP protocol that well...

comment:11 Changed 19 months ago by epandurski@…

It is a hot potato, isn't it?

I think this problem can be solved using an apache module, but I as far as I know, none of the standard modules does the job. (I assume mod_security could be helpful, but I do not know much about mod_security).

I think the problem is much easier to solve in Python/Django than as as a web-server module, exactly because it is not trivial and needs to allow flexible settings. I agree that my patch is quite incomplete, and I can think of other things that should be done as well. But those things are not very hard to do in python (as you can see, the my patch is trivial). Another important factor is that web developer as a whole do not understand those problems, and do not even know that they exists. Unfortunately, this does not make problems to disappear. Django could lead the way and promote good security practices.

Think about this simple example scenario: I have to allow users to upload big files (say 20MB) in my Django app, so I set my "LimitRequestBody" to 20MB. Then suddenly all forms in my Django application should be able to "swallow" 20MB of url-endoded data without eating up my CPU and memory! Also, how many uploaded files a 20MB POST request can contain? 200,000 should be a good guess. So, how much memory and CPU will consume the execution of 200,000 upload-handlers?

I think Django core developers should make a design decision about how to solve those problems, and what the web developers should be advised to do about them.

comment:12 follow-up: Changed 19 months ago by epandurski@…

I did some additional thinking about what settings should be available to the user so that he/she can have sufficient control over the validation of the POST-ed raw data. I believe that exactly 3 parameters are needed:

FORM_DATA_MAX_FILES_COUNT -- the maximum number of file-parts in the case of "multipart/form-data". (no limit by default)

FORM_DATA_MAX_FIELDS_COUNT -- the maximum number of name-value pairs in the case of "application/x-www-form-urlencoded", and the maximum number of field-parts in case of "multipart/form-data". (no limit by default)

FORM_DATA_MAX_FIELDS_TOTAL_LENGTH -- the maximum total length of the posted fields (both for "application/x-www-form-urlencoded" and "multipart/form-data"). This is the sum of the lengths of all field-names and field-values). Reading from the input socket should not continue after this length-limit has been exceeded. (no limit by default)

Implementing those should be relatively easy (not much more complex than my proposed patch).

More suitable names for these settings may be decided.

One more think, that I forgot to mention: I believe Django is the proper place to validate this things, because they are part of the HTTP request body (the payload). HTTP does not restrict the format of the request payload (as far as I know), and generally it depends of the MIME-type stated in the HTTP-header (application/x-www-form-urlencoded, multipart/form-data, etc.) You should not expect Apache to make such a deep-inspection of request's payload.

comment:13 Changed 18 months ago by zerok

I agree that these settings would be nice, but I agree with Aymeric et al. that Django is perhaps not the best place for it. Since we are talking about a memory sizing issue it would be preferable to stop processing such requests as early as possible. Since Django is usually situated at or close to the end of the request processing chain, something before it handling this issue would be preferable. I totally understand that Apache usually is not expected to the deep-request-inspection but at the least the application server should be able to do that. Going down to the application framework or the application itself for something like that should be only a last resort.

As reference from other environments, Tomcat has a maxParameterCount (1). Gunicorn offers something similar but sadly only for headers (2). Regarding the number of file-parts, this is something mod_security already offers (3).

(1) http://tomcat.apache.org/tomcat-7.0-doc/config/http.html
(2) http://docs.gunicorn.org/en/latest/configure.html#limit-request-fields
(3) https://github.com/SpiderLabs/ModSecurity/wiki/Reference-Manual#wiki-SecUploadFileLimit

comment:14 Changed 18 months ago by epandurski@…

"I totally understand that Apache usually is not expected to the deep-request-inspection but at the least the application server should be able to do that."

The application server may serve lots of web applications, each one needing a different validation policy. Only the web application knows what it does with the input data, and therefore only the application (or the application framework) is in a position to correctly validate the input.

I agree that it is good idea to do input validation as early as possible, and be as isolated as possible from the rest. I do not see, however, where else this functionality can be inserted in the Django framework. I do not think it can be implemented as Django middleware. Suggestions?

comment:15 Changed 18 months ago by tomchristie

I agree that this is something we should be considering. Take the following...

import time
from django.http import QueryDict
content = ('a=1&' * 2500000).rstrip('&')
d = QueryDict(content, encoding='utf-8');

That represents parsing a single valid 10MB URLEncoded POST request. On my machine it consumes ~350MB and 100% CPU for ~30secs.
There's no built-in configurable protection in either Apache or Nginx against that type of request without having also installed optional plugins, and I'd consider max_client_body_size = 10m to be a perfectly reasonable nginx setting.

I'll defer to the core dev team on this, but request parsing really does seem to me to be the domain of the application to deal with, rather than the server. For example, suppose you wanted the application to deal with JSON requests that can contain base64 encoded file fields, or some other content type that can contain a mix of file uploads and standard data - the server has no way of parsing and inspecting if the requests are malicious or not. As a further example - would we expect the server to be responsible for parsing and validating against malicious XML payloads and other content types that the application might parse?

FWIW, I believe that PHP includes built-in protection against this type of request, I've not managed to find information about other languages/frameworks.

The simplest option would prob. be similar to the FORM_DATA_MAX_FIELDS_TOTAL_LENGTH already noted, but perhaps tweak the semantics slightly so that it's simply a setting specifying the maximum allowable request size *excluding* any file upload parts REQUEST_MAX_DATA_SIZE or similar. That'd allow the developer to differentiate between the maximum allowable file upload size per-request, and the maximum allowable in-memory data structures per-request.

comment:16 Changed 18 months ago by aaugustin

  • Triage Stage changed from Unreviewed to Accepted

comment:17 Changed 18 months ago by aaugustin

OK — I'm convinced.

comment:18 Changed 16 months ago by edevil

  • Cc andre.cruz@… added

comment:19 in reply to: ↑ 12 Changed 16 months ago by apollo13

  • Cc apollo13 added

Replying to epandurski@…:

FORM_DATA_MAX_FIELDS_TOTAL_LENGTH -- the maximum total length of the posted fields (both for "application/x-www-form-urlencoded" and "multipart/form-data"). This is the sum of the lengths of all field-names and field-values). Reading from the input socket should not continue after this length-limit has been exceeded. (no limit by default)

Isn't that covered by Apache's LimitRequestBody already?

comment:20 follow-up: Changed 16 months ago by epandurski@…

Isn't that covered by Apache's LimitRequestBody already?

I almost have forgotten what were my intentions with the "FORM_DATA_MAX_FIELDS_TOTAL_LENGTH" field :)

I believe they are almost the same, except for 2 things:

1) LimitRequestBody's context is "server config". "FORM_DATA_MAX_FIELDS_TOTAL_LENGTH" is specific to the web-application. There might be many different applications running on one server, doing different CPU/memory intensive stuff.

2) FORM_DATA_MAX_FIELDS_TOTAL_LENGTH is only about "application/x-www-form-urlencoded" and "multipart/form-data" MIME types, which I believe are the only MIME types which content django may decide to load into memory (is that correct?). Notice that somebody may want to allow POST-ing other MIME-types of big data, and yet have his/her django web-application protected from DoS attacks.

comment:21 in reply to: ↑ 20 Changed 16 months ago by apollo13

Replying to epandurski@…:

1) LimitRequestBody's context is "server config". "FORM_DATA_MAX_FIELDS_TOTAL_LENGTH" is specific to the web-application. There might be many different applications running on one server, doing different CPU/memory intensive stuff.

LimitRequestBody can be set per file/location/vhost etc, so that wouldn't count

2) FORM_DATA_MAX_FIELDS_TOTAL_LENGTH is only about "application/x-www-form-urlencoded" and "multipart/form-data" MIME types, which I believe are the only MIME types which content django may decide to load into memory (is that correct?). Notice that somebody may want to allow POST-ing other MIME-types of big data, and yet have his/her django web-application protected from DoS attacks.

I might be wrong here: But if my app allows file uploads and I set this value to be equivalent to 1GB (I have big files ;)) I could still easily DOS the server; in my opinion it would make more sense if that would limit post data excluding file data.

comment:22 Changed 16 months ago by epandurski@…

LimitRequestBody can be set per file/location/vhost etc, so that wouldn't count

http://httpd.apache.org/docs/2.2/mod/core.html

I might be wrong here: But if my app allows file uploads and I set this value
to be equivalent to 1GB (I have big files ;)) I could still easily DOS the server;
in my opinion it would make more sense if that would limit post data excluding
file data.

Ah, now I remember what was the idea. You are right. The idea is that
FORM_DATA_MAX_FIELDS_TOTAL_LENGTH includes only the length of the form-fields
in the "multipart/form-data" POST. (A "multipart/form-data" POST may contain file
uploads and form fields).

comment:23 Changed 16 months ago by edevil

I just like to add that the fact that Django does not deal well with large form fields or multipart parts is an unexpected behaviour. Looking at the file upload features one would think that Django could cope well in use cases where we want to receive large file uploads, and so have our web server configured to allow large request bodies. In the case of "application/x-www-form-urlencoded" requests, large bodies should not be expected and normally we can filter them at the web server. But "multipart/form-data" requests are processed by the application, since we want to allow large FILE parts but not large FIELD parts, and so it is there that safety measures need to be.

I know of several other sites that use Django, process large file uploads, which are susceptible to this issue. I've contacted the developers and they were indeed just as surprised as I was by this "feature" which I consider a security problem.

Last edited 16 months ago by edevil (previous) (diff)

comment:24 Changed 16 months ago by apollo13

@epandurski: Can you change your patch to include the three settings you proposed and open a PR?

@edevil: I am having a hard time to figure out what you are trying to achieve with your comment. Since this was publicly reported we can't hide it, no matter if we consider it a security problem or not. The ticket is accepted, so the next step would be to provide a working patch and put it up for discussion.

comment:25 Changed 16 months ago by edevil

@apollo13: I'm trying to get this to be considered a security problem so that it is backported to Django 1.5.

comment:26 Changed 16 months ago by apollo13

@evdevil: Even if we consider it security sensitive, if 1.7 comes out before inclusion of this patch, it won't get fixed in 1.5. Since 1.5 is EOL soon I'd strongly encourage you to upgrade or stick with LTS releases.

comment:27 Changed 16 months ago by epandurski@…

@apollo13: I may try to do this, but I am quite busy lately. Most probably, I will not have enough spare time to work on this soon. I expect the work to take 0.5-2 days, but I remember that the HTTP-parsing code is quite hairy, so it might get longer to do it right.

comment:28 Changed 15 months ago by tomchristie

First pass pull request: https://github.com/django/django/pull/2403

Having a single setting limiting the maximum memory for data in a request would address this sufficiently from my POV.

comment:29 Changed 15 months ago by edevil

I have a few questions regarding the pull request, which I left in the Github page. Should I post them here?

Last edited 15 months ago by edevil (previous) (diff)

comment:30 Changed 15 months ago by tomchristie

@edevil - No, may as well keep those comments just against the PR. (They're not really *design* discussion stuff, just implementation details, so...)

comment:31 Changed 13 months ago by anonymous

Any news regarding this?

comment:32 Changed 12 months ago by anonymous

This is a serious problem, and there seems to be a patch available. Can someone from the dev team look at it please?

comment:33 Changed 12 months ago by erikr

  • Needs tests set
  • Patch needs improvement set

I haven't looked very closely, but there are still some unaddressed comments on the PR, and tests are missing.

comment:34 Changed 11 months ago by timo

  • Owner changed from nobody to timo
  • Status changed from new to assigned

comment:35 Changed 10 months ago by timo

  • Has patch set
  • Needs tests unset
  • Patch needs improvement unset

Updated PR with tests.

comment:36 Changed 10 months ago by apollo13

  • Patch needs improvement set

See comments on the PR.

comment:37 Changed 5 months ago by edevil

  • Has patch unset
  • Patch needs improvement unset

I submitted a new PR that addresses some of the comments.

https://github.com/django/django/pull/3852

comment:38 Changed 5 months ago by edevil

  • Has patch set

comment:39 Changed 4 months ago by edevil

I've answered all the comments on the PR. Is there any additional changes you would like me to make?

Can this be merged?

comment:40 Changed 4 months ago by timgraham

We just had the feature freeze for 1.8 and are working on stabilizing master so we can issue an alpha release, so this likely won't be merged until 1.9 at the earliest.

comment:41 Changed 4 months ago by edevil

I thought the "feature freeze" did not apply to security fixes?

comment:42 Changed 4 months ago by timgraham

I think this falls under "security hardening features." If you believe this is a security fix that should be backported to all support versions of Django, please use security@… rather than this ticket tracker, as described in our security policies. Thanks.

comment:43 Changed 4 months ago by edevil

I do believe this is a security problem and I have used security@ to bring attention to this issue. In February and March of 2014 I exchanged emails with Russell Keith-Magee and Florian Apolloner who both asked me not to discuss this publicly. Russell even said that the core team was considering this as a security issue and that they were discussing how to solve it (I'm not on the security@ list).

@tomchristie above already hints that there is a problem with the current implementation so I'm not revealing anything new.

comment:44 Changed 4 months ago by timgraham

I'll see if I have some time next week to review this issue.

comment:45 Changed 2 months ago by timgraham

  • Patch needs improvement set

Patch needs a rebase.

comment:46 Changed 2 weeks ago by edevil

  • Patch needs improvement unset

Rebased.

comment:47 Changed 9 days ago by edevil

Anything missing?

comment:48 Changed 6 hours ago by edevil

Any comments?

Something I should improve?

Note: See TracTickets for help on using tickets.
Back to Top