Opened 11 months ago

Last modified 4 weeks ago

#35289 assigned Bug

Chunked transfer encoding is not handled correctly by MultiPartParser

Reported by: Kyle Agronick Owned by: YashRaj1506
Component: HTTP handling Version: dev
Severity: Normal Keywords: transfer-encoding, chunked, multi-part
Cc: bcail, Carlton Gibson Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I am trying to post large files from C#. The C# implementation conforms to the HTTP1.1 protocol and does not allow chunked transfer encoding and a content-length to be set at the same time.

Messages MUST NOT include both a Content-Length header field and a non-identity transfer-coding. If the message does include a non-identity transfer-coding, the Content-Length MUST be ignored." (RFC 2616, Section 4.4)

In the MultiPartParser a non-existent content-length header is treated as 0:
https://github.com/django/django/blob/main/django/http/multipartparser.py#L96

When the 0 length is read, nothing gets parsed:
https://github.com/django/django/blob/main/django/http/multipartparser.py#L147

If you remove the check in the second link, everything works correctly. It works fine without an explicit content-length.

With Nginx to turn off request buffering for a single request you need to use chunked transfer encoding.

Change History (13)

comment:1 by Kyle Agronick, 11 months ago

I can confirm that just by making a middleware that does

        if request.headers["Transfer-Encoding"] == "chunked" and "CONTENT_LENGTH" not in request.META:
            request.META["CONTENT_LENGTH"] = "1"

everything gets parsed correctly even though the CONTENT_LENGTH is much more than 1.

Maybe this issue went undetected for so long because WSGI does not have good support for chunked transfer encoding. But ASGI does.

comment:2 by Kyle Agronick, 11 months ago

It looks like the only place content_length is used is in MemoryFileUploadHandler where self.activated = content_length <= settings.FILE_UPLOAD_MAX_MEMORY_SIZE. Content-length can be spoofed causing the MemoryFileUploadHandler to read everything into memory. It seems like the complexity of TemporaryFileUploadHandler and MemoryFileUploadHandler could be reduced by having one file handler that uses SpooledTemporaryFile(max_size=settings.FILE_UPLOAD_MAX_MEMORY_SIZE).

comment:3 by bcail, 10 months ago

Cc: bcail added

This Daphne issue may be relevant. There's a quote in that issue (from the ASGI spec) that the ASGI server is supposed to dechunk the request - so shouldn't it come to Django as a regular (not chunked) request? Hopefully someone else will weigh in here as well.

What ASGI server are you using?

in reply to:  3 comment:4 by Kyle Agronick, 10 months ago

Replying to bcail:

This Daphne issue may be relevant. There's a quote in that issue (from the ASGI spec) that the ASGI server is supposed to dechunk the request - so shouldn't it come to Django as a regular (not chunked) request? Hopefully someone else will weigh in here as well.

What ASGI server are you using?

You can see in the ASGI spec mentioned in that ticket it says "a response that is given to the server with no Content-Length may be chunked as the server sees fit". Which is what is happening. Everything is streaming into Django correctly. It just doesn't know how to handle a non-existent content length.

That seems to be the same issue and I would argue this is a Django issue as ASGI servers are streaming the results correctly. If the ASGI servers were expected to buffer the files you would introduce blocking IO with tempfiles or OOM issues if it all happened in memory. I came across this issue while trying to implement large file uploads >6GB. The less buffering the better.

I am using Daphene in development and Uvicorn in production. My hacky middleware fix with the content-length seems to work on both of them.

comment:5 by Natalia Bidart, 10 months ago

Cc: Carlton Gibson added

Hello Kyle!

RFC 2616 is obsoleted by RFC 7230 which in turn is obsoleted by RFC 9110 and RFC 9112, and as far as I read in RFC 9112, both headers could co-exist (but Content-Length should be ignored if Transfer-Encoding is sent):

Early implementations of Transfer-Encoding would occasionally send both a chunked transfer coding for message framing and an estimated Content-Length header field for use by progress bars. This is why Transfer-Encoding is defined as overriding Content-Length, as opposed to them being mutually incompatible.

But, at the same time, I found an analogous issue for Daphne, and I see that you have just commented on the other referenced Daphne issue, so this may be a valid issue. I will cc/Carlton to confirm before fully triaging.

comment:6 by bcail, 10 months ago

You can see in the ASGI spec mentioned in that ticket it says "a response that is given to the server with no Content-Length may be chunked as the server sees fit". Which is what is happening. Everything is streaming into Django correctly. It just doesn't know how to handle a non-existent content length.

I actually understand that to be talking about responses coming *from* Django (or another ASGI app) back to the ASGI server, not a request going into Django.

comment:7 by Carlton Gibson, 10 months ago

This comment on the related request-centred Daphne issue suggests that that request body is correctly dechunked before being passed off to Django. The question we had there was "why it's not getting out of Django's HttpRequest._load_post_and_files()" Kyle's suggestion looks on point: the missing content length isn't an error (IIUC).

I'd guess here we'd would set expected content length to len(request.body) if the Transfer-Encoding header field was set 🤔

comment:8 by Natalia Bidart, 10 months ago

Triage Stage: UnreviewedAccepted
Version: 5.0dev

Following Carlton's comment and the various links/issues, I think this is ticket should be accepted. Kyle, would you like to prepare a patch?

comment:9 by bcail, 10 months ago

request body is correctly dechunked before being passed off to Django

If a request body is dechunked, should the Transfer-Encoding header still be set to "chunked" and the Content-Length header be missing? Would it make sense to change those if the request body is dechunked?

comment:10 by Kyle Agronick, 10 months ago

I can make a patch. There are a few ways to fix this. I think the best way should be a new upload file handler that is the default which uses SpooledTemporaryFile so it can cleanly fall back if there is too much data while being fast for small requests. I think we also need to check DATA_UPLOAD_MAX_MEMORY_SIZE as we are reading the stream into the temp file and throw a SuspiciousOperation if we read too much. Looking at the docs it seems FILE_UPLOAD_MAX_MEMORY_SIZE and DATA_UPLOAD_MAX_MEMORY_SIZE are named similarly but one controls if we write to disk and one controls if we abort the request.

That is a bit ambitious considering I've never contributed to Django and would probably require deprecating the current handlers. If we don't want to do the ambitious approach, the check in MemoryFileUploadHandler that does self.activated = content_length <= settings.FILE_UPLOAD_MAX_MEMORY_SIZE could be self.activated = content_length and content_length <= settings.FILE_UPLOAD_MAX_MEMORY_SIZE and cause all requests that don't have a content-length to fall back to TemporaryFileUploadHandler in the default configuration. That would mean MemoryFileUploadHandler won't work with chunked transfer-encoding without a content-length. Perhaps add a note in the docs since reading an unspecified content-length into memory is dangerous anyways.

Thoughts?

comment:11 by bcail, 10 months ago

The ASGI handler might be relevant.

in reply to:  11 comment:12 by Kyle Agronick, 10 months ago

Replying to bcail:

The ASGI handler might be relevant.

That is interesting. So it seems like it could be optimized even further. In my use case with the 6GB upload it would create two 6GB files on disk. _load_post_and_files seems like it should be hooked into that directly with the MultiPartParser instead of buffering it all to disk and then reading it out but that seems even more ambitious. Another problem I see there is that once it spools to disk it is going to block the event loop because it is doing blocking IO. It should be doing something like asgiref's sync_to_async.

comment:13 by YashRaj1506, 4 weeks ago

Owner: changed from nobody to YashRaj1506
Status: newassigned
Note: See TracTickets for help on using tickets.
Back to Top