Opened 6 months ago

Closed 6 months ago

Last modified 6 months ago

#35415 closed Bug (invalid)

Adding content_type to StreamingHttpResponse on Linux causes memory error after streaming around 1GB-2GB of data.

Reported by: LouisB12345 Owned by: nobody
Component: HTTP handling Version: 5.0
Severity: Normal Keywords:
Cc: LouisB12345 Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This bug took a few days to work out and was extremely annoying.
I'm running Django under ASGI and was trying to stream an on-the-fly zip file using StreamingHttpResponse. Note: I don't know if this occurs under WSGI.
I'm developing on a Windows operating system, and after I deemed the code to be functional I tried it on the Linux VM I have set up.
I noticed that the download would fail almost every time. The cause was that memory usage kept increasing after some time, usually after around 1-2GB had been streamed. After eliminating multiple factors, I came to the conclusion that this bug occurs when I add content_type= within the StreamingHttpResponse.

You can replicate the bug on Linux with the code below. If you remove the content_type it works as expected, but with it the bug occurs.

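import asyncio  # used by asyncio.sleep() in file_data
import logging
import threading  # used by threading.active_count() in file_data
from os.path import basename

import aiofiles
from django.contrib.auth.mixins import LoginRequiredMixin
from django.http import StreamingHttpResponse
from django.views import View
from guppy import hpy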

H = hpy()

LOGGER = logging.getLogger(__name__)


class DownloadSelectedFiles(LoginRequiredMixin, View):
    def get(self, request) -> StreamingHttpResponse:
        file_name = "f.txt"
        response = StreamingHttpResponse(file_data(file_name), content_type="application/octet-stream")
        response["Content-Disposition"] = f'attachment; filename="{basename(file_name)}"'
        return response


async def file_data(file_path):
    async with aiofiles.open(file_path, "rb") as f:
        LOGGER.info(f"Current threads are {threading.active_count()} opening file {file_path}\n{H.heap()}")
        teller = 0
        while chunk := await f.read(65536):
            teller += 1
            await asyncio.sleep(0)
            if teller % 1000 == 0:
                LOGGER.info(f"Current threads are {threading.active_count()} yielding chunk nr.{teller}\n{H.heap()}")
            yield chunk

I have some images of the log output to show the difference.

Change History (4)

comment:1 by LouisB12345, 6 months ago

I have some images that I want to attach, but for some reason I can't upload them because, according to SpamBayes, there is an 80+% chance that they are spam.
I forgot to mention that you can mitigate the memory usage by using asyncio.sleep(0.01). This, however, results in an extremely slow download.

Last edited 6 months ago by LouisB12345

comment:2 by Sarah Boyce, 6 months ago

Resolution: → needsinfo
Status: new → closed

Hi LouisB12345, this looks a little unusual to me as you have a sync view calling an async function.
Maybe because of the context switching between sync and async it's waiting for the data to accumulate before sending? What server are you running here?

I recommend you post on the forum to verify that StreamingHttpResponse is being used as expected and that Django is at fault here.

comment:3 in reply to: 2 by LouisB12345, 6 months ago

Replying to Sarah Boyce:

Hello Sarah,

I know for sure that the data is not accumulating before sending, because the download starts immediately. If I were not to call an async function, you would notice the delay and see that it loads the entire file into memory. Also, this would not explain why the memory error does not happen when I leave out the content_type.

The server I am running is a Proxmox VM running Debian 12 with 4 cores and 4GB RAM, on an Intel Core i5-6500T.

comment:4 by Natalia Bidart, 6 months ago

Resolution: needsinfo → invalid

Hello LouisB12345! Thank you for your report. As Sarah mentioned, the best course of action at this point is to reach out to the community in the Django Forum (async category) to get help debugging your view, since we are not able to reproduce. See below for the full details of the reproducer that I set up locally, streaming a 3.3G iso image, without any memory usage increase or memory error.

Since the goal of this issue tracker is to track issues about Django itself, and your issue seems, at first, to be located in your custom code, I'll be closing this ticket as invalid following the ticket triaging process. If, after debugging, you find out that this is indeed a bug in Django, please re-open with the specific details and please be sure to include a small but complete Django project to reproduce or a failing test case.
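
As a starting point for that kind of debugging, here is a minimal sketch (not part of the reproducer below, and only an assumption about what is useful to check) that consumes a chunked async generator outside Django and records the peak traced memory with tracemalloc. The names file_chunks, measure_peak and run_demo are hypothetical, and asyncio.to_thread stands in for aiofiles so that the sketch only needs the standard library. If the peak stays far below the file size, the generator itself is probably not accumulating data, and the server or client side is the next thing to look at.

```python
import asyncio
import os
import tempfile
import tracemalloc


async def file_chunks(path, chunk_size=65536):
    # Stand-in for the ticket's aiofiles-based generator: blocking reads are
    # pushed to a worker thread so only the standard library is required.
    with open(path, "rb") as f:
        while chunk := await asyncio.to_thread(f.read, chunk_size):
            yield chunk


async def measure_peak(path):
    # Consume the generator and discard each chunk, then report the total
    # number of bytes seen and the peak memory traced by tracemalloc.
    tracemalloc.start()
    total = 0
    async for chunk in file_chunks(path):
        total += len(chunk)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return total, peak


def run_demo(size=8 * 1024 * 1024):
    # Hypothetical 8 MiB temporary file; a real check would point at the
    # file that is actually being downloaded.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(size))
    try:
        return asyncio.run(measure_peak(tmp.name))
    finally:
        os.remove(tmp.name)


if __name__ == "__main__":
    total, peak = run_demo()
    print(f"streamed {total} bytes, peak traced memory {peak} bytes")
```

Note that tracemalloc only sees allocations made by Python, so this says nothing about buffers held by the ASGI server or the operating system.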

The reproducer I used looks as follows:

  • A local Django project (projectfromrepo) with an app for this ticket (ticket_35415)
  • uvicorn installed and serving Django with python -Wall -m uvicorn projectfromrepo.asgi:application --reload
  • A views.py with this (slightly simplified) content:
    import aiofiles
    import logging
    import os
    import threading
    
    from django.http import StreamingHttpResponse
    
    
    logger = logging.getLogger(__name__)
    
    
    def debug(msg):
        logger.info(msg)
        print(msg)
    
    
    def file_download(request):
        file_name = "/home/nessita/debian-live-12.2.0-amd64-kde.iso"
        assert os.path.exists(file_name)
        debug(f"Requested {file_name} which stats {os.stat(file_name)=}.")
        response = StreamingHttpResponse(
            file_data(file_name), content_type="application/octet-stream"
        )
        response["Content-Disposition"] = f'attachment; filename="{file_name}"'
        return response
    
    
    async def file_data(file_path, chunk_size=65536):
        debug(f"Current threads are {threading.active_count()} opening file {file_path}.")
        async with aiofiles.open(file_path, mode="rb") as f:
            teller = 0
            while chunk := await f.read(chunk_size):
                teller += 1
                if teller % 1000 == 0:
                    debug(
                        f"Current threads are {threading.active_count()} yielding chunk nr.{teller}."
                    )
                yield chunk
    
  • Included the following in the main urls.py: path("streaming/", ticket_35415.views.file_download)
  • Visiting http://localhost:8000/streaming/ works without any issues and the downloaded file matches the hash of the source file. What's printed in the terminal:
    (djangodev) [nessita@socrates projectfromrepo]$ python -Wall -m uvicorn projectfromrepo.asgi:application --reload
    INFO:     Will watch for changes in these directories: ['/home/nessita/fellowship/projectfromrepo']
    INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
    INFO:     Started reloader process [435237] using StatReload
    Requested /home/nessita/debian-live-12.2.0-amd64-kde.iso which stats os.stat(file_name)=os.stat_result(st_mode=33188, st_ino=8093957, st_dev=66306, st_nlink=1, st_uid=1001, st_gid=1001, st_size=3492741120, st_atime=1698110826, st_mtime=1698112190, st_ctime=1698112191).
    Current threads are 2 opening file /home/nessita/debian-live-12.2.0-amd64-kde.iso.
    Current threads are 4 yielding chunk nr.1000.
    Current threads are 4 yielding chunk nr.2000.
    Current threads are 4 yielding chunk nr.3000.
    Current threads are 4 yielding chunk nr.4000.
    Current threads are 4 yielding chunk nr.5000.
    Current threads are 4 yielding chunk nr.6000.
    Current threads are 4 yielding chunk nr.7000.
    Current threads are 4 yielding chunk nr.8000.
    Current threads are 4 yielding chunk nr.9000.
    Current threads are 4 yielding chunk nr.10000.
    Current threads are 4 yielding chunk nr.11000.
    Current threads are 4 yielding chunk nr.12000.
    Current threads are 4 yielding chunk nr.13000.
    Current threads are 4 yielding chunk nr.14000.
    Current threads are 4 yielding chunk nr.15000.
    Current threads are 4 yielding chunk nr.16000.
    Current threads are 4 yielding chunk nr.17000.
    Current threads are 4 yielding chunk nr.18000.
    Current threads are 4 yielding chunk nr.19000.
    Current threads are 4 yielding chunk nr.20000.
    Current threads are 4 yielding chunk nr.21000.
    Current threads are 4 yielding chunk nr.22000.
    Current threads are 4 yielding chunk nr.23000.
    Current threads are 4 yielding chunk nr.24000.
    Current threads are 4 yielding chunk nr.25000.
    Current threads are 4 yielding chunk nr.26000.
    Current threads are 4 yielding chunk nr.27000.
    Current threads are 4 yielding chunk nr.28000.
    Current threads are 4 yielding chunk nr.29000.
    Current threads are 4 yielding chunk nr.30000.
    Current threads are 4 yielding chunk nr.31000.
    Current threads are 4 yielding chunk nr.32000.
    Current threads are 4 yielding chunk nr.33000.
    Current threads are 4 yielding chunk nr.34000.
    Current threads are 4 yielding chunk nr.35000.
    Current threads are 4 yielding chunk nr.36000.
    Current threads are 4 yielding chunk nr.37000.
    Current threads are 4 yielding chunk nr.38000.
    Current threads are 4 yielding chunk nr.39000.
    Current threads are 4 yielding chunk nr.40000.
    Current threads are 4 yielding chunk nr.41000.
    Current threads are 4 yielding chunk nr.42000.
    Current threads are 4 yielding chunk nr.43000.
    Current threads are 4 yielding chunk nr.44000.
    Current threads are 4 yielding chunk nr.45000.
    Current threads are 4 yielding chunk nr.46000.
    Current threads are 4 yielding chunk nr.47000.
    Current threads are 4 yielding chunk nr.48000.
    Current threads are 4 yielding chunk nr.49000.
    Current threads are 4 yielding chunk nr.50000.
    Current threads are 4 yielding chunk nr.51000.
    Current threads are 4 yielding chunk nr.52000.
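
The hash comparison mentioned above can be done in many ways (for example with a command-line checksum tool); which one was used is not recorded in this ticket. A minimal Python sketch, with hypothetical helper names sha256_of and same_content, reading in the same 65536-byte chunks as the view:

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    # Hash the file incrementally so that a multi-gigabyte download never
    # has to fit in memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(source_path, downloaded_path):
    # True when both files have identical SHA-256 digests.
    return sha256_of(source_path) == sha256_of(downloaded_path)
```

Calling same_content() with the path of the source file and the path of the downloaded copy returns True when the streamed file arrived intact.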
    