Opened 8 hours ago

#36700 new Bug

ASGIHandler creates reference cycles that require a gc pass to free

Reported by: Patryk Zawadzki
Owned by:
Component: HTTP handling
Version: 5.2
Severity: Normal
Keywords: memory asgihandler gc
Cc: Patryk Zawadzki
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Disclaimer: pure Python code can't truly leak memory (in the sense that valgrind would detect). However, it's quite easy to create structures that effectively occupy memory for a long time, because they require the deepest (generation 2) garbage collection pass to be reclaimed, and that pass runs very rarely. Moreover, the more such structures accumulate, the more expensive that pass becomes: it effectively stops the entire interpreter while it runs, and it can take seconds. On top of that, a container can easily run out of memory before the collection happens. We (Saleor Commerce) see containers being terminated by the kernel OOM killer due to high memory pressure, where most of that memory is held by garbage.
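To illustrate the disclaimer, here is a minimal CPython sketch (the Node class and the lifetime helper are hypothetical, not from Django). Automatic collection is disabled to stand in for the long wait before a generation 2 pass: an acyclic pair of objects is freed by reference counting as soon as the last reference disappears, while a two-object cycle stays alive until an explicit gc.collect(2).

```python
import gc
import weakref


class Node:
    """Hypothetical object that may point at another Node."""

    def __init__(self):
        self.other = None


def lifetime(make_cycle):
    """Return (freed_by_refcounting, freed_after_full_gc) for a pair of Nodes."""
    gc.collect()
    was_enabled = gc.isenabled()
    gc.disable()  # stand-in for the long wait before a generation 2 pass
    try:
        a, b = Node(), Node()
        a.other = b
        if make_cycle:
            b.other = a  # a -> b -> a: refcounts can never drop to zero
        ref = weakref.ref(a)
        del a, b  # drop the only external references
        freed_by_refcounting = ref() is None
        gc.collect(2)  # explicit full collection
        return freed_by_refcounting, ref() is None
    finally:
        if was_enabled:
            gc.enable()


print(lifetime(make_cycle=False))  # (True, True)
print(lifetime(make_cycle=True))   # (False, True)
```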

One such case is found in ASGIHandler. When handling a request, ASGIHandler.handle spawns two asyncio tasks: one for the actual app code (process_request) and one for the disconnect listener (ASGIHandler.listen_for_disconnect). The latter raises RequestAborted whenever it receives the http.disconnect ASGI message.
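A simplified sketch of the two-task pattern described above, assuming CPython's asyncio. It is not Django's actual code: RequestAborted is a local stand-in for django.core.exceptions.RequestAborted, and process_request, the fake receive/send callables and the outcomes list are hypothetical, added only to make the flow observable.

```python
import asyncio


class RequestAborted(Exception):
    """Stand-in for django.core.exceptions.RequestAborted."""


async def listen_for_disconnect(receive):
    # Waits for the next ASGI message; http.disconnect aborts the request.
    message = await receive()
    if message["type"] == "http.disconnect":
        raise RequestAborted()


async def process_request(send):
    # Hypothetical app code: deliver a complete response.
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def handle(receive, send):
    tasks = [
        asyncio.create_task(listen_for_disconnect(receive)),
        asyncio.create_task(process_request(send)),
    ]
    await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    outcomes = []
    for task in tasks:
        if task.done():
            try:
                task.result()
                outcomes.append("ok")
            except RequestAborted:
                outcomes.append("aborted")  # client disconnects are ignored
        else:
            task.cancel()  # stop whichever task is still running
            outcomes.append("cancelled")
    return outcomes


# Fake ASGI server that always reports a disconnect.
sent = []


async def receive():
    return {"type": "http.disconnect"}


async def send(message):
    sent.append(message)


print(asyncio.run(handle(receive, send)))  # ['aborted', 'ok']
```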

In our setup (uvicorn), the http.disconnect message is received for every request, even after the view code has run and the response has been delivered. That isn't critical for this issue; it just makes it easy for us to reproduce.

Here's where the problem is:

  1. When RequestAborted is raised, its traceback includes the ASGIHandler.handle frame, which is where ASGIHandler.listen_for_disconnect was called.
  2. In turn, the ASGIHandler.handle frame holds references to all of its local variables.
  3. Among those variables is tasks, which holds references to both async tasks.
  4. One of those tasks is the one created from ASGIHandler.listen_for_disconnect.
  5. That task's future is already resolved and holds a reference back to the RequestAborted exception from step 1. This completes the cycle, and reference counting alone can never free it.
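The five steps can be checked directly. In this minimal sketch (hypothetical names, a single task standing in for listen_for_disconnect, not Django's code), closes_cycle starts from the caught exception, walks its traceback to the handle frame, reads the tasks local from that frame, and confirms that one of those tasks stores the very same exception object.

```python
import asyncio


class RequestAborted(Exception):
    """Stand-in for django.core.exceptions.RequestAborted."""


async def listen_for_disconnect():
    # Simulates receiving http.disconnect.
    raise RequestAborted()


async def handle():
    tasks = [asyncio.create_task(listen_for_disconnect())]
    await asyncio.wait(tasks)
    caught = None
    for task in tasks:
        try:
            task.result()  # re-raising here adds this frame to the traceback
        except RequestAborted as exc:
            caught = exc
    return caught


def closes_cycle(exc):
    """Walk exception -> traceback -> handle frame -> tasks -> task -> exception."""
    tb = exc.__traceback__  # step 1
    while tb is not None:
        frame = tb.tb_frame
        if frame.f_code.co_name == "handle":
            tasks = frame.f_locals["tasks"]  # steps 2 and 3
            # Steps 4 and 5: the finished task still stores the exception.
            return any(t.exception() is exc for t in tasks)
        tb = tb.tb_next
    return False


exc = asyncio.run(handle())
print(closes_cycle(exc))  # True
```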

All of those objects hold references to other objects and stack frames, which can't be freed either, so a sizeable set of objects is held hostage until the next generation 2 collection (gc.collect(2)) runs. That can take minutes, depending on how much code your app executes.

Having ASGIHandler.handle explicitly call tasks.clear() (or simply del tasks) once the tasks are no longer needed breaks the cycle, because it removes the link between the locals of the frame in the exception's traceback and the future that references the exception.
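A sketch of the proposed fix under the same simplified assumptions (local stand-in classes, hypothetical helper names, CPython reference counting). With automatic gc disabled, the RequestAborted instance is expected to outlive handle() when tasks is left alone, and to be freed by reference counting once tasks.clear() runs. In this sketch the loop variable task stays bound to the last list entry, which is the normally completing process_request task, so it does not keep the disconnect task alive.

```python
import asyncio
import gc
import weakref


class RequestAborted(Exception):
    """Stand-in for django.core.exceptions.RequestAborted."""


async def listen_for_disconnect():
    raise RequestAborted()  # simulates receiving http.disconnect


async def process_request():
    return "response"  # hypothetical app code that completes normally


async def handle(clear_tasks):
    tasks = [
        asyncio.create_task(listen_for_disconnect()),
        asyncio.create_task(process_request()),
    ]
    await asyncio.wait(tasks)
    exc_ref = None
    for task in tasks:
        try:
            task.result()
        except RequestAborted as exc:
            exc_ref = weakref.ref(exc)
    # `task` is still bound to the last entry (the process_request task).
    if clear_tasks:
        tasks.clear()  # the proposed fix: drop frame locals -> task -> exception
    return exc_ref


def freed_by_refcounting(clear_tasks):
    """Return True if RequestAborted is freed without the cyclic GC."""
    gc.collect()
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collections from masking the cycle
    try:
        exc_ref = asyncio.run(handle(clear_tasks))
        return exc_ref() is None
    finally:
        if was_enabled:
            gc.enable()


print(freed_by_refcounting(clear_tasks=False))
print(freed_by_refcounting(clear_tasks=True))
```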

PS: I've classified this as a bug because high memory use can lead to OOM kills and crashes, but feel free to reclassify it as "cleanup/optimization" if that's more fitting.

Attachments (1)

asgi-ref-cycle.svg (98.5 KB) - added by Patryk Zawadzki 8 hours ago.


Change History (1)

by Patryk Zawadzki, 8 hours ago

Attachment: asgi-ref-cycle.svg added