Opened 17 months ago
Last modified 29 hours ago
#34865 assigned Bug
DatabaseWrapper is not garbage collected and connections are not closed
| Reported by: | Fábio Domingues | Owned by: | Priyank Panchal |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 4.2 |
| Severity: | Normal | Keywords: | |
| Cc: | Carlton Gibson, Andrew Godwin, David Wobrock, Florian Apolloner | Triage Stage: | Accepted |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | yes | Patch needs improvement: | yes |
| Easy pickings: | no | UI/UX: | no |
Description
I have found a possible memory leak while investigating why database connections were not closed after the thread/coroutine ended (https://code.djangoproject.com/ticket/33497 and https://github.com/django/django/pull/17275): the `DatabaseWrapper` ends up in a circular reference with `DatabaseFeatures`/`DatabaseCreation`/`DatabaseClient`/... and the GC could not clean up the objects.
I tried using weak references from the "child" objects to the wrapper, or deleting the `__dict__`, and the wrapper was collected and the connections were closed.
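For illustration, here is a minimal sketch of the kind of reference cycle described above and of how a weak back-reference breaks it. The `Wrapper`/`Features` classes are stand-ins, not Django's actual implementation:

```python
import gc
import weakref


class Features:
    def __init__(self, wrapper):
        self.wrapper = wrapper  # strong back-reference -> reference cycle


class Wrapper:
    def __init__(self):
        self.features = Features(self)  # wrapper -> features -> wrapper


w = Wrapper()
del w
# The refcounts never reach zero, so only the cyclic GC can reclaim the pair.
print(gc.collect())  # reports > 0 unreachable objects


class WeakFeatures:
    def __init__(self, wrapper):
        self.wrapper = weakref.proxy(wrapper)  # weak back-reference, no cycle


class WeakWrapper:
    def __init__(self):
        self.features = WeakFeatures(self)


w = WeakWrapper()
del w  # reclaimed immediately by reference counting, no gc.collect() needed
```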
Attachments (2)
Change History (22)
by , 17 months ago
Attachment: databasewrapper.png added
follow-up: 2 comment:1 by , 17 months ago
Resolution: → duplicate
Status: new → closed
Fábio, thanks for the report. Is there a reason to open a new ticket instead of adding your findings to #33497? As far as I'm aware, this is a duplicate.
comment:2 by , 17 months ago
Fixing this memory leak will not solve the problem of the persistent connections not being reused, only the fact that they will not pile up (and starve the DB) if they become inaccessible.
But I'm new here, still learning. Should I move the findings to #33497?
comment:3 by , 17 months ago
Cc: added
Resolution: duplicate
Status: closed → new
Triage Stage: Unreviewed → Accepted
We can keep this separately.
Tentatively accepted, I'd like to check out the patch proposal.
comment:4 by , 17 months ago
...I'd like to check out the patch proposal.
Yes, please.
Any chance you could give a minimal example showing the behaviour here, to look at?
Various folks have reported leaky behaviour on Channels/Daphne, but we've not been able to pin it down.
… ends up in a circular reference ... and the GC could not clean up the objects.
Just out of interest, could you try throwing in a `gc.collect()`? That should get it, unless we never let the `DatabaseWrapper` go, which is why it would be nice to look at the example. But a weakref is hopefully right in this case.
comment:5 by , 17 months ago
Tested using Uvicorn and Daphne; the database engine should not matter (I tested with PostgreSQL). `CONN_MAX_AGE` can be any value, and the effect is the same with async or sync views.
```python
import gc

import objgraph
from django.db.backends.base.base import BaseDatabaseWrapper
from django.http import HttpResponse

from myapp.models import MyModel  # the example model; app path assumed

get_refs = lambda: [o for o in gc.get_objects() if isinstance(o, BaseDatabaseWrapper)]
plot_refs = lambda: objgraph.show_backrefs(get_refs(), filename="sample-graph.png")


def sync_view(request):
    MyModel.objects.count()
    # gc.collect()
    # plot_refs()
    return HttpResponse(f"Number of BaseDatabaseWrapper instances: {len(get_refs())}")


async def async_view(request):
    await MyModel.objects.acount()
    # gc.collect()
    # plot_refs()
    return HttpResponse(f"Number of BaseDatabaseWrapper instances: {len(get_refs())}")
```
On each refresh you should see the number of instances increase. If you plot the references, you will see many diagrams like the one in the attachment. Throwing in `gc.collect()` actually gets it, but it takes some milliseconds, even if the underlying database connection is already closed.
comment:6 by , 17 months ago
One more detail: using `objgraph.show_growth` without calling `gc.collect` gives us this on each new request:
```
list                     1990  +10
dict                     6798   +3
deque                      20   +2
lock                       43   +2
OrderedDict                11   +2
partial                    24   +2
ReferenceType            3848   +1
DatabaseWrapper             6   +1
DatabaseClient              6   +1
DatabaseCreation            6   +1
DatabaseFeatures            6   +1
DatabaseIntrospection       6   +1
DatabaseOperations          6   +1
BaseDatabaseValidation      6   +1
DatabaseErrorWrapper        5   +1
PGconn                      5   +1
Connection                  5   +1
PrepareManager              5   +1
AdaptersMap                 7   +1
TypesRegistry               7   +1
```
With the patch, the output of `show_growth` is empty.
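For anyone who wants to reproduce these numbers without sprinkling calls into views, here is a hypothetical middleware sketch (the class name is made up; it assumes `objgraph` is installed) that logs growth per request:

```python
import objgraph


class ShowGrowthMiddleware:
    """Diagnostic only: print which object types grew during each request."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        # Deliberately no gc.collect() here, so objects that are only
        # reachable through reference cycles still show up as growth.
        objgraph.show_growth(limit=25)
        return response
```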
comment:7 by , 17 months ago
Cc: added
comment:8 by , 17 months ago
Owner: changed
Status: new → assigned
comment:9 by , 17 months ago
Throwing in `gc.collect()` actually gets it, but it takes some milliseconds, even if the underlying database connection is already closed.
With "actually gets it", do you mean that it cleans it up properly? If so, there does not seem to be a real leak here, since garbage collection usually runs every now and then (or am I missing something?). It might still be beneficial to improve the performance here, but that might require a test or two :D
comment:10 by , 16 months ago
Has patch: set
comment:11 by , 16 months ago
Needs tests: set
Patch needs improvement: set
Marking as needs improvement per Florian's comment.
comment:12 by , 15 months ago
Cc: added
follow-up: 14 comment:13 by , 7 days ago
Florian, it does get cleaned up eventually thanks to the garbage collector, but garbage collection is a last-resort method of freeing unused memory.
There are two important problems with relying on GC:
- It happens very rarely: the default thresholds are `(700, 10, 10)`, which means the deepest collection cycle (generation 2) only runs every 70,000 dereferenced objects. That's very rare; in a simple Django application, that can mean a hundred requests.
- It also holds the global interpreter lock and effectively freezes the entire Python process (and all its threads) until it has gone through all uncollectable objects. The more objects it finds (not just the ones with reference cycles but also everything else they hold a reference to), the longer the freeze. In extreme cases, it can randomly add seconds to your response times.
Accumulation of garbage also leads to memory fragmentation, as unused memory cannot be reused until the garbage-collection thresholds are met, which makes overall memory use slowly creep up.
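To make the first point above concrete, here is a small sketch (plain CPython, no Django required) that inspects the thresholds and times a forced generation-2 pass:

```python
import gc
import time

print(gc.get_threshold())  # (700, 10, 10) by default in CPython

start = time.perf_counter()
unreachable = gc.collect(2)  # force the deepest (generation 2) collection
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"collected {unreachable} unreachable objects in {elapsed_ms:.1f} ms")
```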
What makes it worse is that in this case, it holds a reference to a psycopg connection object, holding all of the psycopg internals hostage and preventing proper finalization of the connection (and, I assume, in Django 5.x also preventing the connection from being reclaimed by the connection pool? I did not verify that).
comment:14 by , 7 days ago
Hi Patryk, I agree, which is why this ticket is accepted. It just needs a PR with tests (or an explanation of why tests make no sense or are not possible), and then we can see what we can do about that.
What makes it worse is that in this case, it holds a reference to a psycopg connection object, holding all of the psycopg internals hostage and preventing proper finalization of the connection (and, I assume, in Django 5.x also preventing the connection from being reclaimed by the connection pool? I did not verify that).
I don't think so; the connection is not returned to the pool on GC but rather when `.close()` is called on it at the end of the request: https://github.com/django/django/blob/5da3ad7bf90fba7321f4c2834db44aa920c70bc7/django/db/backends/postgresql/base.py#L391
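For context, that end-of-request close comes from Django connecting `close_old_connections` to the request signals. Roughly, and as a paraphrased sketch of the stock wiring in `django/db/__init__.py` rather than a proposed change:

```python
from django.core import signals
from django.db import connections


def close_old_connections(**kwargs):
    # Runs at request start and end: closes connections that are unusable
    # or past CONN_MAX_AGE instead of waiting for garbage collection.
    for conn in connections.all():
        conn.close_if_unusable_or_obsolete()


signals.request_started.connect(close_old_connections)
signals.request_finished.connect(close_old_connections)
```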
comment:16 by , 4 days ago
Cc: removed
comment:17 by , 4 days ago
Thinking more about this ticket (please note that I am thinking out loud here, so let me know if I take a wrong turn somewhere): in a standard setup, `DatabaseWrapper` is *not* supposed to get garbage collected at all. Each thread will have one wrapper per defined database, stored in a thread local (`ConnectionHandler`/`BaseConnectionHandler`). So unless the webserver is spawning a new thread per connection, this should not be a problem (I could see it being a problem for runserver, but that is for dev only anyway).
So, to the people having issues here: where exactly are you seeing this, in job queues (Celery etc.) or in your webservers (and which exactly)? Fábio mentioned seeing it on Uvicorn/Daphne; is this also a problem for Gunicorn (I know one usually has a fixed worker & thread count there)?
Please note that I am not saying that there is no problem hidden here; the cyclic reference is certainly ugly. But if the object isn't meant to get garbage collected, it is not the end of the world. That said, I would certainly prefer a solution that doesn't use a weakref proxy (we would need to do some profiling on the overhead).
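To illustrate the one-wrapper-per-thread point, here is a minimal sketch (it assumes a configured Django settings module) showing that each thread resolves its own `DatabaseWrapper` through the thread/async-local `ConnectionHandler`:

```python
import threading

from django.db import connections


def report():
    wrapper = connections["default"]  # looked up via the handler's local
    print(threading.current_thread().name, id(wrapper))


report()                             # main thread's wrapper
t = threading.Thread(target=report)  # a second thread gets its own instance
t.start()
t.join()
```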
comment:18 by , 33 hours ago
Florian, we're using ASGI, so I believe it's very much spawning and collecting threads left and right. The database wrapper-related objects come up in garbage lists every few requests with `uvicorn` in production mode.
comment:19 by , 32 hours ago
Mhm, yeah, I recommended switching to a fixed thread pool at some point, iirc :D Though even with a fixed thread pool, I think the "thread/asgi" local might be per request then (i.e. per async context id), which would still show this problem. Thank you for confirming that you are using ASGI.
comment:20 by , 29 hours ago
Yeah, using a (cappable!) thread pool would be amazing, especially if the async ORM API was also able to delegate to a request-specific DB thread instead of mixing everything in a single shared thread.