#33497 closed New feature (fixed)
Document that database persistent connections do not work with ASGI nor async mode
Reported by: | Stenkar | Owned by: | Carlton Gibson |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | ASGI, Database, async |
Cc: | Sarah Boyce, Carlton Gibson, Florian Apolloner, Andrew Godwin, Anders Kaseorg, Patryk Zawadzki, Mikail, Alex, joeli, Marco Glauser, Rafał Pitoń, Marty Cochrane, lappu, Dmytro Litvinov, Suraj Shaw, Yiwei Gao | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Hello!
I've discovered that after upgrading Django to version 4 (currently 4.0.2), I started seeing database `FATAL: sorry, too many clients already` errors in Sentry.
For a database, I'm using containerized Postgres 14.1 and the connection between Django and Postgres is done by Unix socket.
Database settings look like this:
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": environ.get("POSTGRES_DB"),
            "USER": environ.get("POSTGRES_USER"),
            "PASSWORD": environ.get("POSTGRES_PASSWORD"),
            "HOST": environ.get("POSTGRES_HOST"),
            "PORT": environ.get("POSTGRES_PORT"),
            "CONN_MAX_AGE": 3600,
        }
    }
In production, I'm using ASGI (Uvicorn 0.17.4) to run the Django application (4 workers).
When everything is deployed and I have surfed around the Django admin site, checking active Postgres connections with `SELECT * FROM pg_stat_activity;` shows 30+ idle connections made by Django.
After surfing around the admin site some more, I can see that even more idle connections have been made by Django.
It looks like the database connections are not reused. At one point some of the idle connections are closed, but then more connections are made again as Django runs more DB queries.
I have one Django 3.2.11 project running in production with all the same settings; there are always at most 10 persistent connections with the database and everything works fine.
Should it be like this in version 4.0?
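To make the observation above repeatable, a small helper along these lines can count the idle Postgres connections instead of eyeballing `pg_stat_activity` output (a sketch, not part of the original report; the `datname` filter and the cursor-passing style are assumptions, and the cursor can be any DB-API cursor such as one from `django.db.connection.cursor()`):

```python
# Hypothetical helper: count idle PostgreSQL connections for one database.
IDLE_SQL = (
    "SELECT count(*) FROM pg_stat_activity "
    "WHERE state = 'idle' AND datname = %s"
)

def count_idle_connections(cursor, dbname):
    """Return the number of idle connections to `dbname` via a DB-API cursor."""
    cursor.execute(IDLE_SQL, [dbname])
    return cursor.fetchone()[0]
```

Polling this before and after a burst of admin-site requests would show whether connections are being reused or only accumulated.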
Change History (57)
comment:1 by , 3 years ago
Component: | Uncategorized → Database layer (models, ORM) |
---|
comment:2 by , 3 years ago
Summary: | Database persistent connection do not work with ASGI in 4.0 → Database persistent connections do not work with ASGI in 4.0 |
---|
comment:3 by , 3 years ago
Cc: | added |
---|---|
Resolution: | → needsinfo |
Status: | new → closed |
comment:4 by , 3 years ago
Cc: | added |
---|
Hi Stenkar.
Would you be able to put together a minimal test project here, so that folks can reproduce quickly?
This may be due to Django 4.0 having per-request contexts for the thread sensitivity of sync_to_async() — see #32889.
If so, that's kind of a good thing, in that hitting open-resource limits is what you'd expect in async code; up to now we've not been hitting them, as we've essentially been running serially.
Immediate thought for a mitigation would be to use a connection pool.
Equally, can we limit the number of threads in play using asgiref's `ASGI_THREADS` environment variable? (But see the discussion on the related Daphne issue about whether that's the right place for that at all.)
This is likely a topic we'll need to deal with (eventually) in Django: once you start getting async working, you soon hit resource limits, and handling that with structures for sequencing and maximum parallelism is one of those hard-batteries™ that we maybe should provide. 🤔
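If capping the executor is the chosen mitigation, one way to set that variable is from Python before asgiref is imported (a sketch only; whether asgiref still honors `ASGI_THREADS`, and where best to set it, should be checked against the discussion linked above):

```python
# Sketch: cap asgiref's default sync executor via the ASGI_THREADS
# environment variable. It must be set before asgiref is imported, e.g. at
# the very top of asgi.py or in the supervisor's process environment.
import os

os.environ.setdefault("ASGI_THREADS", "4")
```

Setting it in the process environment (systemd unit, Docker `environment:` block) avoids any import-ordering concerns entirely.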
comment:5 by , 3 years ago
I think https://github.com/django/asgiref/pull/306#issuecomment-991959863 might play into this as well. By using a single thread per connection, persistent connections will never get cleaned up.
EDIT: By design, I do not think that persistent connections currently work in Django's async mode. I think this should be reproducible with any async view that makes a database connection.
comment:6 by , 3 years ago
Resolution: | needsinfo |
---|---|
Status: | closed → new |
Thank you for the comments.
I created a repository where it's possible to spin up a minimal project with docker-compose.
https://github.com/KStenK/django-ticket-33497
I'm not sure that I can find where or what goes wrong in more detail, but I'll give it a try.
comment:7 by , 3 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Type: | Bug → New feature |
OK, thanks Stenkar.
I'm going to accept this as a New Feature. It's a change in behaviour from 3.2, but it comes up precisely because of allowing multiple executors for sync_to_async(). (In 3.2 it's essentially single-threaded, with only a single connection actually being used.) We need to improve the story here, but it's not a bug in #32889 that we don't have async-compatible persistent DB connections yet. (I hope that makes sense.)
A note to the docs about this limitation may be worthwhile.
comment:8 by , 3 years ago
Thinking more about this, I do not think the problem is new. We have the same problem whenever persistent connections are used and a new thread is spawned per request (for instance in runserver.py). Usually (i.e. with gunicorn etc.) one has a rather "stable" pool of processes or threads; as soon as you switch to a new thread per connection, this breaks. In ASGI this behaviour is probably more pronounced, since by definition every request is in its own async task context, which then propagates down to the DB backend as a new connection per request (which in turn will also never reuse connections, because the "thread" ids change).
All in all, I think we are finally at the point where we need a connection pool in Django. I'd strongly recommend using something like https://github.com/psycopg/psycopg/tree/master/psycopg_pool/psycopg_pool, but abstracted to work for all databases in Django.
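The mechanism described above can be illustrated with a stdlib-only simulation (hypothetical stand-in code, not Django's actual implementation): "connections" are cached per thread identifier, so a fixed worker pool reuses them, while a fresh thread per request never does:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for Django's per-thread connection cache.
connections = {}

def handle_request(barrier=None):
    # A "request" reuses a connection only if one exists for this thread.
    tid = threading.get_ident()
    if tid not in connections:
        connections[tid] = object()  # pretend this opens a DB connection
    if barrier is not None:
        barrier.wait()  # keep threads alive so thread ids stay distinct

# WSGI-style: a fixed pool of worker threads, so connections get reused.
with ThreadPoolExecutor(max_workers=2) as pool:
    for _ in range(20):
        pool.submit(handle_request).result()
wsgi_connections = len(connections)  # at most 2

connections.clear()

# ASGI-style: every request runs under a fresh thread identifier, so each
# request opens a new "connection" and none are ever reused.
barrier = threading.Barrier(20)
threads = [threading.Thread(target=handle_request, args=(barrier,))
           for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
asgi_connections = len(connections)  # one per request

print(wsgi_connections, asgi_connections)
```

The second half leaks one cached connection per request, which is exactly the exhaustion pattern reported in the ticket description.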
comment:10 by , 3 years ago
Cc: | added |
---|
comment:11 by , 3 years ago
Cc: | added |
---|
follow-up: 15 comment:12 by , 3 years ago
This is marked as a "new feature," but it's an undocumented breaking change between 3.2 and 4.0. Connections that were previously reused and terminated are now just left to linger.
The `request_finished` signal does not terminate them, as they are not idle for longer than `CONN_MAX_AGE`.
The `request_started` signal does not terminate them, as it never sees those connections: the connection state lives in `asgiref.local` and is discarded after every request.
Allowing parallel execution of requests is a great change, but I feel Django should outright refuse to start if `CONN_MAX_AGE` is combined with ASGI.
comment:13 by , 2 years ago
Keywords: | async added |
---|
comment:14 by , 2 years ago
Cc: | added |
---|
comment:15 by , 2 years ago
Replying to Patryk Zawadzki:
> This is marked as a "new feature," but it's an undocumented breaking change between 3.2 and 4.0. Connections that were previously reused and terminated are now just left to linger.
> The `request_finished` signal does not terminate them, as they are not idle for longer than `CONN_MAX_AGE`.
> The `request_started` signal does not terminate them, as it never sees those connections: the connection state lives in `asgiref.local` and is discarded after every request.
> Allowing parallel execution of requests is a great change, but I feel Django should outright refuse to start if `CONN_MAX_AGE` is combined with ASGI.

I agree. I would even go as far as calling this a regression, not just an undocumented breaking change. Whatever the reasons behind it or the technical superiority of the new solution, the fact remains that in 3.2 ASGI mode our code worked fine and reused connections. In 4.x it is broken unless you use `CONN_MAX_AGE = 0`, which disables a feature of Django that used to work.
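For anyone hitting this today, the workaround referred to above is to disable persistent connections entirely when serving under ASGI. A minimal settings sketch (the non-`CONN_MAX_AGE` values mirror the reporter's configuration; this trades connection reuse for a bounded connection count):

```python
from os import environ

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": environ.get("POSTGRES_DB"),
        "USER": environ.get("POSTGRES_USER"),
        "PASSWORD": environ.get("POSTGRES_PASSWORD"),
        "HOST": environ.get("POSTGRES_HOST"),
        "PORT": environ.get("POSTGRES_PORT"),
        # Open and close a connection per request instead of leaking one per
        # async context.
        "CONN_MAX_AGE": 0,
    }
}
```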
comment:16 by , 2 years ago
Cc: | added |
---|
comment:17 by , 2 years ago
Cc: | added |
---|
comment:18 by , 2 years ago
Cc: | added |
---|
comment:19 by , 2 years ago
Cc: | added |
---|
comment:20 by , 2 years ago
I have created a draft pull request for database connection pool support in PostgreSQL: https://github.com/django/django/pull/16881
It would be great if people experiencing the problems noted here could test it (that would probably help in getting it merged).
comment:21 by , 23 months ago
Cc: | added |
---|
We just ran into this while upgrading from 3.2 to 4.2. During a QA round, the MySQL server for our staging environment (an AWS RDS t3.micro instance) exceeded its max connections (around 70, while normally connections stay below 10).
I git-bisected the culprit to https://github.com/django/django/commit/36fa071d6ebd18a61c4d7f1b5c9d17106134bd44, which is what Carlton Gibson suspected.
We are also running uvicorn.
follow-up: 23 comment:22 by , 19 months ago
We have been using `CONN_MAX_AGE=300` since it was introduced in Django 1.6 and rely on it to avoid reconnecting to the database on each HTTP request. This change really caught us off guard: we upgraded from Django 3.2 to 4.0 and our site went completely down in a matter of seconds when all database connections were instantly depleted.
Giving each HTTP request its own async context makes a lot of sense and is a good change in itself, IMO. But I would argue that this change is not backwards compatible. `CONN_MAX_AGE` does still "technically" work, but it clearly does not behave as it has for the last 10 years.
Specifying `CONN_MAX_AGE` is recommended in a lot of places, including Django's own docs:
- https://docs.djangoproject.com/en/4.2/ref/databases/#persistent-database-connections
- https://devcenter.heroku.com/articles/python-concurrency-and-database-connections
At the very least, I think this needs to be clearly called out in the release notes and in the docs on "Persistent connections". I think we need to deprecate/remove `CONN_MAX_AGE`. Or is there even a reason to keep it around?
I am very much in favour of getting basic DB connection pooling into Django. We will try to give https://github.com/django/django/pull/16881 a spin, put it in production, and report back. Would love to have something like that available out of the box in Django! We use Postgres and would be happy with such a pool replacing `CONN_MAX_AGE` for our use case.
However, that would only work for Postgres. What is the situation with MySQL/Oracle? Does mysqlclient come with a pool like psycopg?
comment:23 by , 16 months ago
Replying to Andreas Pelme:
> However, that would only work for Postgres. What is the situation with MySQL/Oracle? Does mysqlclient come with a pool like psycopg?

Oracle: https://python-oracledb.readthedocs.io/en/latest/user_guide/connection_handling.html#connpooling
mysqlclient doesn't appear to support this out of the box. It looks like mysql-connector-python would have support, though: https://dev.mysql.com/doc/connector-python/en/connector-python-connection-pooling.html 🤔
comment:24 by , 16 months ago
Cc: | added |
---|
comment:25 by , 16 months ago
Has patch: | set |
---|
We will need to add solutions for the other backends, but I don't see a reason why we can't do this incrementally; the current patch is for PostgreSQL: https://github.com/django/django/pull/17594
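For reference, connection pooling for the PostgreSQL backend shipped in Django 5.1 as an opt-in `"pool"` entry in `OPTIONS` (requires psycopg 3 with psycopg-pool installed). A minimal settings sketch, with names per the Django 5.1 docs; the pool sizes here are illustrative only:

```python
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        "OPTIONS": {
            # True uses psycopg_pool defaults; a dict is passed through as
            # psycopg_pool.ConnectionPool keyword arguments.
            "pool": {"min_size": 2, "max_size": 4},
        },
    }
}
```

Note that the docs describe pooling as incompatible with persistent connections, so `CONN_MAX_AGE` should be left at its default of 0 when `"pool"` is enabled.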
comment:26 by , 16 months ago
Well, I guess the main question is whether we want to create our own pool implementation or reuse what the database adapters provide, assuming they can do it better for their database than we can generically :D
comment:27 by , 16 months ago
Owner: | changed from | to
---|---|
Patch needs improvement: | set |
Status: | new → assigned |
comment:28 by , 16 months ago
Patch needs improvement: | unset |
---|
comment:29 by , 15 months ago
Cc: | added |
---|
comment:30 by , 15 months ago
There is already a ticket (and now a PR) to support database connection pools in Oracle: #7732
Edit: Ah sorry, I got too excited and misread this; it is a session pool, not a connection pool.
comment:31 by , 14 months ago
As a user of Django, I was looking through the steps needed to optimize my application, and came across
It was only after a long amount of research/reading that I came across this ticket. We are planning to soon upgrade our application to use uvicorn workers and run under ASGI so that we can serve WebSockets alongside our HTTP API, and this issue now gives us pause about performing this standard optimization.
Questions:
- Does this issue affect all ASGI Django HTTP endpoints, even synchronous ones (e.g. no `async`, no `database_sync_to_async()`, etc.)?
- Should the docs be updated to add a warning that persistent connections are not currently fully compatible with ASGI? I'd be happy to open a PR, as it would have saved me personally a lot of time had I discovered this earlier in my research process.
comment:32 by , 14 months ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:34 by , 14 months ago
Has patch: | unset |
---|---|
Triage Stage: | Ready for checkin → Accepted |
comment:35 by , 13 months ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:36 by , 12 months ago
I have created a pull request for database connection pool support in Oracle: https://github.com/django/django/pull/17834
It would be great if people experiencing the same problem could test this with the Oracle backend (that would probably help in getting it merged).
comment:37 by , 12 months ago
Cc: | added |
---|
follow-up: 42 comment:38 by , 9 months ago
For Postgres users, will the new Django 5.1 postgres connection pools mitigate this issue? Should ASGI users use this rather than persistent connections?
comment:39 by , 9 months ago
Cc: | added |
---|
follow-up: 43 comment:42 by , 7 months ago
Replying to johnthagen:
> For Postgres users, will the new Django 5.1 postgres connection pools mitigate this issue? Should ASGI users use this rather than persistent connections?

It seems that the answer is no; connection pools do not yet mitigate this issue. In my testing, enabling connection pooling in an ASGI context causes connection leaks. Even with a very high max connection count and a small timeout value, I'm seeing occasional `OperationalError: couldn't get a connection after X sec` errors.
follow-up: 44 comment:43 by , 7 months ago
> Even with a very high max connection count and a small timeout value, I'm seeing occasional `OperationalError: couldn't get a connection after X sec` errors.

This makes sense: a small timeout value increases the chance of seeing this error if you are already running out of connections. Whether there really is a leak or not is impossible to tell without more information about your codebase and settings.
comment:44 by , 7 months ago
Replying to Florian Apolloner:
> This makes sense: a small timeout value increases the chance of seeing this error if you are already running out of connections.

My mistake, I meant to say "large timeout value". My containers each answer ~2 requests per second; average response time is about 400 ms. It shouldn't take many connections to service this workload (even one connection should work, I think), and yet, with pool timeouts set to 30 s and the pool max size set to 16, I'm seeing many such errors when pooling is enabled. Which is why it seems to me that the new pooling support hasn't solved this particular problem.
comment:45 by , 7 months ago
OK, I'd recommend enabling pool logging (https://www.psycopg.org/psycopg3/docs/advanced/pool.html#pool-operations-logging) to see where it goes south, and also checking the stats (https://www.psycopg.org/psycopg3/docs/advanced/pool.html#pool-stats).
comment:46 by , 5 weeks ago
Did anyone figure out the right thing to do here with Django 5.1 and ASGI (uvicorn) with Postgres? The comment above suggests that connection pooling doesn't work, which is super confusing. How are other folks handling this?
comment:47 by , 5 weeks ago
Your best bet for now is to disable persistent connections and deploy PgBouncer (or the equivalent for your RDBMS) between Django and your database server.
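A sketch of the Django side of that recommendation: point the database settings at the pooler and let it own connection lifetimes. The host/port values below are placeholders (6432 is PgBouncer's conventional listen port), and `DISABLE_SERVER_SIDE_CURSORS` is the setting Django's databases docs call for when the pooler runs in transaction-pooling mode:

```python
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        # Connect to PgBouncer rather than Postgres directly.
        "HOST": "127.0.0.1",
        "PORT": "6432",
        # Let the external pooler own connection lifetimes.
        "CONN_MAX_AGE": 0,
        # Server-side cursors outlive a single transaction and would break
        # under transaction pooling.
        "DISABLE_SERVER_SIDE_CURSORS": True,
    }
}
```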
comment:48 by , 5 weeks ago
Interesting. It looks like, as opposed to dseb, other people have found success with connection pooling: https://forum.djangoproject.com/t/operationalerror-sorry-too-many-clients-already/16221/34.
comment:51 by , 5 weeks ago
I think there is a bit of a misunderstanding about this ticket and what the connection pool can do, and how.
First off, let me start with the initial description of the ticket, namely that persistent connections do not work with ASGI. Yes, this is correct, it is by design, and it will stay that way. The reason is that persistent connections, as implemented in Django, are basically persistent per thread identifier. With ASGI you basically get a new thread identifier per request and, as such, a new connection. The old connections are not cleaned up at all quickly, resulting in exhaustion of the connection limits.
With the connection pool you are actually able to reuse those connections, because they no longer have thread affinity. If not, that is yet another bug that should be fixed; but I think this works, and so far no one has shown any logs or an example project to the contrary, and I will not waste my time if there is no effort in helping by providing logs etc.
What will not work, though, is the connection pool magically making your server massively concurrent. The way connection handling in Django currently works, the connection is checked out at the first SQL operation in a request and returned at the end of the request. So the max size of the pool basically sets a hard limit on how many requests you can handle in parallel while holding a connection. This is basically equivalent to using no connection pool inside Django and PgBouncer in `pool_mode = session`. If you only have a handful of views that access the database and also do a lot of work (i.e. the response time is long), then you can probably manually close the connection mid-view, which returns the connection to the pool and allows more concurrent requests.
A small example might look like this (syntax errors to be expected):

    from django.db import connection

    async def my_long_running_view(request):
        # Force the query to execute now.
        my_tasks = list(Task.objects.all())
        # Close the connection, releasing it back to the pool.
        connection.close()
        for task in my_tasks:
            # Note: I know this could be done in parallel as well, but that
            # is beside the point here.
            await some_long_running_operation(task)
            # Queries here will check out a new connection from the pool.
So yes, I do think that the connection pool in Django solves the problem of persistent connections not working under ASGI. What it doesn't do is provide a mode that behaves like PgBouncer's `pool_mode = transaction`. That would certainly be a worthwhile improvement, but given the limited feedback on pooling so far, I never bothered implementing it.
comment:52 by , 5 weeks ago
Has patch: | set |
---|---|
Owner: | set to |
Status: | new → assigned |
Added a PR to clarify the docs here.
comment:53 by , 4 weeks ago
Triage Stage: | Accepted → Ready for checkin |
---|---|
Version: | 4.0 → dev |
comment:55 by , 4 weeks ago
Summary: | Database persistent connections do not work with ASGI in 4.0 → Document that database persistent connections do not work with ASGI nor async mode |
---|
Thanks for the report. Django has a routine to clean up old connections that is tied into the request-response life-cycle, so idle connections should be closed. However, I don't think you've explained the issue in enough detail to confirm a bug in Django. This could be an issue in `psycopg2`, `uvicorn`, or in custom middlewares (see #31905); it's hard to say without a reproduction.
Please reopen the ticket if you can debug your issue and provide details about why and where Django is at fault, or if you can provide a sample project with a reproducible scenario.