Opened 3 years ago
Last modified 2 months ago
#33497 new New feature
Database persistent connections do not work with ASGI in 4.0
Reported by: | Stenkar | Owned by: | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 4.0 |
Severity: | Normal | Keywords: | ASGI, Database, async |
Cc: | Sarah Boyce, Carlton Gibson, Florian Apolloner, Andrew Godwin, Anders Kaseorg, Patryk Zawadzki, Mikail, Alex, joeli, Marco Glauser, Rafał Pitoń, Marty Cochrane, lappu, Dmytro Litvinov, Suraj Shaw, Yiwei Gao | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Hello!
I've discovered that after upgrading Django to version 4 (currently 4.0.2), I started to see database "FATAL: sorry, too many clients already" errors in Sentry.
For the database, I'm using containerized Postgres 14.1, and Django connects to Postgres over a Unix socket.
Database settings look like this:
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": environ.get("POSTGRES_DB"),
        "USER": environ.get("POSTGRES_USER"),
        "PASSWORD": environ.get("POSTGRES_PASSWORD"),
        "HOST": environ.get("POSTGRES_HOST"),
        "PORT": environ.get("POSTGRES_PORT"),
        "CONN_MAX_AGE": 3600,
    }
}
In production, I'm using ASGI (Uvicorn 0.17.4) to run the Django application (4 workers).
After deploying and clicking around the Django admin site, I checked the active Postgres connections with SELECT * FROM pg_stat_activity; and saw 30+ idle connections from Django.
After browsing the admin site some more, I can see that Django has opened even more idle connections.
It looks like the database connections are not reused. At some point some of the idle connections are closed, but then more connections are opened as Django makes more DB queries.
I have one Django 3.2.11 project running in production with exactly the same settings, and it never has more than 10 persistent connections to the database; everything works fine there.
Is this expected behavior in version 4.0?
Change History (45)
comment:1 by , 3 years ago
Component: | Uncategorized → Database layer (models, ORM) |
---|
comment:2 by , 3 years ago
Summary: | Database persistent connection do not work with ASGI in 4.0 → Database persistent connections do not work with ASGI in 4.0 |
---|
comment:3 by , 3 years ago
Cc: | added |
---|---|
Resolution: | → needsinfo |
Status: | new → closed |
comment:4 by , 3 years ago
Cc: | added |
---|
Hi Stenkar.
Would you be able to put together a minimal test project here, so that folks can reproduce this quickly?
This may be due to Django 4.0 having per-request contexts for the thread sensitivity of sync_to_async(). See #32889.
If so, that's kind-of a good thing, in that too many open resources is what you'd expect in async code, and up to now, we've not been hitting that, as we've essentially been running serially.
Immediate thought for a mitigation would be to use a connection pool.
Equally, can we limit the number of threads in play using asgiref's `ASGI_THREADS` environment variable? (But see the discussion on the related Daphne issue about whether that's the right place for that at all.)
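A stdlib-only simulation of why bounding the thread count would also bound the connection count. It does not use asgiref: `ASGI_THREADS` is modelled here as a plain `max_workers` argument, and `get_connection()` is a hypothetical helper that caches one fake connection per thread in a `threading.local`, roughly like per-thread connection state. With a thread per request, every request opens a new connection; with a capped executor, at most `max_workers` connections are ever opened.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()
_opened = []
_opened_lock = threading.Lock()


def get_connection():
    # Hypothetical helper: one connection per thread, created on first use.
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = object()  # stand-in for a real DB connection
        _local.conn = conn
        with _opened_lock:
            _opened.append(conn)
    return conn


def handle_request(_):
    get_connection()


def connections_opened(max_workers, requests=200):
    """Serve requests on a bounded pool of worker threads."""
    _opened.clear()
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        list(ex.map(handle_request, range(requests)))
    # The executor may start fewer threads than max_workers, never more.
    return len(_opened)


def thread_per_request(requests=200):
    """Serve each request on a brand-new thread."""
    _opened.clear()
    threads = [threading.Thread(target=handle_request, args=(i,)) for i in range(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(_opened)
```

`connections_opened(4)` stays at four or fewer, while `thread_per_request(50)` opens 50 connections, because each new thread starts with an empty thread-local cache.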
This is likely a topic we'll need to deal with (eventually) in Django: once you start getting async working, you soon hit resource limits, and handling that with structures for sequencing and maximum parallelism is one of those hard-batteries™ that we maybe should provide. 🤔
comment:5 by , 3 years ago
I think https://github.com/django/asgiref/pull/306#issuecomment-991959863 might play into this as well. By using a single thread per connection, persistent connections will never get cleaned up.
comment:6 by , 3 years ago
Resolution: | needsinfo |
---|---|
Status: | closed → new |
Thank you for the comments.
I created a minimal project that can be spun up with docker-compose to reproduce this.
https://github.com/KStenK/django-ticket-33497
I'm not sure that I can find where or what goes wrong in more detail, but I'll give it a try.
comment:7 by , 3 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Type: | Bug → New feature |
OK, thanks Stenkar.
I'm going to accept this as a New Feature. It's a change in behaviour from 3.2, but it comes up precisely because we now allow multiple executors for sync_to_async(). (In 3.2 it's essentially single-threaded, with only a single connection actually being used.) We need to improve the story here, but it's not a bug in #32889 that we don't have async-compatible persistent DB connections yet. (I hope that makes sense.)
A note to the docs about this limitation may be worthwhile.
comment:8 by , 3 years ago
Thinking more about this, I do not think the problem is new. We have the same problem when persistent connections are used and a new thread is spawned per request (for instance in runserver.py). Usually (i.e. with gunicorn etc.) one has a rather "stable" pool of processes or requests; as soon as you switch to new threads per connection, this will break. In ASGI this behavior is probably more pronounced since, by definition, every request is in its own async task context, which then propagates down to the DB backend as a new connection per request (which in turn will also never reuse connections because the "thread" IDs change).
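The per-request context effect can be reproduced with plain `asyncio` and `contextvars`. This is a simplified model, not Django's actual `asgiref.local` machinery, and `current_conn` / `get_connection()` are hypothetical names: each request runs as its own task, each task gets a copy of the context at creation, so a connection cached in a `ContextVar` during one request is reused within that request but is invisible to the next one.

```python
import asyncio
import contextvars

current_conn = contextvars.ContextVar("current_conn", default=None)
opened = []


def get_connection():
    conn = current_conn.get()
    if conn is None:
        conn = object()  # stand-in for a DB connection
        current_conn.set(conn)  # only visible inside this task's context copy
        opened.append(conn)
    return conn


async def handle_request():
    get_connection()  # first query of the request opens a connection
    get_connection()  # a later query in the same request reuses it
    await asyncio.sleep(0)


async def serve(n):
    # Each request gets its own task, as an ASGI server does.
    for _ in range(n):
        await asyncio.create_task(handle_request())


asyncio.run(serve(10))
print(len(opened))  # prints 10: one connection per request, none reused across requests
```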
All in all, I think we are finally at the point where we need a connection pool in Django. I'd strongly recommend using something like https://github.com/psycopg/psycopg/tree/master/psycopg_pool/psycopg_pool but abstracted to work for all databases in Django.
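To illustrate what a pool changes, here is a toy bounded pool built on `queue.Queue`. It is not psycopg_pool's implementation or API; `FakeConnection`, `SimplePool`, `getconn()` and `putconn()` are hypothetical names chosen for the sketch. Connections are created lazily up to `max_size` and returned to the pool after each request, so reuse no longer depends on which thread or context served the request.

```python
import itertools
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeConnection:
    """Hypothetical stand-in for a DB connection; only tracks an id."""
    _ids = itertools.count(1)

    def __init__(self):
        self.id = next(self._ids)


class SimplePool:
    """Toy bounded pool: at most max_size connections ever exist."""

    def __init__(self, max_size, timeout=5.0):
        self.max_size = max_size
        self.timeout = timeout
        self.created = 0
        self._idle = queue.Queue()
        self._lock = threading.Lock()

    def getconn(self):
        # Prefer an idle connection that a finished request handed back.
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            pass
        # Otherwise open a new one, but only while under the cap.
        with self._lock:
            if self.created < self.max_size:
                self.created += 1
                return FakeConnection()
        # At the cap: wait for a connection to be returned. Raises
        # queue.Empty after `timeout` seconds, similar in spirit to a
        # pool's "couldn't get a connection" error.
        return self._idle.get(timeout=self.timeout)

    def putconn(self, conn):
        self._idle.put(conn)


def handle_request(pool):
    conn = pool.getconn()
    try:
        return conn.id  # pretend to run queries here
    finally:
        pool.putconn(conn)  # always return it, even if the request fails


pool = SimplePool(max_size=4)
# 100 "requests" served by 8 worker threads share at most 4 connections.
with ThreadPoolExecutor(max_workers=8) as ex:
    ids = list(ex.map(lambda _: handle_request(pool), range(100)))
print(pool.created, len(set(ids)))
```

The important design point is the `finally: putconn(...)`: a request that forgets to return its connection permanently shrinks the pool, which is exactly the kind of leak that later surfaces as acquisition timeouts.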
comment:10 by , 2 years ago
Cc: | added |
---|
comment:11 by , 2 years ago
Cc: | added |
---|
follow-up: 15 comment:12 by , 2 years ago
This is marked as a "new feature," but it's an undocumented breaking change between 3.2 and 4.0. Connections that were previously reused and terminated are now just left to linger.
The request_finished signal does not terminate them, as they have not yet reached CONN_MAX_AGE.
The request_started signal does not terminate them, as it never sees those connections: the connection state lives in asgiref.local and is discarded after every request.
Allowing parallel execution of requests is a great change, but I feel Django should outright refuse to start if CONN_MAX_AGE is combined with ASGI.
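The cleanup gap described above can be modelled in a few lines. In Django, `close_old_connections()` runs on `request_started` and `request_finished`, iterates over the connections visible from the current context, and calls `close_if_unusable_or_obsolete()`, which closes a connection once the monotonic clock passes a `close_at` deadline derived from `CONN_MAX_AGE`. The `Conn` class and `simulate()` below are hypothetical and mirror only that age check, assuming a fresh, empty per-request context.

```python
class Conn:
    """Hypothetical connection that only tracks its CONN_MAX_AGE deadline."""

    def __init__(self, max_age, now):
        self.close_at = None if max_age is None else now + max_age
        self.closed = False

    def close_if_obsolete(self, now):
        # Mirrors the age check in close_if_unusable_or_obsolete().
        if self.close_at is not None and now >= self.close_at:
            self.closed = True


def close_old_connections(visible, now):
    # Only connections visible from the *current* context are checked.
    for conn in visible:
        conn.close_if_obsolete(now)


def simulate(max_age, requests=5):
    """Return how many connections stay open after `requests` requests."""
    server_side = []  # connections the database server still sees
    now = 0.0  # fixed clock so the result is deterministic
    for _ in range(requests):
        ctx = []  # fresh per-request context
        close_old_connections(ctx, now)  # request_started: sees nothing
        conn = Conn(max_age, now)
        ctx.append(conn)
        server_side.append(conn)
        close_old_connections(ctx, now)  # request_finished: not expired yet
        # ctx is discarded here; this connection is never checked again
    return sum(not c.closed for c in server_side)


print(simulate(3600))  # 5
print(simulate(0))     # 0
```

With `CONN_MAX_AGE = 3600` every request leaves a connection behind; with `0` each connection is already due at `request_finished` and gets closed, which matches the workaround mentioned later in the thread.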
comment:13 by , 2 years ago
Keywords: | async added |
---|
comment:14 by , 2 years ago
Cc: | added |
---|
comment:15 by , 2 years ago
Replying to Patryk Zawadzki:
This is marked as a "new feature," but it's an undocumented breaking change between 3.2 and 4.0. Connections that were previously reused and terminated are now just left to linger.
The request_finished signal does not terminate them, as they have not yet reached CONN_MAX_AGE.
The request_started signal does not terminate them, as it never sees those connections: the connection state lives in asgiref.local and is discarded after every request.
Allowing parallel execution of requests is a great change, but I feel Django should outright refuse to start if CONN_MAX_AGE is combined with ASGI.
I agree. I would even go as far as calling this a regression, not just an undocumented breaking change. No matter the reasons behind it or the technical superiority of the new solution, the fact remains that in 3.2 ASGI mode our code worked fine and reused connections. In 4.x it is broken unless you use CONN_MAX_AGE = 0, which disables a feature in Django that used to work.
comment:16 by , 2 years ago
Cc: | added |
---|
comment:17 by , 2 years ago
Cc: | added |
---|
comment:18 by , 20 months ago
Cc: | added |
---|
comment:19 by , 19 months ago
Cc: | added |
---|
comment:20 by , 19 months ago
I have created a draft pull request for database connection pool support in postgresql: https://github.com/django/django/pull/16881
It would be great if people experiencing the problems noted here could test this (this would probably help in getting it merged).
comment:21 by , 18 months ago
Cc: | added |
---|
We just ran into this while upgrading from 3.2 to 4.2. During a QA round, the MySQL server for our staging environment (an AWS RDS t3.micro instance) exceeded its max connections (70 or so, while normally the connections stay below 10).
Using git bisect, I traced the culprit to https://github.com/django/django/commit/36fa071d6ebd18a61c4d7f1b5c9d17106134bd44, which is what Carlton Gibson suspected.
We are also running uvicorn.
follow-up: 23 comment:22 by , 15 months ago
We have been using CONN_MAX_AGE=300 since it was introduced in Django 1.6 and rely on it to avoid reconnecting to the database on every HTTP request. This change really caught us off guard. We upgraded from Django 3.2 to 4.0 and our site went completely down within seconds, as all database connections were instantly depleted.
Giving each HTTP request its own async context makes a lot of sense and is a good change in itself, IMO. But I would argue that this change is not backward compatible. CONN_MAX_AGE does still "technically" work, but it clearly does not behave as it has for the last 10 years.
Specifying CONN_MAX_AGE is recommended in a lot of places, including Django's own docs:
- https://docs.djangoproject.com/en/4.2/ref/databases/#persistent-database-connections
- https://devcenter.heroku.com/articles/python-concurrency-and-database-connections
At the very least, I think this needs to be clearly called out in the release notes and in the docs on "Persistent connections". I think we need to deprecate/remove CONN_MAX_AGE. Or is there even a reason to keep it around?
I am very much in favor of getting basic db connection pooling into django. We will try to give https://github.com/django/django/pull/16881 a spin and put it in production and report back. Would love to have something like that available out of the box in Django! We use Postgres and would be happy with having such a pool which would replace CONN_MAX_AGE for our use case.
However, that would only work for postgres. What is the situation with mysql/oracle? Does mysqlclient come with a pool like psycopg?
comment:23 by , 12 months ago
Replying to Andreas Pelme:
However, that would only work for postgres. What is the situation with mysql/oracle? Does mysqlclient come with a pool like psycopg?
Oracle: https://python-oracledb.readthedocs.io/en/latest/user_guide/connection_handling.html#connpooling
mysqlclient doesn't appear to support this out of the box. Looks like mysql-connector-python would have support though: https://dev.mysql.com/doc/connector-python/en/connector-python-connection-pooling.html 🤔
comment:24 by , 12 months ago
Cc: | added |
---|
comment:25 by , 12 months ago
Has patch: | set |
---|
We will need to add solutions for other backends, but I don't see a reason why we can't do this incrementally. The current patch is for PostgreSQL: https://github.com/django/django/pull/17594
comment:26 by , 12 months ago
Well, I guess the main question is if we want to create our own pool implementation or reuse what the database adapters provide and assume that they can do it better for their database than we can do it generically :D
comment:27 by , 12 months ago
Owner: | changed from | to
---|---|
Patch needs improvement: | set |
Status: | new → assigned |
comment:28 by , 11 months ago
Patch needs improvement: | unset |
---|
comment:29 by , 11 months ago
Cc: | added |
---|
comment:30 by , 11 months ago
There is already a ticket (and now a PR) to support database connection pools in Oracle: #7732
Edit: Ah, sorry, I got too excited and misread this: it's a session pool, not a connection pool.
comment:31 by , 10 months ago
As a user of Django, I was looking through the steps needed to optimize my application, and came across
Only after a lot of research and reading did I come across this ticket. We plan to upgrade our application soon to use uvicorn workers and run under ASGI so that we can serve WebSockets alongside our HTTP API, so this issue now gives us pause about applying this standard optimization.
Questions:
- Does this issue affect all Django HTTP endpoints under ASGI, even synchronous ones (e.g. no async, no database_sync_to_async(), etc.)?
- Should the docs be updated to add a warning that persistent connections are not currently fully compatible with ASGI? I'd be happy to open a PR, as it would have saved me personally a lot of time if I had discovered this earlier in my research process.
comment:32 by , 9 months ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:34 by , 9 months ago
Has patch: | unset |
---|---|
Triage Stage: | Ready for checkin → Accepted |
comment:35 by , 8 months ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:36 by , 8 months ago
I have created a pull request for database connection pool support in Oracle: https://github.com/django/django/pull/17834
It would be great if people experiencing the same problem could test this with the Oracle backend (this would probably help in getting it merged).
comment:37 by , 8 months ago
Cc: | added |
---|
follow-up: 42 comment:38 by , 5 months ago
For Postgres users, will the new Django 5.1 postgres connection pools mitigate this issue? Should ASGI users use this rather than persistent connections?
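For reference, enabling the PostgreSQL pool added in Django 5.1 looks roughly like the settings below. This is a sketch based on the Django 5.1 documentation: it requires psycopg 3 with the psycopg-pool package (it does not apply to psycopg2), a dict given as `"pool"` is passed to `psycopg_pool.ConnectionPool`, and pooling replaces rather than complements persistent connections, so `CONN_MAX_AGE` stays at `0`. The `min_size`/`max_size`/`timeout` values are illustrative only.

```python
import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("POSTGRES_DB"),
        "USER": os.environ.get("POSTGRES_USER"),
        "PASSWORD": os.environ.get("POSTGRES_PASSWORD"),
        "HOST": os.environ.get("POSTGRES_HOST"),
        "PORT": os.environ.get("POSTGRES_PORT"),
        # Pooling is used instead of persistent connections.
        "CONN_MAX_AGE": 0,
        "OPTIONS": {
            # Illustrative values, passed through to psycopg_pool.ConnectionPool.
            # Use "pool": True to accept the ConnectionPool defaults instead.
            "pool": {"min_size": 2, "max_size": 16, "timeout": 30},
        },
    }
}
```

As the next replies show, this did not by itself resolve the problem for at least one ASGI user, so treat it as something to test rather than a guaranteed fix.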
comment:39 by , 4 months ago
Cc: | added |
---|
follow-up: 43 comment:42 by , 2 months ago
Replying to johnthagen:
For Postgres users, will the new Django 5.1 postgres connection pools mitigate this issue? Should ASGI users use this rather than persistent connections?
It seems that the answer is no: connection pools do not yet mitigate this issue. In my testing, enabling connection pooling in an ASGI context causes connection leaks. Even with a very high max connection count and a small timeout value, I'm seeing occasional OperationalError: couldn't get a connection after X sec errors.
follow-up: 44 comment:43 by , 2 months ago
Even with a very high max connection count and a small timeout value, I'm seeing occasional OperationalError: couldn't get a connection after X sec errors.
This makes sense: a small timeout value increases the chance of seeing this error if you are already running out of connections. Whether there really is a leak is impossible to tell without more information about your codebase and settings.
comment:44 by , 2 months ago
Replying to Florian Apolloner:
This makes sense: a small timeout value increases the chance of seeing this error if you are already running out of connections.
My mistake, I meant to say "large timeout value". My containers each answer ~2 requests per second, with an average response time of about 400 ms. It shouldn't take many connections to service this workload (even one connection should work, I think), and yet, with the pool timeout set to 30 s and the pool max size set to 16, I'm seeing many such errors when pooling is enabled. That's why it seems to me that the new pooling support hasn't solved this particular problem.
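A back-of-the-envelope check of that intuition using Little's law (average concurrency equals arrival rate times time in the system), assuming each request holds exactly one connection for its whole response time:

```latex
L = \lambda W = 2\,\text{req/s} \times 0.4\,\text{s} = 0.8
```

So on average fewer than one connection should be checked out at a time. Short bursts can exceed the average, but repeatedly exhausting a pool of 16 for 30 s is hard to explain by load alone, which is consistent with the suspicion that some connections are not being returned.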
comment:45 by , 2 months ago
OK, I'd recommend enabling pool logging (https://www.psycopg.org/psycopg3/docs/advanced/pool.html#pool-operations-logging) and seeing where it goes south (also check the stats: https://www.psycopg.org/psycopg3/docs/advanced/pool.html#pool-stats).
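In a Django project, one way to turn on that pool logging is through the `LOGGING` setting. The sketch below assumes psycopg_pool logs under the `psycopg.pool` logger name, as shown in the psycopg documentation linked above; the handler setup is illustrative. The pool statistics themselves come from `ConnectionPool.get_stats()` (or `pop_stats()`) on the psycopg_pool pool object; how you reach that object from Django is left out here.

```python
import logging
import logging.config

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        # Logger used by psycopg_pool for pool operations
        # (connections created, returned, discarded, etc.).
        "psycopg.pool": {"handlers": ["console"], "level": "INFO"},
    },
}

# Django applies LOGGING itself at startup; calling dictConfig directly
# here just makes the snippet runnable on its own.
logging.config.dictConfig(LOGGING)
```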
Thanks for the report. Django has a routine to clean up old connections that is tied into the request-response life cycle, so idle connections should be closed. However, I don't think you've explained the issue in enough detail to confirm a bug in Django. This could be an issue in psycopg2, uvicorn, or in custom middlewares (see #31905); it's hard to say without a reproducer. Please reopen the ticket if you can debug your issue and provide details about why and where Django is at fault, or if you can provide a sample project with a reproducible scenario.