Opened 3 years ago

Closed 4 weeks ago

Last modified 4 weeks ago

#33497 closed New feature (fixed)

Document that database persistent connections do not work with ASGI nor async mode

Reported by: Stenkar Owned by: Carlton Gibson
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: ASGI, Database, async
Cc: Sarah Boyce, Carlton Gibson, Florian Apolloner, Andrew Godwin, Anders Kaseorg, Patryk Zawadzki, Mikail, Alex, joeli, Marco Glauser, Rafał Pitoń, Marty Cochrane, lappu, Dmytro Litvinov, Suraj Shaw, Yiwei Gao Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hello!

I've discovered that after upgrading Django to version 4 (currently 4.0.2), I started seeing FATAL: sorry, too many clients already database errors in Sentry.

For a database I'm using containerized Postgres 14.1, and the connection between Django and Postgres is made over a Unix socket.

Database settings look like this:

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": environ.get("POSTGRES_DB"),
        "USER": environ.get("POSTGRES_USER"),
        "PASSWORD": environ.get("POSTGRES_PASSWORD"),
        "HOST": environ.get("POSTGRES_HOST"),
        "PORT": environ.get("POSTGRES_PORT"),
        "CONN_MAX_AGE": 3600
    }
}

In production, I'm using ASGI (Uvicorn 0.17.4) to run the Django application (4 workers).

After deploying and surfing around the Django admin site, checking active Postgres connections with the SELECT * FROM pg_stat_activity; command shows 30+ idle connections made from Django.

Surfing around the admin site some more, I can see that Django has opened even more idle connections.

It looks like the database connections are not reused. At some point a few of the idle connections are closed, but more connections are opened again as soon as Django issues more DB queries.

I have one Django 3.2.11 project running in production with all the same settings; there are never more than 10 persistent connections to the database, and everything works fine.

Is this the expected behaviour in version 4.0?

Change History (57)

comment:1 by Stenkar, 3 years ago

Component: Uncategorized → Database layer (models, ORM)

comment:2 by Stenkar, 3 years ago

Summary: Database persistent connection do not work with ASGI in 4.0 → Database persistent connections do not work with ASGI in 4.0

comment:3 by Mariusz Felisiak, 3 years ago

Cc: Carlton Gibson added
Resolution: needsinfo
Status: new → closed

Thanks for the report. Django has a routine to clean up old connections that is tied into the request-response life-cycle, so idle connections should be closed. However, I don't think you've explained the issue in enough detail to confirm a bug in Django. This could be an issue in psycopg2, in uvicorn, or in custom middlewares (see #31905); it's hard to say without a way to reproduce it.

Please reopen the ticket if you can debug your issue and provide details about why and where Django is at fault, or if you can provide a sample project with a reproducible scenario.

Last edited 3 years ago by Mariusz Felisiak (previous) (diff)

comment:4 by Carlton Gibson, 3 years ago

Cc: Florian Apolloner Andrew Godwin added

Hi Stenkar.

Would you be able to put together a minimal test project here, so that folks can reproduce quickly?

This may be due to Django 4.0 having per-request contexts for the thread sensitivity of sync_to_async() — see #32889.
If so, that's kind of a good thing, in that too many open resources is what you'd expect in async code; up to now we've not been hitting that, as we've essentially been running serially.

Immediate thought for a mitigation would be to use a connection pool.

Equally, can we limit the number of threads in play using asgiref's `ASGI_THREADS` environment variable? (But see the discussion on the related Daphne issue about whether that's the right place for that at all.)

This is likely a topic we'll need to deal with (eventually) in Django: once you start getting async working, you soon hit resource limits, and handling that with structures for sequencing and maximum parallelism is one of those hard-batteries™ that we maybe should provide. 🤔
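The "structures for sequencing and maximum parallelism" mentioned above can be illustrated with a plain asyncio primitive. This is only a sketch of the general idea (nothing here is Django API; "handler" is a stand-in for a request handler):

    # Sketch: bounding concurrency with asyncio.Semaphore so that at most
    # N handlers hold a scarce resource (e.g. a DB connection) at once.
    import asyncio

    async def main():
        sem = asyncio.Semaphore(3)  # allow at most 3 concurrent "connections"
        active = 0
        peak = 0

        async def handler(i):
            nonlocal active, peak
            async with sem:
                active += 1
                peak = max(peak, active)
                await asyncio.sleep(0.01)  # simulated DB work
                active -= 1

        await asyncio.gather(*(handler(i) for i in range(10)))
        return peak

    print(asyncio.run(main()))  # prints 3: concurrency never exceeds the cap

Ten handlers run, but the semaphore guarantees no more than three are inside the critical section at any moment, which is the kind of hard resource limit async code otherwise lacks.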

comment:5 by Florian Apolloner, 3 years ago

I think https://github.com/django/asgiref/pull/306#issuecomment-991959863 might play into this as well. By using a single thread per connection, persistent connections will never get cleaned up.

EDIT:// By design, I do not think that persistent connections currently work in Django's async mode. This should be reproducible with any async view that makes a database connection.

Last edited 3 years ago by Florian Apolloner (previous) (diff)

comment:6 by Stenkar, 3 years ago

Resolution: needsinfo
Status: closed → new

Thank you for the comments.

I created a project where it's possible to spin up a minimal project with docker-compose.

https://github.com/KStenK/django-ticket-33497

I'm not sure that I can find where or what goes wrong in more detail, but I'll give it a try.

comment:7 by Carlton Gibson, 3 years ago

Triage Stage: Unreviewed → Accepted
Type: Bug → New feature

OK, thanks Stenkar.

I'm going to accept this as a New Feature. It's a change in behaviour from 3.2, but it's precisely in allowing multiple executors for sync_to_async() that it comes up. (In 3.2 it's essentially single-threaded, with only a single connection actually being used.) We need to improve the story here, but it's not a bug in #32889 that we don't have async-compatible persistent DB connections yet. (I hope that makes sense.)

A note to the docs about this limitation may be worthwhile.

comment:8 by Florian Apolloner, 3 years ago

Thinking more about this, I do not think the problem is new. We have the same problem whenever persistent connections are used and a new thread is spawned per request (for instance in runserver). Usually (i.e. with gunicorn etc.) one has a rather "stable" pool of processes or threads; as soon as you switch to a new thread per request, this breaks. In ASGI this behaviour is probably more pronounced, since by definition every request runs in its own async task context, which then propagates down to the DB backend as a new connection per request (and these connections will also never be reused, because the "thread" ids change).

All in all, I think we are finally at the point where we need a connection pool in Django. I'd strongly recommend using something like https://github.com/psycopg/psycopg/tree/master/psycopg_pool/psycopg_pool, but abstracted to work for all of Django's database backends.
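The thread-affinity problem described above can be sketched without Django at all. The following toy cache keys "connections" per thread (an approximation of Django's per-thread persistent connections, not its actual code) and shows why a thread-per-request server opens a fresh connection every time while old ones linger unused:

    # Toy sketch (NOT Django's actual implementation): one cached
    # "connection" per thread, like per-thread persistent connections.
    import threading

    connections = {}  # thread -> cached "connection"
    opened = []       # every connection ever opened

    def get_connection():
        # Django effectively keys on thread/task identity; we key on the
        # Thread object here to keep the sketch deterministic.
        key = threading.current_thread()
        if key not in connections:
            conn = f"conn-{len(opened)}"
            opened.append(conn)
            connections[key] = conn
        return connections[key]

    def handle_request():
        get_connection()

    # A stable worker thread (WSGI-style) reuses one connection for 5 requests...
    worker = threading.Thread(target=lambda: [handle_request() for _ in range(5)])
    worker.start()
    worker.join()

    # ...but a new thread per request (runserver/ASGI-style) opens a fresh
    # connection each time; the old ones stay cached and are never reused.
    per_request = [threading.Thread(target=handle_request) for _ in range(5)]
    for t in per_request:
        t.start()
    for t in per_request:
        t.join()

    print(len(opened))  # prints 6: one for the worker, five for the per-request threads

Five requests through one long-lived worker cost one connection; five requests through five fresh threads cost five, which is the connection growth observed in the original report.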

comment:9 by Anders Kaseorg, 3 years ago

Cc: Anders Kaseorg added

Possibly related: #33625 (for memcached connections).

comment:10 by Patryk Zawadzki, 3 years ago

Cc: Patryk Zawadzki added

comment:11 by Mikail, 3 years ago

Cc: Mikail added

comment:12 by Patryk Zawadzki, 3 years ago

This is marked as a "new feature," but it's an undocumented breaking change between 3.2 and 4.0. Connections that were previously reused and terminated are now just left to linger.

The request_finished signal does not terminate them, as they are not idle for longer than CONN_MAX_AGE.

The request_started signal does not terminate them, as it never sees those connections: the connection state lives in an asgiref local and is discarded after every request.

Allowing parallel execution of requests is a great change, but I feel Django should outright refuse to start if CONN_MAX_AGE is combined with ASGI.
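The "discarded after every request" behaviour can be demonstrated with stdlib contextvars, which behave much like the asgiref local storage involved here (a simplified analogy, not Django's code): each request task starts from a fresh copy of the context, so a connection stored by one request is invisible to the next.

    # Simplified analogy: each "request" runs in its own asyncio task, and
    # tasks copy the context at creation, so state set inside one task never
    # reaches the next. The stored "connection" is effectively discarded.
    import asyncio
    import contextvars

    conn_var = contextvars.ContextVar("connection", default=None)

    async def handle_request(n):
        reused = conn_var.get()      # always None: the previous request's connection is gone
        conn_var.set(f"conn-{n}")    # stored only in this task's private context
        return reused

    async def main():
        # Handle three requests one after another, as an ASGI server would.
        return [await asyncio.create_task(handle_request(i)) for i in range(3)]

    print(asyncio.run(main()))  # prints [None, None, None]: no connection is ever reused

No request ever sees the previous request's connection, matching the signal-handler behaviour described above.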

comment:13 by Carlton Gibson, 2 years ago

Keywords: async added

comment:14 by Alex, 2 years ago

Cc: Alex added

in reply to:  12 comment:15 by joeli, 2 years ago

Replying to Patryk Zawadzki:

This is marked as a "new feature," but it's an undocumented breaking change between 3.2 and 4.0. Connections that were previously reused and terminated are now just left to linger.

The request_finished signal does not terminate them, as they are not idle for longer than CONN_MAX_AGE.

The request_started signal does not terminate them, as it never sees those connections: the connection state lives in an asgiref local and is discarded after every request.

Allowing parallel execution of requests is a great change, but I feel Django should outright refuse to start if CONN_MAX_AGE is combined with ASGI.

I agree. I would even go as far as calling this a regression, not just an undocumented breaking change. No matter the reasons behind it or the technical superiority of the new solution, the fact of the matter is that in 3.2 ASGI mode our code worked fine and reused connections. In 4.x it is broken unless CONN_MAX_AGE = 0 is used, which disables a feature of Django that used to work.

comment:16 by joeli, 2 years ago

Cc: joeli added

comment:17 by Marco Glauser, 2 years ago

Cc: Marco Glauser added

comment:18 by Rafał Pitoń, 2 years ago

Cc: Rafał Pitoń added

comment:19 by Marty Cochrane, 2 years ago

Cc: Marty Cochrane added

comment:20 by Florian Apolloner, 2 years ago

I have created a draft pull request for database connection pool support in PostgreSQL: https://github.com/django/django/pull/16881

It would be great if people experiencing the problems noted here could test it (this would probably help in getting it merged).

comment:21 by lappu, 23 months ago

Cc: lappu added

We just ran into this while upgrading from 3.2 to 4.2. During a QA round, the MySQL server for our staging environment, running on an AWS RDS t3.micro instance, exceeded its max connections (around 70, while normally connections stay below 10).

I git-bisected the culprit to https://github.com/django/django/commit/36fa071d6ebd18a61c4d7f1b5c9d17106134bd44, which is what Carlton Gibson suspected.

We are also running uvicorn.

comment:22 by Andreas Pelme, 19 months ago

We have been using CONN_MAX_AGE=300 since it was introduced in Django 1.6 and rely on it to avoid reconnecting to the database on each HTTP request. This change really caught us off guard. We upgraded from Django 3.2 to 4.0 and our site went completely down in a matter of seconds when all database connections were instantly depleted.

Giving each HTTP request its own async context makes a lot of sense and is a good change in itself, IMO. But I would argue that this change is not backward compatible. CONN_MAX_AGE does still "technically" work, but it clearly does not behave as it has for the last 10 years.

Specifying CONN_MAX_AGE is recommended in a lot of places, including Django's own docs.

At the very least, I think this needs to be clearly called out in the release notes and in the docs on "Persistent connections". I think we need to deprecate/remove CONN_MAX_AGE. Or is there even a reason to keep it around?

I am very much in favor of getting basic DB connection pooling into Django. We will try giving https://github.com/django/django/pull/16881 a spin in production and report back. I would love to have something like that available out of the box in Django! We use Postgres and would be happy with such a pool replacing CONN_MAX_AGE for our use case.

However, that would only work for Postgres. What is the situation with MySQL/Oracle? Does mysqlclient come with a pool like psycopg does?

in reply to:  22 comment:23 by Sarah Boyce, 16 months ago

Replying to Andreas Pelme:

However, that would only work for Postgres. What is the situation with MySQL/Oracle? Does mysqlclient come with a pool like psycopg does?

Oracle: https://python-oracledb.readthedocs.io/en/latest/user_guide/connection_handling.html#connpooling
mysqlclient doesn't appear to support this out of the box. It looks like mysql-connector-python would support it, though: https://dev.mysql.com/doc/connector-python/en/connector-python-connection-pooling.html 🤔

comment:24 by Sarah Boyce, 16 months ago

Cc: Sarah Boyce added

comment:25 by Sarah Boyce, 16 months ago

Has patch: set

Will need to add solutions for other backends, but I don't see a reason why we can't do this incrementally; the current patch is for PostgreSQL: https://github.com/django/django/pull/17594
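For reference, the PostgreSQL pooling support this patch grew into (released in Django 5.1) is enabled through the database OPTIONS and requires psycopg 3 with the psycopg-pool package. A minimal sketch (names and values are illustrative placeholders):

    # Sketch of Django 5.1+ settings enabling the built-in psycopg connection
    # pool. Pooling replaces persistent connections, so CONN_MAX_AGE must be
    # left at 0. A dict here is passed through to psycopg_pool.ConnectionPool;
    # "pool": True uses the pool's defaults instead.
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": "mydb",  # placeholder
            "OPTIONS": {
                "pool": {"min_size": 2, "max_size": 4, "timeout": 10},
            },
        }
    }

Because the pool is keyed on the database alias rather than on thread identity, checked-in connections can be reused across requests, which is what persistent connections fail to do under ASGI.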

comment:26 by Florian Apolloner, 16 months ago

Well, I guess the main question is whether we want to create our own pool implementation or reuse what the database adapters provide, assuming they can do it better for their database than we can do generically :D

comment:27 by Mariusz Felisiak, 16 months ago

Owner: changed from nobody to Sarah Boyce
Patch needs improvement: set
Status: new → assigned

comment:28 by Sarah Boyce, 16 months ago

Patch needs improvement: unset

comment:29 by Dmytro Litvinov, 15 months ago

Cc: Dmytro Litvinov added

comment:30 by Sarah Boyce, 15 months ago

There is already a ticket (and now a PR) to support database connection pools in Oracle: #7732

Edit: Ah sorry, I got too excited and misread this; it's a session pool, not a connection pool.

Last edited 15 months ago by Sarah Boyce (previous) (diff)

comment:31 by johnthagen, 14 months ago

As a user of Django, I was looking through the steps needed to optimize my application, and came across the recommendation to enable persistent database connections.

It was only after a long stretch of research/reading that I came across this ticket. We are planning to soon upgrade our application to use uvicorn workers and run under ASGI so that we can serve WebSockets alongside our HTTP API, and we now have pause about performing this standard optimization due to this issue.

Questions:

  1. Does this issue affect all ASGI Django HTTP endpoints, even synchronous ones (e.g. no async, no database_sync_to_async(), etc.)?
  2. Should the docs be updated to add a warning about the fact that persistent connections are not currently fully compatible with ASGI? I'd be happy to open a PR as it would have saved me personally a lot of time if I had discovered this earlier in my research process.

comment:32 by Mariusz Felisiak, 14 months ago

Triage Stage: Accepted → Ready for checkin

comment:33 by Mariusz Felisiak <felisiak.mariusz@…>, 14 months ago

In fad334e:

Refs #33497 -- Added connection pool support for PostgreSQL.

Co-authored-by: Florian Apolloner <florian@…>
Co-authored-by: Ran Benita <ran@…>

comment:34 by Mariusz Felisiak, 14 months ago

Has patch: unset
Triage Stage: Ready for checkin → Accepted

comment:35 by Mariusz Felisiak, 13 months ago

Owner: Sarah Boyce removed
Status: assigned → new

comment:36 by Suraj Shaw, 12 months ago

I have created a pull request for database connection pool support in Oracle: https://github.com/django/django/pull/17834
It would be great if people experiencing the same problem could test it with the Oracle backend (this would probably help in getting it merged).

comment:37 by Suraj Shaw, 12 months ago

Cc: Suraj Shaw added

comment:38 by johnthagen, 9 months ago

For Postgres users, will the new Django 5.1 Postgres connection pools mitigate this issue? Should ASGI users use them rather than persistent connections?

comment:39 by Yiwei Gao, 9 months ago

Cc: Yiwei Gao added

comment:40 by nessita <124304+nessita@…>, 8 months ago

In 7380ac5:

Fixed #35688 -- Restored timezone and role setters to be PostgreSQL DatabaseWrapper methods.

Following the addition of PostgreSQL connection pool support in
Refs #33497, the methods for configuring the database role and timezone
were moved to module-level functions. This change prevented subclasses
of DatabaseWrapper from overriding these methods as needed, for example,
when creating wrappers for other PostgreSQL-based backends.

Thank you Christian Hardenberg for the report and to
Florian Apolloner and Natalia Bidart for the review.

Regression in fad334e1a9b54ea1acb8cce02a25934c5acfe99f.

Co-authored-by: Natalia <124304+nessita@…>

comment:41 by Natalia <124304+nessita@…>, 8 months ago

In 26c0667:

[5.1.x] Fixed #35688 -- Restored timezone and role setters to be PostgreSQL DatabaseWrapper methods.

Following the addition of PostgreSQL connection pool support in
Refs #33497, the methods for configuring the database role and timezone
were moved to module-level functions. This change prevented subclasses
of DatabaseWrapper from overriding these methods as needed, for example,
when creating wrappers for other PostgreSQL-based backends.

Thank you Christian Hardenberg for the report and to
Florian Apolloner and Natalia Bidart for the review.

Regression in fad334e1a9b54ea1acb8cce02a25934c5acfe99f.

Co-authored-by: Natalia <124304+nessita@…>

Backport of 7380ac57340653854bc2cfe0ed80298cdac6061d from main.

in reply to:  38 ; comment:42 by dseb, 7 months ago

Replying to johnthagen:

For Postgres users, will the new Django 5.1 postgres connection pools mitigate this issue? Should ASGI users use this rather than persistent connections?

It seems that the answer is no, connection pools do not yet mitigate this issue. In my testing, enabling connection pooling in an ASGI context causes connection leaks. Even with a very high max connection count and a small timeout value, I'm seeing occasional OperationalError: couldn't get a connection after X sec errors.

in reply to:  42 ; comment:43 by Florian Apolloner, 7 months ago

Even with a very high max connection count and a small timeout value, I'm seeing occasional OperationalError: couldn't get a connection after X sec errors.

This makes sense: a small timeout value increases the chance of seeing this error if you are already running out of connections. Whether there really is a leak is impossible to tell without more information about your codebase and settings.

in reply to:  43 comment:44 by dseb, 7 months ago

Replying to Florian Apolloner:

This make sense, a small timeout value increases the chance to see this error if you are already running out of connections.

My mistake, I meant to say "large timeout value". My containers each answer ~2 requests per second, and the average response time is about 400 ms. It shouldn't take many connections to service this workload (even a single connection should work, I think), and yet, with the pool timeout set to 30 s and the pool max size set to 16, I see many such errors when pooling is enabled. This is why it seems to me that the new pooling support hasn't solved this particular problem.

Last edited 7 months ago by dseb (previous) (diff)

comment:45 by Florian Apolloner, 7 months ago

comment:46 by eardrum0402, 5 weeks ago

Did anyone figure out the right thing to do here with Django 5.1 and ASGI (Uvicorn) with Postgres? The comment above suggests that connection pooling doesn't work, which is super confusing. How are other folks handling this?

Last edited 5 weeks ago by eardrum0402 (previous) (diff)

comment:47 by Patryk Zawadzki, 5 weeks ago

Your best bet for now is to disable persistent connections and deploy PgBouncer (or the equivalent for your RDBMS) between Django and your database server.
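A hedged sketch of what that setup might look like in settings (host and port are assumptions for a local PgBouncer on its default port; names are placeholders):

    # Sketch: point Django at PgBouncer instead of Postgres and disable
    # Django-level persistent connections. All values are illustrative.
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": "mydb",
            "HOST": "127.0.0.1",  # PgBouncer listens here, not Postgres itself
            "PORT": "6432",       # PgBouncer's default port
            "CONN_MAX_AGE": 0,    # let PgBouncer do the pooling
            # Needed when PgBouncer runs in transaction pooling mode, since
            # server-side cursors don't survive across pooled transactions:
            "DISABLE_SERVER_SIDE_CURSORS": True,
        }
    }

With this arrangement Django opens and closes a connection per request as usual, but each "connection" is cheap because PgBouncer hands out an already-open server connection.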

comment:48 by eardrum0402, 5 weeks ago

Interesting. It looks like other people, unlike dseb, have found success with connection pooling: https://forum.djangoproject.com/t/operationalerror-sorry-too-many-clients-already/16221/34.

comment:49 by Patryk Zawadzki, 5 weeks ago

Are you using ASGI? This ticket is only relevant if you're using ASGI.

comment:50 by eardrum0402, 5 weeks ago

Yes, I am trying to get Uvicorn to work with Postgres.

comment:51 by Florian Apolloner, 5 weeks ago

I think there is a bit of a misunderstanding about this ticket and about what the connection pool can and cannot do.

First off, let me start with the initial description of the ticket, namely that persistent connections do not work with ASGI. Yes, this is correct, it is by design, and it will stay that way. The reason is that persistent connections as implemented in Django are basically persistent per thread identifier. With ASGI you essentially get a new thread identifier per request, and hence a new connection. The old connections are not cleaned up at all, quickly exhausting the connection limits.

With the connection pool you are actually able to reuse those connections, because they no longer have thread affinity. If not, that is yet another bug that should be fixed, but I think this works, and so far no one has shown any logs or an example project to the contrary; I will also not spend time on it if there is no effort to help by providing logs etc…

What will not work, though, is the connection pool magically making your server massively concurrent. The way connection handling in Django currently works, a connection is checked out at the first SQL operation in a request and returned at the end of the request. So the max size of the pool sets a hard limit on how many requests holding a connection you can handle in parallel. This is basically equivalent to using no connection pool inside Django and running PgBouncer in pool_mode = session. If you only have a handful of views that access the database and also do a lot of other work (i.e. the response time is long), then you can probably close the connection manually mid-view, which returns it to the pool and allows more concurrent requests.

A small example might look like this (untested sketch):

from asgiref.sync import sync_to_async
from django.db import connection

async def my_long_running_view(request):
    # Force the query to execute now; async iteration avoids SynchronousOnlyOperation.
    tasks = [task async for task in Task.objects.all()]
    # Close the connection, releasing it back to the pool.
    await sync_to_async(connection.close)()
    for task in tasks:  # Note: this could be done in parallel as well, but that is beside the point here
        await some_long_running_operation(task)

    # Queries made here will check out a new connection from the pool

So yes, I do think that the connection pool in Django solves the problem of persistent connections not working in ASGI; what it doesn't do is provide a mode that behaves like PgBouncer's pool_mode = transaction. That would certainly be a worthwhile improvement, but given the limited feedback on pooling so far, I never bothered implementing it.

comment:52 by Carlton Gibson, 5 weeks ago

Has patch: set
Owner: set to Carlton Gibson
Status: new → assigned

comment:53 by Natalia Bidart, 4 weeks ago

Triage Stage: Accepted → Ready for checkin
Version: 4.0 → dev

comment:54 by nessita <124304+nessita@…>, 4 weeks ago

Resolution: fixed
Status: assigned → closed

In 8713e4ae:

Fixed #33497 -- Doc'd that persistent DB connections should be disabled in ASGI and async modes.

comment:55 by Natalia Bidart, 4 weeks ago

Summary: Database persistent connections do not work with ASGI in 4.0 → Document that database persistent connections do not work with ASGI nor async mode

comment:56 by Natalia <124304+nessita@…>, 4 weeks ago

In c779808:

[5.2.x] Fixed #33497 -- Doc'd that persistent DB connections should be disabled in ASGI and async modes.

Backport of 8713e4ae96817a0c7be3f7a8fee25a7c7f819721 from main.

comment:57 by Natalia <124304+nessita@…>, 4 weeks ago

In ab4bb5b:

[5.1.x] Fixed #33497 -- Doc'd that persistent DB connections should be disabled in ASGI and async modes.

Backport of 8713e4ae96817a0c7be3f7a8fee25a7c7f819721 from main.
