Opened 5 years ago

Last modified 5 weeks ago

#24810 new New feature

Reopen database connection automatically when no transaction is active

Reported by: Aymeric Augustin Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: depaolim@…, binary@…, Marti Raudsepp, dms-cat Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: no
Easy pickings: no UI/UX: no


This is a follow-up of #15802.

When the database closes the connection, for instance because the database server was restarted, an exception will be raised the next time user code attempts to use the database. Django doesn't attempt to do anything about it.

This is expected to result in at most one 500 error for each thread that serves requests, since database connections are thread-local. This doesn't sound horrific. After all the database was restarted while the website was serving requests.

It can also cause management commands to crash, which isn't the end of the world either since management commands should fall into one of the following categories:

  • short-lived, idempotent and scheduled to run regularly
  • long-lived and supervised to run constantly
  • run manually

If the connection was in autocommit mode and an operation fails because the database has closed the connection, theoretically, it's safe the reopen the connection and retry the operation.

In that case, Django could continue to operate transparently instead of raising an exception.

This may not be easy to implement.

Change History (11)

comment:1 Changed 5 years ago by Anssi Kääriäinen

It must be noted that SQL connections can have state even when not in a transaction. These could be for example session variables, per-connection configuration (SET search_path = xxx), temporary tables etc. While Django doesn't use any of these directly, breaking Django for users who do use connection state should be avoided.

If we are going to reopen connections automatically we need some safe-guards against above problems. As we can't detect automatically when users have session state in their connections, the users should tell us about this. I think opt-out (or maybe opt-in?) flag in the connection's settings could work. Something like set OPTIONS['stateful_connection'] = True, and Django will not try to reopen failed connections.

Other than the above remark I don't see a problem with this approach. Still, without further evidence that these kinds of failures are common, I believe this will require too much complexity compared to what is gained.

comment:2 Changed 5 years ago by Marco De Paoli

Cc: depaolim@… added

comment:3 Changed 5 years ago by Sergey Dobrov

Yes, I have the same problem in the daemon tasks that access DB through django models.

I don't see a problem with the connection state, actually, as we can say to do all the necessary preparations in the connection_created signal and then newly created connection can be set up correctly?

comment:4 Changed 5 years ago by Sergey Dobrov

Cc: binary@… added

comment:5 Changed 4 years ago by Mike Charnoky

It should be noted that in the case where a daemon is accessing the DB via the ORM, if the db connection is closed for some reason (usually the db server restarting), then ALL subsequent DB read queries FAIL. At that point, the process can no longer perform any read query since the db connection is closed. I'm not concerned with retrying statements that failed during the time the db was down, but it does seem like a bug that the ORM completely fails to reconnect AT ALL. The funny thing is: if and only if the process happens to perform a DB write after the DB connection gets closed, then the ORM will open a new connection.

comment:6 Changed 4 years ago by Aymeric Augustin

Yes, these are the reasons why this ticket exists. Patches *that do not ignore transactional integrity* welcome.

comment:7 in reply to:  6 Changed 4 years ago by Sergey Dobrov

Replying to aaugustin:

Yes, these are the reasons why this ticket exists. Patches *that do not ignore transactional integrity* welcome.

So I gave it a humble try.

There is no problem in case the trouble happened inside a transaction because in case error happened and was caught by the Atomic, the connection will be closed in one of the places like this. And next transaction will work without any problem as connection will be re-established because django is now aware that the current one is not an option any more.

It's still a problem in autocommit mode though because in this case connection won't be closed and will remain in a broken state, so in case we try to obtain a cursor again, we'll get the InterfaceError exception. So the code like:

def do_query():
    return Model.objects.first()

def loop():
    while True:
            print do_query()
            print 'shit happened'

will stop working after first disconnect.

At first I tried to intercept InterfaceError at the moment we are obtaining a cursor and OperationalErorr at the execute method, but then I realized that such a problem may happen in other methods, like fetchone or anywhere else, so eventually I came to just intercept it in the DatabaseErrorWrapper and close the connection if one of those errors occured but only in case we are not inside a transaction, as this case is covered already and it may break that code. Here's the commit:

After that, that code above starts to work even after disconnect.

I have no any idea if this is the right approach, and I have no idea how to write tests for this yet. But if anyone will tell this one is a worthy approach, I'll try to think about tests too. (but advices are welcome, ofc)

comment:8 Changed 4 years ago by Sergey Dobrov

Has patch: set
Needs tests: set

comment:9 Changed 2 years ago by Marti Raudsepp

Cc: Marti Raudsepp added

comment:10 Changed 9 months ago by dms-cat

Cc: dms-cat added

comment:11 Changed 5 weeks ago by Hugo Osvaldo Barrera

I've been trying to address this with no luck.

My main problem is accurately reproducing this issue -- it seems to happen from time to time in production for me. The DB connection seems to die for some reason, and the next request fails with InterfaceError.

However, testing locally I tried to reproduce this by:

  1. Start Django and Postgres.
  2. Execute a Request (which does a DB call). It works!
  3. Soft-exit Postgres
  4. Execute a Request (which does a DB call). It still works! 😓

I tried as an alternative killing postgres in (3), but the result is the same.

As per Sentry, I'm hitting this error a few hundred times a month, so I'd really like to address this.

Any hints on how to accurately reproduce this?

Note: See TracTickets for help on using tickets.
Back to Top