Opened 4 years ago

Last modified 6 days ago

#15802 assigned Cleanup/optimization

Django stops functioning when the database (PostgreSQL) closes the connection

Reported by: Rick.van.Hattem@… Owned by: aaugustin
Component: Database layer (models, ORM) Version: 1.6
Severity: Normal Keywords: database, postgres, postgresql, connection, closed
Cc: dekkers, sannies, mike@…, hv@…, anssi.kaariainen@…, reinout@…, depaolim@…, aivus, pdina, rckclmbr@…, AkosLadanyi Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

In some cases (database restart, network connection lost, hard pgBouncer restart, etc...) the connection to the database is lost without giving Django a notification.

Currently Django doesn't handle that very gracefully... it just keeps trying to close an already closed connection (and gets errors because of that) so after you've somehow lost your database connection for a bit you have to restart all of your mod_wsgi processes (means executing a reload command on 5 servers for me right now) before Django starts working again.

The stacktrace:

Traceback (most recent call last):
  File "django/core/handlers/wsgi.py", line 267, in 
__call__
    signals.request_finished.send(sender=self.__class__)
  File "django/dispatch/dispatcher.py", line 162, in send
    response = receiver(signal=self, sender=sender, **named)
  File "django/db/__init__.py", line 84, in 
close_connection
    conn.close()
  File "django/db/backends/__init__.py", line 72, in 
close
    self.connection.close()
InterfaceError: connection already closed

Proposed patch, replace the close() method in django/db/backends/__init__.py so it becomes this:

def close(self):
    if self.connection is not None:
        try:
            self.connection.close()
            self.connection = None
        except InterfaceError:
            self.connection = None
            raise

I see no valid reason for giving a non-recoverable error while closing the database connection. Yes, something is wrong and yes, the exception can be shown. But looping the error is completely futile here.

Attachments (2)

15802.diff (1.6 KB) - added by Honza_Kral 4 years ago.
hack stolen from sqlalchemy
15802-2.diff (1.4 KB) - added by dekkers 4 years ago.

Download all attachments as: .zip

Change History (40)

comment:1 Changed 4 years ago by Alex

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Well it should really only do this in the case that the InterfaceError is a "connection already closed" one, I remember seeing Mike Bayer bitching that there was no programmatic way to do it, so perhaps we should steal the hack he ended up using from him.

comment:2 Changed 4 years ago by Alex

  • Triage Stage changed from Unreviewed to Accepted

comment:4 Changed 4 years ago by anonymous

Although it would be cleaner to only do this in case of an connection already closed InterfaceError. I can't really think of a valid reason to keep self.connection after trying to close it.

Either it succeeded or failed, but the odds are that the connection is not usable anymore in either case. It shouldn't be much of a problem since the exception is re-raised anyway.

But a clean solution (catching this specific error silently) would be preferred indeed.

Changed 4 years ago by Honza_Kral

hack stolen from sqlalchemy

comment:5 follow-up: Changed 4 years ago by mtredinnick

  • Easy pickings unset
  • UI/UX unset

That SQLAlchemy code looks fragile in the medium-term. I tend to agree with comment 4: we make a decent attempt to close the connection and then call it "closed" anyway. Before committing something along those lines, can anybody think of a case where that is going to bite us (I can't)?

comment:6 in reply to: ↑ 5 Changed 4 years ago by Honza_Kral

  • Owner changed from nobody to Honza_Kral
  • Status changed from new to assigned

Replying to mtredinnick:

That SQLAlchemy code looks fragile in the medium-term. I tend to agree with comment 4: we make a decent attempt to close the connection and then call it "closed" anyway. Before committing something along those lines, can anybody think of a case where that is going to bite us (I can't)?

Neither can I, I have been thinking about it since I wrote that ugly hack of a patch and I cannot think of anything going wrong. We could be leaking open connections, but those should get garbage collected and even that would be better than disabling that process/thread forever (for example gunicorn can't recover from this so this can kill production alltogether).

I will wait one more day for someone to speak up and then I will commit the patch (just connection = None, not the pseudo detection).

comment:7 Changed 4 years ago by russellm

I agree with Malcolm. Checking exception messages is fragile, and I can't think of an obvious downside to just treating a closed connection as closed, especially if we log/report the fact that we tried and failed to close the connection (so the process isn't completely transparent).

comment:8 Changed 4 years ago by Rick van Hattem <Rick.van.Hattem@…>

For the record, I have had the code running in production since opening this ticket and I have not seen any problems yet.

So as far as empirical evidence goes, it is safe to commit a patch like that.

comment:9 Changed 4 years ago by Honza_Kral

  • Resolution set to fixed
  • Status changed from assigned to closed

In [16708]:

Fixed #15802 -- pyscopg2 sometimes fail to close the connection when it's already closed by the server, Thanks Rick van Hattem

comment:10 follow-up: Changed 4 years ago by dekkers

  • Cc dekkers added
  • Has patch set
  • Resolution fixed deleted
  • Status changed from closed to reopened

I'm hitting the same problem, but when getting the cursor instead of closing the connection:

  File "/home/jener-www/jener/yatemod/backend.py", line 38, in get_account_hashes
    account = Account.objects.get(username=username, domain=domain)
  File "/usr/lib/python2.6/dist-packages/django/db/models/manager.py", line 131, in get
    return self.get_query_set().get(*args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/django/db/models/query.py", line 338, in get
    num = len(clone)
  File "/usr/lib/python2.6/dist-packages/django/db/models/query.py", line 81, in __len__
    self._result_cache = list(self.iterator())
  File "/usr/lib/python2.6/dist-packages/django/db/models/query.py", line 268, in iterator
    for row in compiler.results_iter():
  File "/usr/lib/python2.6/dist-packages/django/db/models/sql/compiler.py", line 701, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/usr/lib/python2.6/dist-packages/django/db/models/sql/compiler.py", line 755, in execute_sql
    cursor = self.connection.cursor()
  File "/usr/lib/python2.6/dist-packages/django/db/backends/__init__.py", line 281, in cursor
    cursor = util.CursorWrapper(self._cursor(), self)
  File "/usr/lib/python2.6/dist-packages/django/db/backends/postgresql_psycopg2/base.py", line 173, in _cursor
    cursor = self.connection.cursor()
InterfaceError: connection already closed

The problem is again that self.connection isn't None, but it isn't usable either, and we will loop over this error again and again. The attached patch will catch the exception and call close(), so that we can continue with creating a new connection.

Changed 4 years ago by dekkers

comment:11 in reply to: ↑ 10 ; follow-up: Changed 4 years ago by Honza_Kral

Replying to dekkers:

...

The problem is again that self.connection isn't None, but it isn't usable either, and we will loop over this error again and again. The attached patch will catch the exception and call close(), so that we can continue with creating a new connection.

...

And does it occur then at every request or just once? If just once I consider that fine - we wouldn't want to hide the fact that something bad happened to the connection from the user.

comment:12 in reply to: ↑ 11 Changed 4 years ago by dekkers

Replying to Honza_Kral:

And does it occur then at every request or just once? If just once I consider that fine - we wouldn't want to hide the fact that something bad happened to the connection from the user.

It happens at every request. With my patch I still get a DatabaseError in execute() once, but I agree that that is fine.

comment:13 follow-ups: Changed 3 years ago by anonymous

I'm running django/postgres on an unreliable infrastructur (All network is in principle unreliable!). Of course I am hitting this problem too. Please help me understand the state of the issue and possible workarounds.

State

  • Changeset [16708] has been committed to the trunk on 29/Aug/11 and provides an improvement but no complete fix
  • Django 1.3.1 has been released 9/Sep/11
  • Django 1.3.1 does NOT contain the fix. What a pity!
  • The second patch completes the fix but it is not committed anywhere
  • Postgres & Django without workaround or fix == no good idea

Workarounds

  • Apply the attached patches to my production systems (Somehow I don't like this!)
  • Ease the pain by adding a connection pool on each postgres client and connect locally

Solutions

  • Convince the psycopg2 guys that their interpretation of pep-0249 is wrong as it seems that the mysql guys interpreted the following different (from pep-0249):

.close()


Close the connection now (rather than whenever __del__ is
called). The connection will be unusable from this point
forward; an Error (or subclass) exception will be raised
if any operation is attempted with the connection.

  • Wait for Django 1.4

Is patching my installation the preferred solution at the moment? Is there some other workaround?

What is the 'real' solution to this issue? Should django just deal with the different behavior? Should psycopg2 change its behavior?

Since many people seem to have problems with that behavior(just google 'InterfaceError: connection already closed') it seems to be a common understanding that closing a closed connection should not raise an exception. Therefore I'd say we should open a ticket at psycopg2's issue tracker.

Do you agree? Or did I misunderstand the issue and its background?

-

Sebastian

comment:14 in reply to: ↑ 13 Changed 3 years ago by sannies

  • Cc sannies added

Replying to anonymous:
I wasn't logged in.

comment:15 follow-up: Changed 3 years ago by dekkers

The problem that is solved by my patch is that the connection is closed by the DB without Django knowing it (this happens when you for example restart the DB). Django tries to use self.connection and doesn't handle the exception. You could change the psycopg2 behaviour to automatically reconnect, but that might hide other problems, so I don't think that's a good idea.

I currently run Django 1.3.1 with both patches applied without any problems.

comment:16 in reply to: ↑ 15 Changed 3 years ago by sannies

No, I don't mean to automatically reconnect - that goes way to far. I think psycopg2 should not raise an exception if an already closed connection is closed again. That's all.

You could change the psycopg2 behaviour to automatically reconnect, but that might hide other problems, so I don't think that's a good idea.

comment:17 Changed 3 years ago by carbonXT

  • Cc mike@… added

comment:18 Changed 3 years ago by guettli

  • Cc hv@… added

comment:19 Changed 3 years ago by dekkers

Is there anything I can do to get my patch applied before 1.4 is released?

comment:20 Changed 3 years ago by akaariai

  • Cc anssi.kaariainen@… added
  • Patch needs improvement set

In normal HTTP serving situations the connection doesn't get stuck into closed state. When the request is finished, the connections are closed and the already committed .close() fix will remove the broken connection. I am of course talking about 1.4 here, as it has the .close() fix already applied.

However, if using the connection in a long running task, then there will be problems. In that setting, the connection is usually never closed. We should really document that you should call connection.close() after you are done with it. Actually, we should document how to safely use Django's connections in task-running setting, I don't think we have that and there are multiple potential problems in that setting (Idle in TX for example).

The attached patch doesn't apply cleanly (15802-2.diff). It doesn't apply to the base directory, and even if applied into the django/ directory, it doesn't apply cleanly. In addition, if I am reading it correctly it is downright dangerous. I think what happens is this.

obj.save() # succeeds
# connection is closed for some reason
obj.save() # Hey, ._cursor() gave me a new connection! No error raised anywhere, perhaps I am even in a different transaction...

I don't believe there to be any safe way to automatically create a new connection without the user calling .close() in between. Surprising things could happen otherwise. The only possible exception would be that when leaving transaction management the connection could be closed on error. This could also help in many situations, because usually when you get an error (possibly due to closed connection), you will rollback the transaction (leave transaction management), and thus the connection would get unstuck.

Still one helpful thing would be to return a consistent ConnectionClosedError when the connection is closed outside Django's control.

In short: I don't think there is anything to do to get this into 1.4. Too big changes required. The only solution you have is somehow detecting closed connections in your code and closing the connection manually in those cases.

comment:21 in reply to: ↑ 13 Changed 3 years ago by piro

Replying to anonymous:

  • Convince the psycopg2 guys that their interpretation of pep-0249 is wrong as it seems that the mysql guys interpreted the following different (from pep-0249):

I've tried to do that: close() should be idempotent. But the reference DBAPI test suite fails as it explicitly check that the second close() raises an exception. I've discussed about the problem in the DBSIG <http://mail.python.org/pipermail/db-sig/2011-October/005811.html>, but never got to the point to fix the test suite. I should really write to Stuart Bishop about that.

edit: done: http://code.google.com/p/acute-dbapi/issues/detail?id=3

Last edited 3 years ago by piro (previous) (diff)

comment:22 Changed 2 years ago by aaugustin

  • Status changed from reopened to new

comment:23 Changed 2 years ago by aaugustin

  • Owner changed from Honza_Kral to aaugustin
  • Status changed from new to assigned

comment:24 Changed 2 years ago by reinout

  • Cc reinout@… added

comment:25 Changed 2 years ago by aaugustin

  • Resolution set to fixed
  • Status changed from assigned to closed
  • connection.close() was made idempotent in Django, regardless of whether this gets fixed or not in pyscopg2.
  • Django should not provide automatic reconnection within a request because that can break transaction integrity.
  • The problem of long running processes is covered by #17887.

The fix seems to work and the discussion after the ticket was reopened didn't bring up any actionnable item.

comment:26 Changed 13 months ago by depaolim

  • Cc depaolim@… added

comment:27 Changed 13 months ago by Aymeric Augustin <aymeric.augustin@…>

In 25860096f981b0b3026b38329e4b69c72b1d8db9:

Fixed #21239 -- Maintained atomicity when closing the connection.

Refs #15802 -- Reverted #7c657b24 as BaseDatabaseWrapper.close() now
has a proper "finally" clause that may need to preserve self.connection.

comment:28 Changed 13 months ago by Aymeric Augustin <aymeric.augustin@…>

In 4ea02bdb0dc6776fc29d8164e417469c9ba10f18:

[1.6.x] Fixed #21239 -- Maintained atomicity when closing the connection.

Refs #15802 -- Reverted #7c657b24 as BaseDatabaseWrapper.close() now
has a proper "finally" clause that may need to preserve self.connection.

Backport of 25860096 from master.

comment:29 Changed 13 months ago by Aymeric Augustin <aymeric.augustin@…>

In 2e42c859da42a871244aca36162a0aad2b63c02b:

[1.7.x] Fixed #21239 -- Maintained atomicity when closing the connection.

Refs #15802 -- Reverted #7c657b24 as BaseDatabaseWrapper.close() now
has a proper "finally" clause that may need to preserve self.connection.

Backport of 25860096 from master.

comment:30 Changed 8 months ago by aaugustin

#23325 was a duplicate.

comment:31 Changed 8 months ago by aaugustin

  • Resolution fixed deleted
  • Status changed from closed to new

I'm reopening this ticket because I want to think about reopening the connection automatically when no transaction is in progress.

There's probably a bunch of issues with that technique, like retrying a query that failed.

But since "connection closed" errors are a problem many users struggle with, it's worth making sure I haven't overlooked a solution.

comment:32 Changed 8 months ago by aaugustin

  • Status changed from new to assigned

comment:33 Changed 8 months ago by aivus

  • Cc aivus added

comment:34 Changed 4 months ago by pdina

  • Cc pdina added

comment:35 Changed 3 months ago by rckclmbr

  • Cc rckclmbr@… added

comment:36 Changed 8 weeks ago by baumer1122

  • Version changed from 1.3 to 1.6

I'm seeing this in a Django 1.6.8 project. The exception is:

Traceback (most recent call last):
  File "/srv/myproject/local/lib/python2.7/site-packages/django/core/handlers/wsgi.py", line 194, in __call__
    signals.request_started.send(sender=self.__class__)
  File "/srv/myproject/local/lib/python2.7/site-packages/django/dispatch/dispatcher.py", line 185, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/srv/myproject/local/lib/python2.7/site-packages/django/db/__init__.py", line 91, in close_old_connections
    conn.abort()
  File "/srv/myproject/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 381, in abort
    self.rollback()
  File "/srv/myproject/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 180, in rollback
    self._rollback()
  File "/srv/myproject/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 144, in _rollback
    return self.connection.rollback()
  File "/srv/myproject/local/lib/python2.7/site-packages/django/db/utils.py", line 99, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/srv/myproject/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 144, in _rollback
    return self.connection.rollback()
django.db.utils.InterfaceError: connection already closed

I can reliably replicate it by running ab against the app and doing a postgresql restart in the middle of it. All future requests fail until the process is restarted. I see this in both gunicorn and uWSGI. I tried changing the lazy-apps option in uWSGI, but the issue persists.

Last edited 8 weeks ago by baumer1122 (previous) (diff)

comment:37 Changed 4 weeks ago by AkosLadanyi

  • Cc AkosLadanyi added

comment:38 Changed 6 days ago by simone

  • Type changed from Bug to Cleanup/optimization

Since 1.7 there is a test dedicated on it. See commit 2e42c859da42a871244aca36162a0aad2b63c02b

The aaugustin proposal probably should be marked as "Cleanup/optimization", I don't see the bug still open since 1.7.

Note: See TracTickets for help on using tickets.
Back to Top