Opened 14 years ago

Closed 10 years ago

#15802 closed Cleanup/optimization (fixed)

Django stops functioning when the database (PostgreSQL) closes the connection

Reported by: Rick.van.Hattem@… Owned by: Aymeric Augustin
Component: Database layer (models, ORM) Version: 1.6
Severity: Normal Keywords: database, postgres, postgresql, connection, closed
Cc: Jeroen Dekkers, sannies, mike@…, hv@…, anssi.kaariainen@…, reinout@…, depaolim@…, Ilya Antipenko, pdina, rckclmbr@…, Akos Ladanyi Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

In some cases (database restart, network connection lost, hard pgBouncer restart, etc...) the connection to the database is lost without giving Django a notification.

Currently Django doesn't handle that very gracefully... it just keeps trying to close an already closed connection (and gets errors because of that) so after you've somehow lost your database connection for a bit you have to restart all of your mod_wsgi processes (means executing a reload command on 5 servers for me right now) before Django starts working again.

The stacktrace:

Traceback (most recent call last):
  File "django/core/handlers/wsgi.py", line 267, in 
__call__
    signals.request_finished.send(sender=self.__class__)
  File "django/dispatch/dispatcher.py", line 162, in send
    response = receiver(signal=self, sender=sender, **named)
  File "django/db/__init__.py", line 84, in 
close_connection
    conn.close()
  File "django/db/backends/__init__.py", line 72, in 
close
    self.connection.close()
InterfaceError: connection already closed

Proposed patch, replace the close() method in django/db/backends/__init__.py so it becomes this:

def close(self):
    if self.connection is not None:
        try:
            self.connection.close()
            self.connection = None
        except InterfaceError:
            self.connection = None
            raise

I see no valid reason for giving a non-recoverable error while closing the database connection. Yes, something is wrong and yes, the exception can be shown. But looping the error is completely futile here.

Attachments (2)

15802.diff (1.6 KB ) - added by Honza Král 14 years ago.
hack stolen from sqlalchemy
15802-2.diff (1.4 KB ) - added by Jeroen Dekkers 13 years ago.

Download all attachments as: .zip

Change History (42)

comment:1 by Alex Gaynor, 14 years ago

Well it should really only do this in the case that the InterfaceError is a "connection already closed" one, I remember seeing Mike Bayer bitching that there was no programmatic way to do it, so perhaps we should steal the hack he ended up using from him.

comment:2 by Alex Gaynor, 14 years ago

Triage Stage: UnreviewedAccepted

comment:4 by anonymous, 14 years ago

Although it would be cleaner to only do this in case of an connection already closed InterfaceError. I can't really think of a valid reason to keep self.connection after trying to close it.

Either it succeeded or failed, but the odds are that the connection is not usable anymore in either case. It shouldn't be much of a problem since the exception is re-raised anyway.

But a clean solution (catching this specific error silently) would be preferred indeed.

by Honza Král, 14 years ago

Attachment: 15802.diff added

hack stolen from sqlalchemy

comment:5 by Malcolm Tredinnick, 13 years ago

Easy pickings: unset
UI/UX: unset

That SQLAlchemy code looks fragile in the medium-term. I tend to agree with comment 4: we make a decent attempt to close the connection and then call it "closed" anyway. Before committing something along those lines, can anybody think of a case where that is going to bite us (I can't)?

in reply to:  5 comment:6 by Honza Král, 13 years ago

Owner: changed from nobody to Honza Král
Status: newassigned

Replying to mtredinnick:

That SQLAlchemy code looks fragile in the medium-term. I tend to agree with comment 4: we make a decent attempt to close the connection and then call it "closed" anyway. Before committing something along those lines, can anybody think of a case where that is going to bite us (I can't)?

Neither can I, I have been thinking about it since I wrote that ugly hack of a patch and I cannot think of anything going wrong. We could be leaking open connections, but those should get garbage collected and even that would be better than disabling that process/thread forever (for example gunicorn can't recover from this so this can kill production alltogether).

I will wait one more day for someone to speak up and then I will commit the patch (just connection = None, not the pseudo detection).

comment:7 by Russell Keith-Magee, 13 years ago

I agree with Malcolm. Checking exception messages is fragile, and I can't think of an obvious downside to just treating a closed connection as closed, especially if we log/report the fact that we tried and failed to close the connection (so the process isn't completely transparent).

comment:8 by Rick van Hattem <Rick.van.Hattem@…>, 13 years ago

For the record, I have had the code running in production since opening this ticket and I have not seen any problems yet.

So as far as empirical evidence goes, it is safe to commit a patch like that.

comment:9 by Honza Král, 13 years ago

Resolution: fixed
Status: assignedclosed

In [16708]:

Fixed #15802 -- pyscopg2 sometimes fail to close the connection when it's already closed by the server, Thanks Rick van Hattem

comment:10 by Jeroen Dekkers, 13 years ago

Cc: Jeroen Dekkers added
Has patch: set
Resolution: fixed
Status: closedreopened

I'm hitting the same problem, but when getting the cursor instead of closing the connection:

  File "/home/jener-www/jener/yatemod/backend.py", line 38, in get_account_hashes
    account = Account.objects.get(username=username, domain=domain)
  File "/usr/lib/python2.6/dist-packages/django/db/models/manager.py", line 131, in get
    return self.get_query_set().get(*args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/django/db/models/query.py", line 338, in get
    num = len(clone)
  File "/usr/lib/python2.6/dist-packages/django/db/models/query.py", line 81, in __len__
    self._result_cache = list(self.iterator())
  File "/usr/lib/python2.6/dist-packages/django/db/models/query.py", line 268, in iterator
    for row in compiler.results_iter():
  File "/usr/lib/python2.6/dist-packages/django/db/models/sql/compiler.py", line 701, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/usr/lib/python2.6/dist-packages/django/db/models/sql/compiler.py", line 755, in execute_sql
    cursor = self.connection.cursor()
  File "/usr/lib/python2.6/dist-packages/django/db/backends/__init__.py", line 281, in cursor
    cursor = util.CursorWrapper(self._cursor(), self)
  File "/usr/lib/python2.6/dist-packages/django/db/backends/postgresql_psycopg2/base.py", line 173, in _cursor
    cursor = self.connection.cursor()
InterfaceError: connection already closed

The problem is again that self.connection isn't None, but it isn't usable either, and we will loop over this error again and again. The attached patch will catch the exception and call close(), so that we can continue with creating a new connection.

by Jeroen Dekkers, 13 years ago

Attachment: 15802-2.diff added

in reply to:  10 ; comment:11 by Honza Král, 13 years ago

Replying to dekkers:

...

The problem is again that self.connection isn't None, but it isn't usable either, and we will loop over this error again and again. The attached patch will catch the exception and call close(), so that we can continue with creating a new connection.

...

And does it occur then at every request or just once? If just once I consider that fine - we wouldn't want to hide the fact that something bad happened to the connection from the user.

in reply to:  11 comment:12 by Jeroen Dekkers, 13 years ago

Replying to Honza_Kral:

And does it occur then at every request or just once? If just once I consider that fine - we wouldn't want to hide the fact that something bad happened to the connection from the user.

It happens at every request. With my patch I still get a DatabaseError in execute() once, but I agree that that is fine.

comment:13 by anonymous, 13 years ago

I'm running django/postgres on an unreliable infrastructur (All network is in principle unreliable!). Of course I am hitting this problem too. Please help me understand the state of the issue and possible workarounds.

State

  • Changeset [16708] has been committed to the trunk on 29/Aug/11 and provides an improvement but no complete fix
  • Django 1.3.1 has been released 9/Sep/11
  • Django 1.3.1 does NOT contain the fix. What a pity!
  • The second patch completes the fix but it is not committed anywhere
  • Postgres & Django without workaround or fix == no good idea

Workarounds

  • Apply the attached patches to my production systems (Somehow I don't like this!)
  • Ease the pain by adding a connection pool on each postgres client and connect locally

Solutions

  • Convince the psycopg2 guys that their interpretation of pep-0249 is wrong as it seems that the mysql guys interpreted the following different (from pep-0249):

.close()


Close the connection now (rather than whenever __del__ is
called). The connection will be unusable from this point
forward; an Error (or subclass) exception will be raised
if any operation is attempted with the connection.

  • Wait for Django 1.4

Is patching my installation the preferred solution at the moment? Is there some other workaround?

What is the 'real' solution to this issue? Should django just deal with the different behavior? Should psycopg2 change its behavior?

Since many people seem to have problems with that behavior(just google 'InterfaceError: connection already closed') it seems to be a common understanding that closing a closed connection should not raise an exception. Therefore I'd say we should open a ticket at psycopg2's issue tracker.

Do you agree? Or did I misunderstand the issue and its background?

-

Sebastian

in reply to:  13 comment:14 by sannies, 13 years ago

Cc: sannies added

Replying to anonymous:
I wasn't logged in.

comment:15 by Jeroen Dekkers, 13 years ago

The problem that is solved by my patch is that the connection is closed by the DB without Django knowing it (this happens when you for example restart the DB). Django tries to use self.connection and doesn't handle the exception. You could change the psycopg2 behaviour to automatically reconnect, but that might hide other problems, so I don't think that's a good idea.

I currently run Django 1.3.1 with both patches applied without any problems.

in reply to:  15 comment:16 by sannies, 13 years ago

No, I don't mean to automatically reconnect - that goes way to far. I think psycopg2 should not raise an exception if an already closed connection is closed again. That's all.

You could change the psycopg2 behaviour to automatically reconnect, but that might hide other problems, so I don't think that's a good idea.

comment:17 by Mike Fogel, 13 years ago

Cc: mike@… added

comment:18 by Thomas Güttler, 13 years ago

Cc: hv@… added

comment:19 by Jeroen Dekkers, 13 years ago

Is there anything I can do to get my patch applied before 1.4 is released?

comment:20 by Anssi Kääriäinen, 13 years ago

Cc: anssi.kaariainen@… added
Patch needs improvement: set

In normal HTTP serving situations the connection doesn't get stuck into closed state. When the request is finished, the connections are closed and the already committed .close() fix will remove the broken connection. I am of course talking about 1.4 here, as it has the .close() fix already applied.

However, if using the connection in a long running task, then there will be problems. In that setting, the connection is usually never closed. We should really document that you should call connection.close() after you are done with it. Actually, we should document how to safely use Django's connections in task-running setting, I don't think we have that and there are multiple potential problems in that setting (Idle in TX for example).

The attached patch doesn't apply cleanly (15802-2.diff). It doesn't apply to the base directory, and even if applied into the django/ directory, it doesn't apply cleanly. In addition, if I am reading it correctly it is downright dangerous. I think what happens is this.

obj.save() # succeeds
# connection is closed for some reason
obj.save() # Hey, ._cursor() gave me a new connection! No error raised anywhere, perhaps I am even in a different transaction...

I don't believe there to be any safe way to automatically create a new connection without the user calling .close() in between. Surprising things could happen otherwise. The only possible exception would be that when leaving transaction management the connection could be closed on error. This could also help in many situations, because usually when you get an error (possibly due to closed connection), you will rollback the transaction (leave transaction management), and thus the connection would get unstuck.

Still one helpful thing would be to return a consistent ConnectionClosedError when the connection is closed outside Django's control.

In short: I don't think there is anything to do to get this into 1.4. Too big changes required. The only solution you have is somehow detecting closed connections in your code and closing the connection manually in those cases.

in reply to:  13 comment:21 by piro, 13 years ago

Replying to anonymous:

  • Convince the psycopg2 guys that their interpretation of pep-0249 is wrong as it seems that the mysql guys interpreted the following different (from pep-0249):

I've tried to do that: close() should be idempotent. But the reference DBAPI test suite fails as it explicitly check that the second close() raises an exception. I've discussed about the problem in the DBSIG <http://mail.python.org/pipermail/db-sig/2011-October/005811.html>, but never got to the point to fix the test suite. I should really write to Stuart Bishop about that.

edit: done: http://code.google.com/p/acute-dbapi/issues/detail?id=3

Last edited 13 years ago by piro (previous) (diff)

comment:22 by Aymeric Augustin, 12 years ago

Status: reopenednew

comment:23 by Aymeric Augustin, 12 years ago

Owner: changed from Honza Král to Aymeric Augustin
Status: newassigned

comment:24 by Reinout van Rees, 12 years ago

Cc: reinout@… added

comment:25 by Aymeric Augustin, 12 years ago

Resolution: fixed
Status: assignedclosed
  • connection.close() was made idempotent in Django, regardless of whether this gets fixed or not in pyscopg2.
  • Django should not provide automatic reconnection within a request because that can break transaction integrity.
  • The problem of long running processes is covered by #17887.

The fix seems to work and the discussion after the ticket was reopened didn't bring up any actionnable item.

comment:26 by Marco De Paoli, 11 years ago

Cc: depaolim@… added

comment:27 by Aymeric Augustin <aymeric.augustin@…>, 11 years ago

In 25860096f981b0b3026b38329e4b69c72b1d8db9:

Fixed #21239 -- Maintained atomicity when closing the connection.

Refs #15802 -- Reverted #7c657b24 as BaseDatabaseWrapper.close() now
has a proper "finally" clause that may need to preserve self.connection.

comment:28 by Aymeric Augustin <aymeric.augustin@…>, 11 years ago

In 4ea02bdb0dc6776fc29d8164e417469c9ba10f18:

[1.6.x] Fixed #21239 -- Maintained atomicity when closing the connection.

Refs #15802 -- Reverted #7c657b24 as BaseDatabaseWrapper.close() now
has a proper "finally" clause that may need to preserve self.connection.

Backport of 25860096 from master.

comment:29 by Aymeric Augustin <aymeric.augustin@…>, 11 years ago

In 2e42c859da42a871244aca36162a0aad2b63c02b:

[1.7.x] Fixed #21239 -- Maintained atomicity when closing the connection.

Refs #15802 -- Reverted #7c657b24 as BaseDatabaseWrapper.close() now
has a proper "finally" clause that may need to preserve self.connection.

Backport of 25860096 from master.

comment:30 by Aymeric Augustin, 10 years ago

#23325 was a duplicate.

comment:31 by Aymeric Augustin, 10 years ago

Resolution: fixed
Status: closednew

I'm reopening this ticket because I want to think about reopening the connection automatically when no transaction is in progress.

There's probably a bunch of issues with that technique, like retrying a query that failed.

But since "connection closed" errors are a problem many users struggle with, it's worth making sure I haven't overlooked a solution.

comment:32 by Aymeric Augustin, 10 years ago

Status: newassigned

comment:33 by Ilya Antipenko, 10 years ago

Cc: Ilya Antipenko added

comment:34 by pdina, 10 years ago

Cc: pdina added

comment:35 by Joshua Braegger, 10 years ago

Cc: rckclmbr@… added

comment:36 by Peter Baumgartner, 10 years ago

Version: 1.31.6

I'm seeing this in a Django 1.6.8 project served via uWSGI 2.0.9. The exception is:

Traceback (most recent call last):
  File "/srv/myproject/local/lib/python2.7/site-packages/django/core/handlers/wsgi.py", line 194, in __call__
    signals.request_started.send(sender=self.__class__)
  File "/srv/myproject/local/lib/python2.7/site-packages/django/dispatch/dispatcher.py", line 185, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/srv/myproject/local/lib/python2.7/site-packages/django/db/__init__.py", line 91, in close_old_connections
    conn.abort()
  File "/srv/myproject/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 381, in abort
    self.rollback()
  File "/srv/myproject/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 180, in rollback
    self._rollback()
  File "/srv/myproject/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 144, in _rollback
    return self.connection.rollback()
  File "/srv/myproject/local/lib/python2.7/site-packages/django/db/utils.py", line 99, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/srv/myproject/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 144, in _rollback
    return self.connection.rollback()
django.db.utils.InterfaceError: connection already closed

I can reliably replicate it by running ab against the app and doing a postgresql restart in the middle of it. All future requests fail until the uWSGI process is restarted. I tried changing the lazy-apps option in uWSGI, but the issue persists.

Version 0, edited 10 years ago by Peter Baumgartner (next)

comment:37 by Akos Ladanyi, 10 years ago

Cc: Akos Ladanyi added

comment:38 by Simone Federici, 10 years ago

Type: BugCleanup/optimization

Since 1.7 there is a test dedicated on it. See commit 2e42c859da42a871244aca36162a0aad2b63c02b

The aaugustin proposal probably should be marked as "Cleanup/optimization", I don't see the bug still open since 1.7.

in reply to:  36 comment:39 by Aymeric Augustin, 10 years ago

Replying to baumer1122:

I'm seeing this in a Django 1.6.8 project. The exception is: (...)

This is a valid bug report.

The same symptoms as the bug originally reported here can be triggered from another location.

The code where this error originates was deprecated in Django 1.6 and removed in Django 1.8:

As a consequence, this variant of the issue can be considered fixed in Django 1.8.

in reply to:  31 comment:40 by Aymeric Augustin, 10 years ago

Resolution: fixed
Status: assignedclosed

Replying to aaugustin:

I'm reopening this ticket because I want to think about reopening the connection automatically when no transaction is in progress.

Since this ticket was getting long and confusing, I've opened another ticket for this improvement: #24810.

Note: See TracTickets for help on using tickets.
Back to Top