Opened 13 years ago

Closed 12 years ago

#17887 closed Bug (fixed)

postgresql_psycopg2 sometimes leaves connections "idle in transaction"

Reported by: Brodie Rao Owned by: Aymeric Augustin
Component: Database layer (models, ORM) Version: 1.3
Severity: Normal Keywords: psycopg2 transactions
Cc: Adrian Holovaty, Anssi Kääriäinen, Aymeric Augustin, tcpip4000, mdupuis, iacobcatalin@…, gerdemb Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When using the postgresql_psycopg2 backend with DB-level autocommit disabled, long-running Django processes (e.g., under Gunicorn) can leave around idle connections stuck in transactions, making things like schema migrations difficult to impossible to do without killing those connections.

This happens because by default (with autocommit disabled), psycopg2 issues a BEGIN before the connection's first query is run. If the connection is committed or rolled back, it'll issue another BEGIN before the next query. Unless you commit or roll back the connection, it'll stick around in that transaction indefinitely.

To make matters worse, if the connection is closed while it's in this idle state, it can cause problems with pgbouncer. pgbouncer will see there's a transaction in progress but that the client disconnected. The only way it can recover the connection safely is to disconnect from the server (issuing an "unclean server" error message).

By default, the ORM issues a commit after every INSERT or UPDATE made through the ORM. If the very last thing the Django process does before disconnecting is an INSERT or an UPDATE, it won't run into the pgbouncer "unclean server" problem, and the connection won't be idle in transaction (as long as another query isn't run). But not every request makes a modification to the database as its last query, so most of the time you'll run into this issue.

One workaround is to use the TransactionMiddleware, but that won't help if you're using the ORM outside of the request framework (e.g., in cron jobs, standalone scripts, or Celery tasks).

I'm not sure what a proper fix would look like (or if this even warrants a fix). I think ideally, Django would by default issue a ROLLBACK after every request. It could also issue a ROLLBACK when disconnecting from the database. At very the least, this issue should be documented.

Also note that while this is related to #16047, the fix for that issue won't resolve this one.

Change History (11)

comment:1 by Aymeric Augustin, 13 years ago

Triage Stage: UnreviewedAccepted

This issue has been "well known" for a long time but it seems it was never formally tracked.

comment:2 by Anssi Kääriäinen, 13 years ago

I think the transaction management works just as documented in this case. It is a different thing if Django's view of transaction handling is the sanest one, but I don't see a way to change that even if wanted.

There are a couple of problems with the transaction handling which relate to this ticket:

  1. Django doesn't actually know 100% of time if the connection is in transaction or not. The .dirty flag tries to represent this but I am pretty sure it misjudges some cases. It is set anytime a cursor is accessed in any way, but that might not be a 100% match to when transactions are started.
  2. Django doesn't let you commit or rollback transactions if you are not in a managed state. This means that if you just start using a connection in a shell script process, you can not commit or rollback that connection. (Well, actually you can. transaction.rollback() will throw an exception but still do a rollback). It is a question to me why rollback and commit are forbidden in unmanaged mode.
  3. It is not documented how to correctly use transactions and close connections in shell script processing. I think here the best way forward would be to have a new decorator which ensures the connection will be closed properly after the method is called. "@close_connection" or something like that. Another way would be to add a close_connection=False kwarg to all the transaction management decorators. That way a shell job would look like:
    @close_connection
    @commit_on_success
    def do_something():
        ...
    or
    @commit_on_success(close_connection=True)
    def do_something():
        ...
    

And now as long as you don't use the connection outside the do_something method you should be safe.

comment:3 by tcpip4000, 13 years ago

Cc: tcpip4000 added

comment:4 by Anssi Kääriäinen, 13 years ago

I have been thinking that an "explicit transactions only" mode would be very useful for debugging. This innocent looking code:

@commit_on_succeess
def myjob():
    for obj in SomeModel.objects.all():
        obj.xyzzy = SomeOtherModel.objects.filter(foo=obj.bar).count()
        obj.save()

can result in idle-in-tx. Why? As I found out if you have a database router making SomeOtherModel target a different database you have just opened a transaction to the other db, and commit_on_success only handles the default db. The point is, it is hard to know which transactions are opened as side-effects. It would have helped me a lot if there was an "explicit transactions only" setting. The SomeOtherModel.objects... call would have resulted in error: "Need to open an implicit transaction to db "other", but only explicit transactions allowed.".

In general, the transaction handling works pretty well in request handling situations, somewhat well in single-db batch processing, and is hard to use in multidb batch job processing.

I think in addition some more decorators would be welcome, @close_connections would be useful (this is similar to what is done at the end of request processing - all open connections are closed unconditionally). @atomic (open a new transaction if there isn't one already, create a savepoint otherwise) could be very useful, too.

There is a lot of work to do to make our transaction handling nice. The implicit transactions are a source of idle-in-tx errors, and here some debugging tools would be needed at least. In addition there are some strange oddities. My favorite is that when you are running outside managed transaction and try to commit or rollback you will get an exception telling you that this transaction isn't managed. However, the commit/rollback is done first, and only after that the exception is raised, so you get the action done and an exception.

Maybe some design discussions on django-developers would be needed here...

comment:5 by mdupuis, 13 years ago

Cc: mdupuis added

comment:6 by Catalin Iacob, 12 years ago

Cc: iacobcatalin@… added

comment:7 by gerdemb, 12 years ago

Cc: gerdemb added

comment:8 by Aymeric Augustin, 12 years ago

I think ideally, Django would by default issue a ROLLBACK after every request.

https://github.com/django/django/pull/733 does that.

comment:9 by Aymeric Augustin, 12 years ago

If we're documenting how to manage long running connections, we should also explain the consequences of a broken database connection, see comments at the bottom of #15802.

comment:10 by Aymeric Augustin, 12 years ago

Owner: changed from nobody to Aymeric Augustin
Status: newassigned

comment:11 by Aymeric Augustin <aymeric.augustin@…>, 12 years ago

Resolution: fixed
Status: assignedclosed

In 14cddf51c5f001bb426ce7f7a83fdc52c8d8aee9:

Merged branch 'database-level-autocommit'.

Fixed #2227: atomic supports nesting.
Fixed #6623: commit_manually is deprecated and atomic doesn't suffer from this defect.
Fixed #8320: the problem wasn't identified, but the legacy transaction management is deprecated.
Fixed #10744: the problem wasn't identified, but the legacy transaction management is deprecated.
Fixed #10813: since autocommit is enabled, it isn't necessary to rollback after errors any more.
Fixed #13742: savepoints are now implemented for SQLite.
Fixed #13870: transaction management in long running processes isn't a problem any more, and it's documented.
Fixed #14970: while it digresses on transaction management, this ticket essentially asks for autocommit on PostgreSQL.
Fixed #15694: atomic supports nesting.
Fixed #17887: autocommit makes it impossible for a connection to stay "idle of transaction".

Note: See TracTickets for help on using tickets.
Back to Top