Opened 10 years ago

Closed 9 years ago

Last modified 9 years ago

#900 closed defect (duplicate)

[patch] Premature close of postgres connection with multiple threads

Reported by: Maniac <Maniac@…> Owned by: adrian
Component: Database layer (models, ORM) Version:
Severity: major Keywords:
Cc: Maniac@… Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

Postgres back-end shares one connection for many requests. This is generally okay since psycopg handles this correctly. It looks like somewhere (I haven't found exact place yet) users of shared connection close it explicitly while other processes working with it. This results in other processes crash with the exception "already closed".

I'm seeing this in action on a page with many images where each image generated by a view looking into database. When running under development server everything is ok since it' single-process. But when running under Apache with each reload some of the images sometimes turn blank. Exceptions of course aren't shown inside 'img' tags but I replaced them with &lt;object data="..." type="text/plain"&gt; and got the messages :-)

I use apache2, mod_python, postgresql 7.4

For now as a workaround I just commented out whole DatabaseWrapper.close() so the connection is destroyed when Apache child is dying and killing the Interpreter (as I'm guessing). So my Postgres complains periodically about dropped connection. Is there any place where the destruction can be hooked at (I'm not familiar with WSGI and all this handlers stuff that close)?

Attachments (4)

900.1.patch (1.2 KB) - added by Maniac <Maniac@…> 10 years ago.
Preliminary patch
900.2.patch (3.8 KB) - added by Maniac <Maniac@…> 9 years ago.
Working patch
900.3.patch (3.8 KB) - added by Maniac <Maniac@…> 9 years ago.
Patch for magic-removal branch
900.4.patch (3.8 KB) - added by Maniac <Maniac@…> 9 years ago.
Updated to current magic-removal branch

Download all attachments as: .zip

Change History (16)

comment:1 Changed 10 years ago by jacob

Which Apache worker module are you using? I've seen bugs like this when using the threaded worker that didn't show up after switching to the preforking one (which seems to be faster for serving Django anyway).

comment:2 Changed 10 years ago by Maniac <Maniac@…>

Yes, threaded one.
I use it because it's the one that is bundled with Ubuntu packages (and described as faster than preforked). But people may have their reasons too.

I beleive keeping connection open would work for preforking model anyway. If, of course, problem with graceful exit can be solved :-). The other way is to adopt Eugene'f patch from ticket 463 which keeps connection list opening new connection for every cursor. But then all those necessery thread locks won't help performance....

comment:3 Changed 10 years ago by eugene@…

In reality locks don't hurt performance too much because usually underlying systems use something like a thread pool, which reuses existing "free" threads instead of killing old/creating new threads. I didn't see any measurable degradation of performance.

My biggest beef with my patch is I don't like the practice of renaming threads. It feels inelegant. But so far I didn't find an alternative solution. And it works fine with Django.

comment:4 Changed 10 years ago by Maniac <Maniac@…>

This performance degradation increases with load and much faster than lineary (more like exponentialy though I'm not sure). More threads simultaneously requiring access to some exclusive section of code lead to longer waiting for this section. And longer waiting in its turn leads to more threads waiting in the queue. Anyway it's better not to have another thread exclusion mechanism on top of psycopg connection which is already doing it.

comment:5 Changed 10 years ago by eugene@…

True. But locked area is pretty small because in most cases it reuses the connection. It locks for longer time when adding new connection. Connection reuse is even faster and virtually without locking when the RWLock is used instead of the mutex. See nash' revision of the patch: #463.

comment:6 Changed 10 years ago by Maniac <Maniac@…>

I researched mod_python docs and found a hook for child exiting. I made a preliminary patch for gracefully closing postrges connection. But! I introduced new concept in addition to db.close() -- db.cleanup(). So now:

  • db.close(): finishes db access connection on each request
  • db.cleanup(): actually a destructor for DatabaseWrapper

Adrian, Jacob, is it okay architecturally for Django? If yes I could finish a patch with these changes:

  • add cleanup() for all existing db backends as noop
  • investigate if WSGI has such thing as mod_python's register_cleanup and make use of it too

BTW, with not closing a connection on each request things seem for a good bit faster. Though I didn't measure it, just seems by looking.

Changed 10 years ago by Maniac <Maniac@…>

Preliminary patch

comment:7 Changed 9 years ago by Maniac <Maniac@…>

  • Cc Maniac@… added

I've redone the whole thing. The idea with reusing a single connection was nice performance-wise but as soon as it requires semantic changes to the backend (splitting close() into two different verbs) I understand it's not very desirable now when we that close to 1.0.

So I took the idea from that twin ticket for MySQL (#463) and have added thread-safety through django's RWLock on all shared operations (namely: cursor(), close(), rollback()). However unlike with MySQL there is no need to actually create a new connection each time since they are thread-safe themselves here. Instead I just keep list of all threads that access a connection and in db.close() the connection is closed only when last thread leaves it.

This also have a nice effect on performance when simultaneous threads still do share a connection. But it doesn't live forever and doesn't get forgot (is it English? :-) ).

I was running it some time with my monster app when a browser loads hundreds of images each accessing a db on server. Requests are always successful, connections count always successfuly goes to 0 after work and Postgres never complains of dropped connections.

Patch follows.

Changed 9 years ago by Maniac <Maniac@…>

Working patch

comment:8 Changed 9 years ago by Maniac <Maniac@…>

  • Summary changed from Premature close of postgres connection with multiple threads to [patch] Premature close of postgres connection with multiple threads

comment:9 Changed 9 years ago by Maniac <Maniac@…>

I finally got to the new magicless branch and found that renaming "db" to "connection" didn't break that much as I was afraid :-). I've made the version of my patch for the branch. It looks pretty much the same except filenames and may be some line count jiggle...

Changed 9 years ago by Maniac <Maniac@…>

Patch for magic-removal branch

comment:10 Changed 9 years ago by Maniac <Maniac@…>

  • Component changed from Admin interface to Database wrapper

Changed 9 years ago by Maniac <Maniac@…>

Updated to current magic-removal branch

comment:11 Changed 9 years ago by Maniac <Maniac@…>

Adrian, Jacob is there anything wrong with the patch that prevents it from checking in? Or is it just time?

comment:12 Changed 9 years ago by eugene@…

  • Resolution set to duplicate
  • Status changed from new to closed

Superseded by #1442

Note: See TracTickets for help on using tickets.
Back to Top