Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#13214 closed (invalid)

Random OperationalError when using FastCGI (+ possible solutions)

Reported by: hcarvalhoalves Owned by: nobody
Component: Core (Other) Version: 1.1
Severity: Keywords:
Cc: hcarvalhoalves@… Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:


Possible solution:

Until recently I was curious to test
this on Django 1.1.1. Will this
exception be thrown again... surprise,
there it was again. It took me some
time to debug this, helpful hint was
that it only shows when (pre)forking.
So for those who getting randomly
those exceptions, I can say... fix
your code :) Ok.. seriously, there
are always few ways of doing this, so
let me firs explain where is a
problem first. If you access database
when any of your modules will import
as, e.g. reading configuration from
database then you will get this error.
When your fastcgi-prefork application
starts, first it imports all modules,
and only after this forks children.
If you have established db connection
during import all children processes
will have an exact copy of that
object. This connection is being
closed at the end of request phase
(request_finished signal). So first
child which will be called to process
your request, will close this
connection. But what will happen to
the rest of the child processes? They
will believe that they have open and
presumably working connection to the
db, so any db operation will cause an
exception. Why this is not showing in
threaded execution model? I suppose
because threads are using same object
and know when any other thread is
closing connection. How to fix this?
Best way is to fix your code... but
this can be difficult sometimes.
Other option, in my opinion quite
clean, is to write somewhere in your
application small piece of code:

    from django.db import connection 
    from django.core import signals 
    def close_connection(**kwargs): 

Not ideal thought, connecting twice to the DB is a workaround at best.

Possible solution: using connection pooling (pgpool, pgbouncer), so you have DB connections pooled and stable, and handed fast to your FCGI daemons.

The problem is that this triggers another bug, psycopg2 raising an *InterfaceError* because it's trying to disconnect twice (pgbouncer already handled this).

Now the culprit is Django signal *request_finished* triggering *connection.close()*, and failing loud even if it was already disconnected. I don't think this behavior is desired, as if the request already finished, we don't care about the DB connection anymore. A patch for correcting this should be simple.

The relevant traceback:

     /usr/local/lib/python2.6/dist-packages/Django-1.1.1-py2.6.egg/django/core/handlers/ in __call__(self=<django.core.handlers.wsgi.WSGIHandler object at 0x24fb210>, environ={'AUTH_TYPE': 'Basic', 'DOCUMENT_ROOT': '/storage/test', 'GATEWAY_INTERFACE': 'CGI/1.1', 'HTTPS': 'off', 'HTTP_ACCEPT': 'application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5', 'HTTP_ACCEPT_ENCODING': 'gzip, deflate', 'HTTP_AUTHORIZATION': 'Basic dGVzdGU6c3VjZXNzbw==', 'HTTP_CONNECTION': 'keep-alive', 'HTTP_COOKIE': '__utma=175602209.1371964931.1269354495.126938948...none); sessionid=a1990f0d8d32c78a285489586c510e8c', 'HTTP_HOST': '', ...}, start_response=<function start_response at 0x24f87d0>)
      246                 response = self.apply_response_fixes(request, response)
      247         finally:
      248             signals.request_finished.send(sender=self.__class__)
      250         try:
    global signals = <module 'django.core.signals' from '/usr/local/l.../Django-1.1.1-py2.6.egg/django/core/signals.pyc'>, signals.request_finished = <django.dispatch.dispatcher.Signal object at 0x1975710>, signals.request_finished.send = <bound method Signal.send of <django.dispatch.dispatcher.Signal object at 0x1975710>>, sender undefined, self = <django.core.handlers.wsgi.WSGIHandler object at 0x24fb210>, self.__class__ = <class 'django.core.handlers.wsgi.WSGIHandler'>
     /usr/local/lib/python2.6/dist-packages/Django-1.1.1-py2.6.egg/django/dispatch/ in send(self=<django.dispatch.dispatcher.Signal object at 0x1975710>, sender=<class 'django.core.handlers.wsgi.WSGIHandler'>, **named={})
      165         for receiver in self._live_receivers(_make_id(sender)):
      166             response = receiver(signal=self, sender=sender, **named)
      167             responses.append((receiver, response))
      168         return responses
    response undefined, receiver = <function close_connection at 0x197b050>, signal undefined, self = <django.dispatch.dispatcher.Signal object at 0x1975710>, sender = <class 'django.core.handlers.wsgi.WSGIHandler'>, named = {}
     /usr/local/lib/python2.6/dist-packages/Django-1.1.1-py2.6.egg/django/db/ in close_connection(**kwargs={'sender': <class 'django.core.handlers.wsgi.WSGIHandler'>, 'signal': <django.dispatch.dispatcher.Signal object at 0x1975710>})
       63 # when a Django request is finished.
       64 def close_connection(**kwargs):
       65     connection.close()
       66 signals.request_finished.connect(close_connection)
    global connection = <django.db.backends.postgresql_psycopg2.base.DatabaseWrapper object at 0x17b14c8>, connection.close = <bound method DatabaseWrapper.close of <django.d...ycopg2.base.DatabaseWrapper object at 0x17b14c8>>
     /usr/local/lib/python2.6/dist-packages/Django-1.1.1-py2.6.egg/django/db/backends/ in close(self=<django.db.backends.postgresql_psycopg2.base.DatabaseWrapper object at 0x17b14c8>)
       74     def close(self):
       75         if self.connection is not None:
       76             self.connection.close()
       77             self.connection = None
    self = <django.db.backends.postgresql_psycopg2.base.DatabaseWrapper object at 0x17b14c8>, self.connection = <connection object at 0x1f80870; dsn: ' port=6432 user=postgres', closed: 2>, self.connection.close = <built-in method close of psycopg2._psycopg.connection object at 0x1f80870>

Exception handling here could add more leniency:


       63 # when a Django request is finished.
       64 def close_connection(**kwargs):
       65     connection.close()
       66 signals.request_finished.connect(close_connection)

Or it could be handled better on psycopg2, so to not throw fatal errors if all we're trying to do is disconnect and it already is:


       74     def close(self):
       75         if self.connection is not None:
       76             self.connection.close()
       77             self.connection = None

Other than that, I'm short on ideas. Not sure if this is supposed to be fixed on 1.2, maybe multi-db support refactored things and dealt with this issue. I'll try testing with a 1.2 enviro.

Change History (5)

comment:1 Changed 5 years ago by hcarvalhoalves

  • Cc hcarvalhoalves@… added
  • Component changed from Uncategorized to Core framework
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 5 years ago by hcarvalhoalves

In fact, pooling won't help at all. I tested a bit more, and found the offending code in my own project.

I have this middleware:

class CustomSiteMiddleware(object):
    Handle subdomains and external domains and add `custom_site` to request.
    def process_request(self, request):
        current_domain = Site.objects.get_current().domain

On daemon startup, Site.objects.get_current() causes the query to miss the cache and hit the database:

 /usr/lib/pymodules/python2.6/flup/server/ in run(self=<flup.server.fcgi_base.Request object at 0x19c8350>)
  556         """Runs the handler, flushes the streams, and ends the request."""
  557         try:
  558             protocolStatus, appStatus = self.server.handler(self)
  559         except:
  560             traceback.print_exc(file=self.stderr)
protocolStatus undefined, appStatus undefined, self = <flup.server.fcgi_base.Request object at 0x19c8350>, self.server = <flup.server.fcgi_fork.WSGIServer object at 0x19c8210>, self.server.handler = <bound method WSGIServer.handler of <flup.server.fcgi_fork.WSGIServer object at 0x19c8210>>
 /usr/lib/pymodules/python2.6/flup/server/ in handler(self=<flup.server.fcgi_fork.WSGIServer object at 0x19c8210>, req=<flup.server.fcgi_base.Request object at 0x19c8350>)
 1116         try:
 1117             try:
 1118                 result = self.application(environ, start_response)
 1119                 try:
 1120                     for data in result:
result = None, self = <flup.server.fcgi_fork.WSGIServer object at 0x19c8210>, self.application = <django.core.handlers.wsgi.WSGIHandler object at 0x19c81d0>, environ = {'AUTH_TYPE': 'Basic', 'DOCUMENT_ROOT': '/storage/test', 'GATEWAY_INTERFACE': 'CGI/1.1', 'HTTPS': 'off', 'HTTP_ACCEPT': 'application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5', 'HTTP_ACCEPT_ENCODING': 'gzip, deflate', 'HTTP_AUTHORIZATION': 'Basic dGVzdGU6c3VjZXNzbw==', 'HTTP_CACHE_CONTROL': 'max-age=0', 'HTTP_CONNECTION': 'keep-alive', 'HTTP_COOKIE': '__utma=175602209.1371964931.1269354495.126938948...none); sessionid=a1990f0d8d32c78a285489586c510e8c', ...}, start_response = <function start_response at 0x19c47d0>
 /usr/local/lib/python2.6/dist-packages/Django-1.1.1-py2.6.egg/django/core/handlers/ in __call__(self=<django.core.handlers.wsgi.WSGIHandler object at 0x19c81d0>, environ={'AUTH_TYPE': 'Basic', 'DOCUMENT_ROOT': '/storage/test', 'GATEWAY_INTERFACE': 'CGI/1.1', 'HTTPS': 'off', 'HTTP_ACCEPT': 'application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5', 'HTTP_ACCEPT_ENCODING': 'gzip, deflate', 'HTTP_AUTHORIZATION': 'Basic dGVzdGU6c3VjZXNzbw==', 'HTTP_CACHE_CONTROL': 'max-age=0', 'HTTP_CONNECTION': 'keep-alive', 'HTTP_COOKIE': '__utma=175602209.1371964931.1269354495.126938948...none); sessionid=a1990f0d8d32c78a285489586c510e8c', ...}, start_response=<function start_response at 0x19c47d0>)
  239                 response = http.HttpResponseBadRequest()
  240             else:
  241                 response = self.get_response(request)
  243                 # Apply response middleware
response undefined, self = <django.core.handlers.wsgi.WSGIHandler object at 0x19c81d0>, self.get_response = <bound method WSGIHandler.get_response of <djang...e.handlers.wsgi.WSGIHandler object at 0x19c81d0>>, request = <WSGIRequest GET:<QueryDict: {}>, POST:<': 'http', 'wsgi.version': (1, 0)}>
 /usr/local/lib/python2.6/dist-packages/Django-1.1.1-py2.6.egg/django/core/handlers/ in get_response(self=<django.core.handlers.wsgi.WSGIHandler object at 0x19c81d0>, request=<WSGIRequest GET:<QueryDict: {}>, POST:<': 'http', 'wsgi.version': (1, 0)}>)
   71         # Apply request middleware
   72         for middleware_method in self._request_middleware:
   73             response = middleware_method(request)
   74             if response:
   75                 return response
response = None, middleware_method = <bound method CustomSiteMiddleware.process_reque...leware.CustomSiteMiddleware object at 0x19d57d0>>, request = <WSGIRequest GET:<QueryDict: {}>, POST:<': 'http', 'wsgi.version': (1, 0)}>
 /storage/test/colibri/website/ in process_request(self=< object at 0x19d57d0>, request=<WSGIRequest GET:<QueryDict: {}>, POST:<': 'http', 'wsgi.version': (1, 0)}>)
   11     """
   12     def process_request(self, request):
   13         current_domain = Site.objects.get_current().domain
   15         if request.subdomain:
current_domain undefined, global Site = <class 'django.contrib.sites.models.Site'>, Site.objects = <django.contrib.sites.models.SiteManager object at 0x1515e10>, Site.objects.get_current = <bound method SiteManager.get_current of <django...ib.sites.models.SiteManager object at 0x1515e10>>, ).domain undefined
 /usr/local/lib/python2.6/dist-packages/Django-1.1.1-py2.6.egg/django/contrib/sites/ in get_current(self=<django.contrib.sites.models.SiteManager object at 0x1515e10>)
   20             current_site = SITE_CACHE[sid]
   21         except KeyError:
   22             current_site = self.get(pk=sid)
   23             SITE_CACHE[sid] = current_site
   24         return current_site
current_site undefined, self = <django.contrib.sites.models.SiteManager object at 0x1515e10>, self.get = <bound method SiteManager.get of <django.contrib.sites.models.SiteManager object at 0x1515e10>>, pk undefined, sid = 1
 /usr/local/lib/python2.6/dist-packages/Django-1.1.1-py2.6.egg/django/db/models/ in get(self=<django.contrib.sites.models.SiteManager object at 0x1515e10>, *args=(), **kwargs={'pk': 1})
  119     def get(self, *args, **kwargs):
  120         return self.get_query_set().get(*args, **kwargs)
  122     def get_or_create(self, **kwargs):
self = <django.contrib.sites.models.SiteManager object at 0x1515e10>, self.get_query_set = <bound method SiteManager.get_query_set of <djan...ib.sites.models.SiteManager object at 0x1515e10>>, ).get undefined, args = (), kwargs = {'pk': 1}
 /usr/local/lib/python2.6/dist-packages/Django-1.1.1-py2.6.egg/django/db/models/ in get(self=[<Site:>], *args=(), **kwargs={'pk': 1})
  298         """
  299         clone = self.filter(*args, **kwargs)
  300         num = len(clone)
  301         if num == 1:
  302             return clone._result_cache[0]
num undefined, builtin len = <built-in function len>, clone = [<Site:>]
 /usr/local/lib/python2.6/dist-packages/Django-1.1.1-py2.6.egg/django/db/models/ in __len__(self=[<Site:>])
   79                 self._result_cache = list(self._iter)
   80             else:
   81                 self._result_cache = list(self.iterator())
   82         elif self._iter:
   83             self._result_cache.extend(list(self._iter))
self = [<Site:>], self._result_cache = None, builtin list = <type 'list'>, self.iterator = <bound method QuerySet.iterator of [<Site:>]>
 /usr/local/lib/python2.6/dist-packages/Django-1.1.1-py2.6.egg/django/db/models/ in iterator(self=[<Site:>])
  236             model_cls = deferred_class_factory(self.model, skip)
  238         for row in self.query.results_iter():
  239             if fill_cache:
  240                 obj, _ = get_cached_row(self.model, row,
row undefined, self = [<Site:>], self.query = <django.db.models.sql.query.BaseQuery object at 0x19dad90>, self.query.results_iter = <bound method BaseQuery.results_iter of <django.db.models.sql.query.BaseQuery object at 0x19dad90>>
 /usr/local/lib/python2.6/dist-packages/Django-1.1.1-py2.6.egg/django/db/models/sql/ in results_iter(self=<django.db.models.sql.query.BaseQuery object at 0x19dad90>)
  285         resolve_columns = hasattr(self, 'resolve_columns')
  286         fields = None
  287         for rows in self.execute_sql(MULTI):
  288             for row in rows:
  289                 if resolve_columns:
rows undefined, self = <django.db.models.sql.query.BaseQuery object at 0x19dad90>, self.execute_sql = <bound method BaseQuery.execute_sql of <django.db.models.sql.query.BaseQuery object at 0x19dad90>>, global MULTI = 'multi'
 /usr/local/lib/python2.6/dist-packages/Django-1.1.1-py2.6.egg/django/db/models/sql/ in execute_sql(self=<django.db.models.sql.query.BaseQuery object at 0x19dad90>, result_type='multi')
 2367                 return
 2368         cursor = self.connection.cursor()
 2369         cursor.execute(sql, params)
 2371         if not result_type:
cursor = <django.db.backends.util.CursorDebugWrapper object at 0x19daf10>, cursor.execute = <bound method CursorDebugWrapper.execute of <dja...nds.util.CursorDebugWrapper object at 0x19daf10>>, sql = 'SELECT "django_site"."id", "django_site"."domain...e"."id" = %s ORDER BY "django_site"."domain" ASC', params = (1,)
 /usr/local/lib/python2.6/dist-packages/Django-1.1.1-py2.6.egg/django/db/backends/ in execute(self=<django.db.backends.util.CursorDebugWrapper object at 0x19daf10>, sql='SELECT "django_site"."id", "django_site"."domain...te"."id" = 1 ORDER BY "django_site"."domain" ASC', params=(1,))
   17         start = time()
   18         try:
   19             return self.cursor.execute(sql, params)
   20         finally:
   21             stop = time()
self = <django.db.backends.util.CursorDebugWrapper object at 0x19daf10>, self.cursor = <cursor object at 0x1a5e4d8; closed: 0>, self.cursor.execute = <built-in method execute of psycopg2._psycopg.cursor object at 0x1a5e4d8>, sql = 'SELECT "django_site"."id", "django_site"."domain...te"."id" = 1 ORDER BY "django_site"."domain" ASC', params = (1,)
<class 'psycopg2.OperationalError'>: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. 
      args = ('server closed the connection unexpectedly\n\tThis ...ormally\n\tbefore or while processing the request.\n',) 
      cursor = None 
      message = 'server closed the connection unexpectedly\n\tThis ...ormally\n\tbefore or while processing the request.\n' 
      pgcode = None 
      pgerror = None

Does that mean that is impossible to query the database from a middleware when using FastCGI + prefork? Makes no sense, the deployment method shouldn't trigger bugs. Looking forward for any possible solution or workaround.

comment:3 Changed 5 years ago by hcarvalhoalves

Working in threaded mode (./ runfcgi method=threaded) seems to solve the problem, at least I'm unable to reproduce anymore. For the record I'm using connection pooling. Is it safe to deploy Django in threaded mode? Doesn't it impact performance because of the GIL?

If there's a solution for using in prefork mode too, would be invaluable.

For others struggling with this like me, my setup:

  • Cherokee Web Server (0.44.19)
  • PostgreSQL 8.3
  • pgbouncer (set for 'client' reset mode)
  • Django 1.1

Daemon command line:

python runfcgi protocol=fcgi method=threaded

comment:4 Changed 5 years ago by russellm

  • Resolution set to invalid
  • Status changed from new to closed

Ok, I've read this ticket a couple of times now, and I have no clue what's going on. I don't mean I'm having problems working out the cause of the bug - I'm having problems *understanding what the bug is*.

This report contains a 3 (maybe 4) "solutions". It contains links to mailing lists. It contains snippets of rambling run-on sentences. It contains multiple page stack traces. It contains speculation that the problem might be fixed in 1.2/trunk.

The one thing it doesn't contain is 'a description of the problem, and the conditions under which it will be experienced'. The only hints we have are in the ticket title, and they're not especially illuminating.

I'm closing this one Invalid as a procedural "No. Seriously. WTF". Assuming that you have actually found a legitimate problem (and it's impossible to tell because of the lack of details), nobody else should be forced to spelunk their way through this sort of sprawling ramble to work out what the hell this report is actually reporting.

If you think this is actually a genuine issue, start again with a clean ticket. The ticket description should be a 'clear, concise description of the problem, and the conditions under which it can be observed'. It should contain 'zero' discussion of proposed solutions. That's what the ticket comments are for.

comment:5 Changed 5 years ago by hcarvalhoalves

I was just trying to provide links and discussion happening to other pages where people are discussing this issue and consolidate in one ticket for people that, like me, searched the tickets for something like this and found nothing.

No wonder no one bothers filling a ticket here, considering the warm welcome I received. Thanks.

Filling a new ticket now.

Note: See TracTickets for help on using tickets.
Back to Top