Opened 6 years ago

Closed 5 years ago

Last modified 5 years ago

#11015 closed (duplicate)

django-1.0.2 does not catch the exception that database backend first close the db connection.

Reported by: qingran Owned by: nobody
Component: Database layer (models, ORM) Version: 1.0
Severity: Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description (last modified by ramiro)

I am running django-1.0.2 by fastcgi with nginx-0.6.36. Now I found out this error

message: Unhandled Exception
An unhandled exception was thrown by the application. 

And in nginx's error log I found:

--------------------------------------------
2009/05/06 01:03:21 [error] 14785#0: *290 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 2.3.1.5, server: ui.zeuux.com, request: "POST /blog/user/qingran/post/ HTTP/1.1", upstream: "fastcgi://unix:/var/run/fcgi.sock:", host: "mydomain", referrer: "http://ui.zeuux.com/blog/user/qingran/post/"sic/css/zeuux-com-component-editor.css HTTP/1.1", host: "mydomain", referrer: "http://mydomain/post/"

2009/05/06 01:05:14 [error] 14785#0: *368 FastCGI sent in stderr: "Traceback (most recent call last):
  File "/usr/local/lib/python2.5/site-packages/flup-1.0.1-py2.5.egg/flup/server/fcgi_base.py", line 558, in run
    protocolStatus, appStatus = self.server.handler(self)
  File "/usr/local/lib/python2.5/site-packages/flup-1.0.1-py2.5.egg/flup/server/fcgi_base.py", line 1116, in handler
    result = self.application(environ, start_response)
  File "/usr/local/lib/python2.5/site-packages/django/core/handlers/wsgi.py", line 246, in __call__
    signals.request_finished.send(sender=self.__class__)
  File "/usr/local/lib/python2.5/site-packages/django/dispatch/dispatcher.py", line 148, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/usr/local/lib/python2.5/site-packages/django/db/__init__.py", line 46, in close_connection
    connection.close()
  File "/usr/local/lib/python2.5/site-packages/django/db/backends/__init__.py", line 51, in close
    self.connection.close()
InterfaceError: connection already closed" while reading response header from upstream, client: 2.5.1.4, server: mydomain, request: "POST /post/ HTTP/1.1", upstream: "fastcgi://unix:/var/run/fcgi.sock:", host: "mydomain", referrer: "http://mydomain/post/"
--------------------------------------------

At the same time I found out that my database backend firstly closed the connection because of the network program.

I think django should handle this exception by handler500, so I can give the vistor the error page. My enviorment is listed as follows:

  • OS: FreeBSD 6.2 release amd64
  • Python 2.5.2_2
  • Django 1.0.2
  • Database PostgreSQL 8.3.3
  • Webserver Nginx-0.6.36

Change History (14)

comment:1 Changed 6 years ago by ramiro

  • Description modified (diff)
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

(wikified description)

comment:2 Changed 6 years ago by adamnelson

Has this been confirmed in django 1.1 as well? I presume lines django/db/backends/init.,py:74-77 mask this:

    def close(self):
        if self.connection is not None:
            self.connection.close()
            self.connection = None

I can't easily test so I'll leave this ticket as new.

comment:3 Changed 6 years ago by russellm

  • Resolution set to fixed
  • Status changed from new to closed

Based on adam's details, I assume this has been fixed. Please reopen if that isn't the case.

comment:4 Changed 5 years ago by gkulewski

  • Resolution fixed deleted
  • Status changed from closed to reopened

It is not fixed.

Traceback (most recent call last):
  File "//usr/lib64/python2.5/site-packages/eventlet/wsgi.py", line 269, in handle_one_response
    result = self.application(self.environ, start_response)
  File "/home/formatec/app/path/django/core/servers/basehttp.py", line 651, in __call__
    return self.application(environ, start_response)
  File "/home/formatec/app/path/django/core/handlers/wsgi.py", line 248, in __call__
    signals.request_finished.send(sender=self.__class__)
  File "/home/formatec/app/path/django/dispatch/dispatcher.py", line 166, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/home/formatec/app/path/django/db/__init__.py", line 65, in close_connection
    connection.close()
  File "/home/formatec/app/path/django/db/backends/__init__.py", line 76, in close
    self.connection.close()
InterfaceError: connection already closed

PostgreSQL: 8.3. Django: svn >= 1.1 < 1.2.

Happens when removing complex objects hierarchies in admin but I couldn't pinpoint the exact cause yet.

comment:5 Changed 5 years ago by russellm

  • Resolution set to worksforme
  • Status changed from reopened to closed

"Something goes wrong some of the time" isn't a useful bug report. Unless you can provide specific conditions under which the problem can be reproduced, I'm closing worksforme. If you can provide a minimal set of models, views and data, and a server setup in which this problem can be observed, feel free to reopen.

comment:6 follow-up: Changed 5 years ago by gkulewski

That was really helpful, thanks.

Good it works for you. But it does not work for others. Look above. Use google to get many more such results. Something definitively _is_ broken, it's just hard to track it down. I will post if I have more info.

As for now I know only that it is most probably related to:

  • the admin,
  • postgresql,
  • not OS specific (happens on linux and bsd at least),
  • cascading removes of complex object hierarchies in the admin (but maybe not only),
  • most probably these hierarchies have generic relations inside (may be the trigger or may not be).

For example it happens when trying to remove an user (as in django.auth User) in my application because the user is linked to gazilion of other objects that should be also deleted (since django wants to cascade deletes). It also happens sometimes when trying to remove other objects that cascade and have generic relations in the hierarchy. But again - the exception happens way after the real error showing exactly nothing useful and there are so many objects involved that I can only guess what could the trigger be.

Any help to track it down will be appreciated. If anybody else is really interested in it, that is.

comment:7 in reply to: ↑ 6 Changed 5 years ago by russellm

Replying to gkulewski:

That was really helpful, thanks.

Good it works for you. But it does not work for others. Look above. Use google to get many more such results. Something definitively _is_ broken, it's just hard to track it down.

If this were a serious or regularly occurring problem, I wouldn't need to use Google. It would be a regular topic on Django-users, django-dev, or an observable issue on any one of the *many* Django sites that I and the rest of the core team manage personally. Neither of these things are true.

I will post if I have more info.

Which is exactly what I asked you to do.

Please consider this from our point of view. We have a report of a fairly generic sounding error, that can apparently be generated under unknown conditions. I can't reproduce the error myself. There has been no evidence presented that this is anything other than a cosmetic problem (ok - sure, there is an error message, but does that mean data is being lost, or just that it's an annoying log entry?). I can't even be certain that this isn't due to a misconfiguration of the servers in question -- for example, if your database server has a timeout set much shorter than your web server's page delivery timeout, for example.

As I said - If someone is able to provide instructions for reproducing this bug -- even if only intermittently or on specific setups -- I'm only to happy to investigate further. However, there is literally *nothing* we can do until you can tell us how to reproduce the problem.

comment:8 Changed 5 years ago by gkulewski

  • Resolution worksforme deleted
  • Status changed from closed to reopened

Still, closing obviously non-stupid bug reports that more than 3 people have with worksforme isn't the best way to solve the problem. And google shows several other such reports, IIRC some also in this trac. Maybe they should be merged or at least considered together somehow.

Anyway, could you please try to reproduce scenario when django connected to postgresql tries to delete some object and does not succeed because deleting the row would violate foreign key constraint from some other table? I think one case of such exception may come from such scenario and may mean django (or python/django pg driver) does not recover gracefully from such error. For me it should throw some exception, render 500 page and (if needed) reopen the connection.

What happens (from sql log):

LOG:  00000: statement: DELETE FROM "auth_user" WHERE "id" IN (49)
LOCATION:  exec_simple_query, postgres.c:848
LOG:  00000: duration: 0.167 ms
LOCATION:  exec_simple_query, postgres.c:1074
ERROR:  23503: update or delete on table "auth_user" violates foreign key constraint "reversion_revision_user_id_fkey" on table "reversion_revision"
DETAIL:  Key (id)=(49) is still referenced from table "reversion_revision".
LOCATION:  ri_ReportViolation, ri_triggers.c:3501
STATEMENT:  END
NOTICE:  25P01: there is no transaction in progress
LOCATION:  UserAbortTransactionBlock, xact.c:3160
STATEMENT:  ABORT
LOG:  00000: duration: 0.074 ms  statement: ABORT
LOCATION:  exec_simple_query, postgres.c:1081
LOG:  00000: disconnection: session time: 0:00:00.303 user=formatec database=formatec host=[local]
LOCATION:  log_disconnections, postgres.c:4041

Note: as of yet I have no idea how somebody managed to create non cascading fkey in django...

                                           Table "public.reversion_revision"
    Column    |           Type           |                            Modifiers                            | Description 
--------------+--------------------------+-----------------------------------------------------------------+-------------
 id           | integer                  | not null default nextval('reversion_revision_id_seq'::regclass) | 
 date_created | timestamp with time zone | not null                                                        | 
 user_id      | integer                  |                                                                 | 
 comment      | text                     |                                                                 | 
Indexes:
    "reversion_revision_pkey" PRIMARY KEY, btree (id)
    "reversion_revision_user_id" btree (user_id)
Foreign-key constraints:
    "reversion_revision_user_id_fkey" FOREIGN KEY (user_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED
Has OIDs: no

But most probably this isn't the only cause of such problem. But it's quite possible that other cases will also mean bad recovery from strange errors in pgsql drivers in django or in the admin.

comment:9 Changed 5 years ago by gkulewski

Another case, possibly similar and not involving reversion.

Model called URL has generic foreign key to model Resource.
Model Resource has reverse generic relation back to model URL.
Model Foo has normal foreign key to model URL.

Then I am trying to delete Resource instance (via admin). This silently (as in admin does not warn about it) cascades to URLs connected to it (/me thinks the admin should warn about it...). And this breaks because django forgot to cascade to Foo referencing that URL. Boom (a silent one, in pgsql, not reported outside by django for some strange reason). And later big boom because something didn't recover nicely from the first error. Result: application can't make any SQL queries anymore.

Thinking more about it, previously described case may be a special case of this one (not cascading correctly, reversion probably uses some generic relations too or something). And bad error recovery after errors repoted by pgsql is another problem.

comment:10 Changed 5 years ago by russellm

  • Resolution set to duplicate
  • Status changed from reopened to closed

Ok - so, now that you have provided some instructions on how to reproduce the problem, it looks like a duplicate of #12953.

comment:11 Changed 5 years ago by gkulewski

Well, I see at least the following bugs in it:

  • not recovering after db error in pg, possibly leaving whole application/thread/process unusable [critical IMHO, not duplicate of #12953],
  • hiding db errors in the admin (no traceback after the fkey violation, no 500 page, no nothing, admin tries to silently ignore the problem) [rather bad],
  • not warning about cascading via generic relations in the admin [very bad since it silently removes user data],
  • not cascading far enough in some cases (#12953).

Should I report these as separate issues?

Shouldn't this report be reopened and track progress about

  • not recovering after db error in pg, possibly leaving whole application/thread/process unusable [critical IMHO, not duplicate of #12953]?

comment:12 Changed 5 years ago by kmtracey

I've tried your scenario on current trunk, and can't recreate most of what you list above.

First, the confirmation page now does list that the associated URL model is going to be deleted. This was likely done by the fix for #6191.

Second, I don't get any hiding of DB errors. With debug on I get a debug page with full details of the error. With debug off I get a server error page.

Perhaps because of the second item, I don't see any error related to trying to close an already-closed connection. I assume that is what you mean by "not recovering after db error"? If not please explain what you mean because I'm having hard time connecting what you are reporting now with the original report.

comment:13 Changed 5 years ago by adamnelson

I agree with gkulewski. Is it in Django's ticket policy to close tickets that are confirmed problems for multiple people just because reproducibility hasn't been found yet? Django is no longer small enough to kill bugs just because the reporter doesn't have enough information to take the ticket to approval level.

I no longer have an Nginx setup, so I can't test this anymore and can't follow the ticket anymore. However, it was definitely a problem for 2 people on a well-defined environment and has been seen by many people in the wild:

http://www.google.com/search?hl=en&q=django+connection+already+closed"+while+reading+response+header+from+upstream&aq=f&aqi=&aql=&oq=

With that said, I'm pretty confident that this is a FastCGI problem either on Django's side or Nginx's - probably Nginx.

comment:14 Changed 5 years ago by kmtracey

Closing worksforme provides a clue to people who might be interested in seeing progress on the issue that something from the reporting side is necessary before any progress is going to be made. Generally that means specifics of how to recreate the problem -- since whatever specifics are already in the ticket have not been sufficient to allow whoever has closed the ticket to actually recreate the problem. Leaving such tickets open gives the mistaken impression that some forward progress might be made in the absence of such additional help.

Note: See TracTickets for help on using tickets.
Back to Top