Opened 11 years ago

Last modified 10 months ago

#20752 new Bug

Error signals are not reliable, especially when dealing with database errors

Reported by: tal@… Owned by:
Component: HTTP handling Version: dev
Severity: Normal Keywords: signals errors databaseError
Cc: tal@…, preston@…, schacki, Ülgen Sarıkavak Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

In core.handlers.base, unhandled exceptions are processed as such:

            except: # Handle everything else, including SuspiciousOperation, etc.
                # Get the exception info now, in case another exception is thrown later.
                signals.got_request_exception.send(sender=self.__class__, request=request)
                response = self.handle_uncaught_exception(request, resolver, sys.exc_info())

Which delivers the error signal to the various handlers (database transaction resets, things like Sentry etc).
However, the code that dispatches signals aborts the dispatch if any of the handlers fail (to try..except block):

        for receiver in self._live_receivers(_make_id(sender)):
            response = receiver(signal=self, sender=sender, **named)
            responses.append((receiver, response))
        return responses

This is perfectly reasonable in most cases, but is problematic when dealing with top-level error handling, as this prevents the unhandled error from reaching handlers that just want to report it.

This is very noticeable in cases where "something bad" happens to the database connection, as the first handler is always the database rollback handler, which only catches DatabaseError, which excludes a lot of errors that can and do crop up in production

def _rollback_on_exception(**kwargs):
    from django.db import transaction
    for conn in connections:
        try:
            transaction.rollback_unless_managed(using=conn)
        except DatabaseError:
            pass

I think that it will be better to have the error signal dispatcher defer except raising until after all the handlers were called (or just swallow them all).
Alternatively, a different signal for error reporting is possible, but I think that's a little confusing.

Thoughts?

Attachments (2)

fix_20752.diff (999 bytes ) - added by tal@… 11 years ago.
patch
20752_patch.diff (3.3 KB ) - added by schacki 10 years ago.

Download all attachments as: .zip

Change History (21)

comment:1 by anonymous, 11 years ago

Oh, just saw that the defer thing is implemented in send_robust(). Maybe just switch it from send() to send_robust() then?

by tal@…, 11 years ago

Attachment: fix_20752.diff added

patch

comment:2 by tal@…, 11 years ago

Has patch: set

comment:3 by Shai Berger, 11 years ago

Easy pickings: set
Needs tests: set
Patch needs improvement: set

send_robust() implies (correctly) that errors are expected. But the current patch proposes to just silence them, and that seems wrong to me.

Also, if you see a real problem, can you provide a test that exemplifies it (and can be used to verify that the solution is correct)?

comment:4 by tal@…, 11 years ago

I was on the fence on send_robust(): I generally dislike mechanisms that just swallow exceptions, because it makes bugs slip through the cracks. On the other hand, some signal handlers (db rollback) are actually extremely likely to raise an exception if something bad happened to the underlying connection, which is what we were seeing.

You can simulate something like this by closing the underlying connections. You can simulate it by doing something like:

from django.db import connection
connection.connection.close()

At a view or a middleware.

Some alternatives to send_robust:
1) implement something that will aggregate the exceptions and just raise a "SignalHandlersException" container at the end. No exceptions are lost, but I question the overall value of it in this scenario.
2) Print out exceptions to stderr, which is what django actually ends up doing when the error signal handler go rogue.

I'm leaning towards option 2.

comment:5 by ianawilson, 11 years ago

I like option 2 as well, but to take it just one step further, what do you think of doing a logger.error() call with the stacktrace? Seems to me that this gives the most control and flexibility to developers.

comment:6 by tal@…, 10 years ago

shai: while I agree with you on silencing errors being wrong, it does seem like a problem with send_robust, as it's actually pretty sensible to rely on the existing "robust" dispatch.

I can also have send_robust emit a call to logger.error (as proposed by Ian) as a part of this patch, although it might be a little out of scope :)
What do you think? Right now we're working around this issue by doing some setattr magic:

def make_unhandled_signal_safe():
    from django.core import signals as core_signals
    setattr(core_signals.got_request_exception, 'send', core_signals.got_request_exception.send_robust)

and we would very much like to get rid of it :)

comment:7 by schacki, 10 years ago

Owner: changed from nobody to schacki
Status: newassigned

comment:8 by schacki, 10 years ago

Please find my patch attached. I have changed the core handler in a way such that:

  • the initial error is logged as expected, so the root cause of the trouble is visible and not covered by any other exceptions
  • all connected handlers are completely executed as expected, so no unexpected side-effects from one handler to the other
  • any exceptions from the handlers are logged to the error log, so no errors pass silently.

I have also tried to add unit tests. Unfortunately, I have not managed, to really read the logging output. Any help is appreciated.

by schacki, 10 years ago

Attachment: 20752_patch.diff added

comment:9 by schacki, 10 years ago

I have just sent this pull request: https://github.com/django/django/pull/1778

comment:10 by Preston Holmes, 10 years ago

Component: Core (Other)HTTP handling
Easy pickings: unset
Triage Stage: UnreviewedAccepted
Version: 1.5master

I agree there is some room to improve the robustness of the handler here in the case of "something bad happened".

Signal receiver's shouldn't be doing anything fancy when getting the "got_request_exception" signal (basically heeding the warning in the docstring of "handle_uncaught_exception", but the handler itself should be robust against any poorly written signal handler.

Another related place the handler could be more robust is in "handle_uncaught_exception" itself. While hopefully rare, someone could have attempted to install some fancy 500 handler which itself raises some unforseen exceptoin - so maybe a try except is needed there as well, with some logging, and then returning a minimal 500 ala:

return http.HttpResponseServerError('<h1>Server Error (500)</h1>', content_type='text/html')

Also it is not clear to me why the more robust sending of the signal step shouldn't just be rolled into handle_uncaught_exception. Those two steps occur together every single time in the handler, so I'm not sure why they need to be separate steps - sending the signal could easily be considered as "part of" handling the uncaught exception.

comment:11 by Preston Holmes, 10 years ago

Cc: preston@… added

comment:12 by schacki, 10 years ago

I have changed the code as follows:

  1. Refactored the got_request_exception handling into handle_uncaught_exception. This makes to code much cleaner and the process more transparent
  2. The inital exception, that triggered the handle_uncaught_exception is the main focus. I.e. this exception is logged first, this exception is raised if settings.DEBUG_PROPAGATE_EXCEPTIONS is True. This makes debugging and analyzing the root cause of all the trouble much easier (which was one of the reasons, why this ticket was raised), nevertheless all follow-up errors are still logged.
  3. Implemented proposal from ptone, to make the resolve500 more robust.
  4. Commented the code in handle_uncaught_exception to make the process and ideas behind it clearer.
Last edited 10 years ago by schacki (previous) (diff)

comment:13 by schacki, 10 years ago

Cc: schacki added

comment:14 by schacki, 10 years ago

Hi all, may I ask on the status here. Anything I could/should? Thanks (was not aware that I was not logged in while writing my previous message)

comment:15 by Preston Holmes, 10 years ago

schacki - there aren't any steps to do you haven't already, thank you for the patch. I myself have not been able to make time to give it another review, I hope to be able to late this week, or early next. You can always try to recruit any member of the Django community to review the patch (not just core members), more eyes is always better. Thank you for your patience.

comment:16 by ANUBHAV JOSHI, 10 years ago

The PR looks good to me. I am not marking it as RFC though, because it would be better if core-dev does that.

comment:17 by Aymeric Augustin, 10 years ago

The PR doesn't apply any more.

comment:18 by Mariusz Felisiak, 12 months ago

Owner: schacki removed
Status: assignednew

comment:19 by Ülgen Sarıkavak, 10 months ago

Cc: Ülgen Sarıkavak added
Note: See TracTickets for help on using tickets.
Back to Top