Opened 2 years ago

Last modified 14 months ago

#20752 assigned Bug

Error signals are not reliable, especially when dealing with database errors

Reported by: tal@… Owned by: schacki
Component: HTTP handling Version: master
Severity: Normal Keywords: signals errors databaseError
Cc: tal@…, preston@…, schacki Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

In core.handlers.base, unhandled exceptions are processed as such:

            except: # Handle everything else, including SuspiciousOperation, etc.
                # Get the exception info now, in case another exception is thrown later.
                signals.got_request_exception.send(sender=self.__class__, request=request)
                response = self.handle_uncaught_exception(request, resolver, sys.exc_info())

Which delivers the error signal to the various handlers (database transaction resets, things like Sentry etc).
However, the code that dispatches signals aborts the dispatch if any of the handlers fail (to try..except block):

        for receiver in self._live_receivers(_make_id(sender)):
            response = receiver(signal=self, sender=sender, **named)
            responses.append((receiver, response))
        return responses

This is perfectly reasonable in most cases, but is problematic when dealing with top-level error handling, as this prevents the unhandled error from reaching handlers that just want to report it.

This is very noticeable in cases where "something bad" happens to the database connection, as the first handler is always the database rollback handler, which only catches DatabaseError, which excludes a lot of errors that can and do crop up in production

def _rollback_on_exception(**kwargs):
    from django.db import transaction
    for conn in connections:
        try:
            transaction.rollback_unless_managed(using=conn)
        except DatabaseError:
            pass

I think that it will be better to have the error signal dispatcher defer except raising until after all the handlers were called (or just swallow them all).
Alternatively, a different signal for error reporting is possible, but I think that's a little confusing.

Thoughts?

Attachments (2)

fix_20752.diff (999 bytes) - added by tal@… 2 years ago.
patch
20752_patch.diff (3.3 KB) - added by schacki 22 months ago.

Download all attachments as: .zip

Change History (19)

comment:1 Changed 2 years ago by anonymous

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Oh, just saw that the defer thing is implemented in send_robust(). Maybe just switch it from send() to send_robust() then?

Changed 2 years ago by tal@…

patch

comment:2 Changed 2 years ago by tal@…

  • Has patch set

comment:3 Changed 2 years ago by shai

  • Easy pickings set
  • Needs tests set
  • Patch needs improvement set

send_robust() implies (correctly) that errors are expected. But the current patch proposes to just silence them, and that seems wrong to me.

Also, if you see a real problem, can you provide a test that exemplifies it (and can be used to verify that the solution is correct)?

comment:4 Changed 2 years ago by tal@…

I was on the fence on send_robust(): I generally dislike mechanisms that just swallow exceptions, because it makes bugs slip through the cracks. On the other hand, some signal handlers (db rollback) are actually extremely likely to raise an exception if something bad happened to the underlying connection, which is what we were seeing.

You can simulate something like this by closing the underlying connections. You can simulate it by doing something like:

from django.db import connection
connection.connection.close()

At a view or a middleware.

Some alternatives to send_robust:
1) implement something that will aggregate the exceptions and just raise a "SignalHandlersException" container at the end. No exceptions are lost, but I question the overall value of it in this scenario.
2) Print out exceptions to stderr, which is what django actually ends up doing when the error signal handler go rogue.

I'm leaning towards option 2.

comment:5 Changed 2 years ago by ianawilson

I like option 2 as well, but to take it just one step further, what do you think of doing a logger.error() call with the stacktrace? Seems to me that this gives the most control and flexibility to developers.

comment:6 Changed 22 months ago by tal@…

shai: while I agree with you on silencing errors being wrong, it does seem like a problem with send_robust, as it's actually pretty sensible to rely on the existing "robust" dispatch.

I can also have send_robust emit a call to logger.error (as proposed by Ian) as a part of this patch, although it might be a little out of scope :)
What do you think? Right now we're working around this issue by doing some setattr magic:

def make_unhandled_signal_safe():
    from django.core import signals as core_signals
    setattr(core_signals.got_request_exception, 'send', core_signals.got_request_exception.send_robust)

and we would very much like to get rid of it :)

comment:7 Changed 22 months ago by schacki

  • Owner changed from nobody to schacki
  • Status changed from new to assigned

comment:8 Changed 22 months ago by schacki

Please find my patch attached. I have changed the core handler in a way such that:

  • the initial error is logged as expected, so the root cause of the trouble is visible and not covered by any other exceptions
  • all connected handlers are completely executed as expected, so no unexpected side-effects from one handler to the other
  • any exceptions from the handlers are logged to the error log, so no errors pass silently.

I have also tried to add unit tests. Unfortunately, I have not managed, to really read the logging output. Any help is appreciated.

Changed 22 months ago by schacki

comment:9 Changed 22 months ago by schacki

I have just sent this pull request: https://github.com/django/django/pull/1778

comment:10 Changed 22 months ago by ptone

  • Component changed from Core (Other) to HTTP handling
  • Easy pickings unset
  • Triage Stage changed from Unreviewed to Accepted
  • Version changed from 1.5 to master

I agree there is some room to improve the robustness of the handler here in the case of "something bad happened".

Signal receiver's shouldn't be doing anything fancy when getting the "got_request_exception" signal (basically heeding the warning in the docstring of "handle_uncaught_exception", but the handler itself should be robust against any poorly written signal handler.

Another related place the handler could be more robust is in "handle_uncaught_exception" itself. While hopefully rare, someone could have attempted to install some fancy 500 handler which itself raises some unforseen exceptoin - so maybe a try except is needed there as well, with some logging, and then returning a minimal 500 ala:

return http.HttpResponseServerError('<h1>Server Error (500)</h1>', content_type='text/html')

Also it is not clear to me why the more robust sending of the signal step shouldn't just be rolled into handle_uncaught_exception. Those two steps occur together every single time in the handler, so I'm not sure why they need to be separate steps - sending the signal could easily be considered as "part of" handling the uncaught exception.

comment:11 Changed 22 months ago by ptone

  • Cc preston@… added

comment:12 Changed 22 months ago by schacki

I have changed the code as follows:

  1. Refactored the got_request_exception handling into handle_uncaught_exception. This makes to code much cleaner and the process more transparent
  2. The inital exception, that triggered the handle_uncaught_exception is the main focus. I.e. this exception is logged first, this exception is raised if settings.DEBUG_PROPAGATE_EXCEPTIONS is True. This makes debugging and analyzing the root cause of all the trouble much easier (which was one of the reasons, why this ticket was raised), nevertheless all follow-up errors are still logged.
  3. Implemented proposal from ptone, to make the resolve500 more robust.
  4. Commented the code in handle_uncaught_exception to make the process and ideas behind it clearer.
Last edited 22 months ago by schacki (previous) (diff)

comment:13 Changed 22 months ago by schacki

  • Cc schacki added

comment:14 Changed 21 months ago by schacki

Hi all, may I ask on the status here. Anything I could/should? Thanks (was not aware that I was not logged in while writing my previous message)

comment:15 Changed 21 months ago by ptone

schacki - there aren't any steps to do you haven't already, thank you for the patch. I myself have not been able to make time to give it another review, I hope to be able to late this week, or early next. You can always try to recruit any member of the Django community to review the patch (not just core members), more eyes is always better. Thank you for your patience.

comment:16 Changed 17 months ago by anubhav9042

The PR looks good to me. I am not marking it as RFC though, because it would be better if core-dev does that.

comment:17 Changed 14 months ago by aaugustin

The PR doesn't apply any more.

Note: See TracTickets for help on using tickets.
Back to Top