Opened 4 years ago

Closed 4 years ago

#31509 closed New feature (fixed)

Enable faulthandler automatically during tests

Reported by: Adam Johnson Owned by: Omkar Kulkarni
Component: Testing framework Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: no

Description (last modified by Adam Johnson)

Sometimes when testing one discovers a "hang," segfault, or other issue. This can be hard to diagnose since default test output just includes dots, so one can't tell which test has hung. Also hangs can be random, so hard to diagnose.

In such cases, faulthandler is useful to enable, as killing the process will output a stack trace for each thread.

Tests can be run using faulthandler with python -X faulthandler manage.py test. However this needs both knowledge that it exists, and re-running the tests, when the hang might be non-deterministic.

Pytest automatically enables faulthandler and provides the option to disable it. I suggest we replicate the same in Django's DiscoverRunner - call faulthandler.enable() at the start of tests, unless disabled with a flag --no-faulthandler.

Searching for past issues, it seems we've had three tickets where faulthandler was recommended for diagnosis: https://code.djangoproject.com/search?q=faulthandler . Additionally there are probably other cases where no progress was made on a bug because of lack of faulthandler knowledge.

Change History (16)

comment:1 by Adam Johnson, 4 years ago

Description: modified (diff)

comment:2 by Carlton Gibson, 4 years ago

Triage Stage: UnreviewedAccepted

Practical example from the mailing list — good work there Adam!

I think we should investigate a PR here.
Thanks for the report.

comment:3 by Adam Johnson, 4 years ago

One concern for a PR to address is ensuring faulthandler is enabled in subprocesses ˙when using parallel testing. I don't actually believe the pytest feature does that, since its parallel processing feature is separate.

comment:4 by Omkar Kulkarni, 4 years ago

Owner: changed from nobody to Omkar Kulkarni
Status: newassigned

comment:5 by Omkar Kulkarni, 4 years ago

I was playing around with this locally, and the faulthandler properly outputs a traceback when running tests serially, but I ran into some issues when trying to test in parallel. I manually sent signals like SIGABRT or SIGSEGV to the multiprocessing PID, which caused the tests to loop forever (without outputting the faulthandler traceback). I believe this is because I am using python3.8 on macOS, which has changed the default start method for the multiprocessing module from 'fork' to 'spawn' (see https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods). When I switched to python3.7, then the faulthandler will work for subprocesses as well (since child processes inherit signal handlers), but it still results in an infinite loop. I am pushing a commit that just enables fault handlers, but the issue of the infinite loop will persist since Python multiprocessing doesn't expose an API to see if the child is has been killed (i.e. handling SIGCHLD).

comment:6 by Omkar Kulkarni, 4 years ago

Has patch: set
Last edited 4 years ago by Mariusz Felisiak (previous) (diff)

comment:7 by Adam Johnson, 4 years ago

Killing the master process with SIGABRT isn't really a fair test. You need to emulate a crash *from within* the tests. Perhaps write a test that sends SIGABRT to os.getpid().

comment:8 by Omkar Kulkarni, 4 years ago

Sorry, I think I was unclear in my previous comment. I was sending SIGABRT to the child process that was forked, but I also tested it using os.kill(os.getpid(), SIGSEGV/SIGABRT) (for both parallel and serial tests). It works properly in parallel only if the child processes are forked. If multiprocessing.get_start_method() == 'spawn', then the faulthandler doesn't output the trace. According to the comments in runner.py, it seems like Django's parallel test runner relies on the processes being forked, but I can figure out a way to make it work for 'spawn' if that is a valid use case.

comment:9 by Adam Johnson, 4 years ago

Just to clarify post my PR comments: yes the test runner relies on fork right now. There's a Google Summer of Code (GSoC) project being implemented by Ahmad Abdallah to move it to use spawn.

comment:10 by Mariusz Felisiak, 4 years ago

Needs documentation: set
Needs tests: set
Patch needs improvement: set

comment:11 by Omkar Kulkarni, 4 years ago

Needs documentation: unset
Needs tests: unset
Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:12 by Mariusz Felisiak, 4 years ago

Triage Stage: Ready for checkinAccepted

Please don't mark your own patches as ready for checkin.

comment:13 by Omkar Kulkarni, 4 years ago

Oh ok, sorry about that.

comment:14 by Mariusz Felisiak, 4 years ago

Patch needs improvement: set

comment:15 by Mariusz Felisiak, 4 years ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:16 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

Resolution: fixed
Status: assignedclosed

In b7a438c:

Fixed #31509 -- Made DiscoverRunner enable faulthandler by default.

Note: See TracTickets for help on using tickets.
Back to Top