Opened 5 years ago
Closed 5 years ago
#31509 closed New feature (fixed)
Enable faulthandler automatically during tests
Reported by: | Adam Johnson | Owned by: | Omkar Kulkarni |
---|---|---|---|
Component: | Testing framework | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | yes | UI/UX: | no |
Description (last modified by )
Sometimes when testing one discovers a "hang," segfault, or other issue. This can be hard to diagnose since default test output just includes dots, so one can't tell which test has hung. Also hangs can be random, so hard to diagnose.
In such cases, faulthandler is useful to enable, as killing the process will output a stack trace for each thread.
Tests can be run using faulthandler with python -X faulthandler manage.py test
. However this needs both knowledge that it exists, and re-running the tests, when the hang might be non-deterministic.
Pytest automatically enables faulthandler and provides the option to disable it. I suggest we replicate the same in Django's DiscoverRunner
- call faulthandler.enable()
at the start of tests, unless disabled with a flag --no-faulthandler
.
Searching for past issues, it seems we've had three tickets where faulthandler was recommended for diagnosis: https://code.djangoproject.com/search?q=faulthandler . Additionally there are probably other cases where no progress was made on a bug because of lack of faulthandler
knowledge.
Change History (16)
comment:1 by , 5 years ago
Description: | modified (diff) |
---|
comment:2 by , 5 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:3 by , 5 years ago
One concern for a PR to address is ensuring faulthandler is enabled in subprocesses ˙when using parallel testing. I don't actually believe the pytest feature does that, since its parallel processing feature is separate.
comment:4 by , 5 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:5 by , 5 years ago
I was playing around with this locally, and the faulthandler properly outputs a traceback when running tests serially, but I ran into some issues when trying to test in parallel. I manually sent signals like SIGABRT or SIGSEGV to the multiprocessing PID, which caused the tests to loop forever (without outputting the faulthandler traceback). I believe this is because I am using python3.8 on macOS, which has changed the default start method for the multiprocessing module from 'fork' to 'spawn' (see https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods). When I switched to python3.7, then the faulthandler will work for subprocesses as well (since child processes inherit signal handlers), but it still results in an infinite loop. I am pushing a commit that just enables fault handlers, but the issue of the infinite loop will persist since Python multiprocessing doesn't expose an API to see if the child is has been killed (i.e. handling SIGCHLD).
comment:7 by , 5 years ago
Killing the master process with SIGABRT isn't really a fair test. You need to emulate a crash *from within* the tests. Perhaps write a test that sends SIGABRT to os.getpid()
.
comment:8 by , 5 years ago
Sorry, I think I was unclear in my previous comment. I was sending SIGABRT to the child process that was forked, but I also tested it using os.kill(os.getpid(), SIGSEGV/SIGABRT) (for both parallel and serial tests). It works properly in parallel only if the child processes are forked. If multiprocessing.get_start_method() == 'spawn', then the faulthandler doesn't output the trace. According to the comments in runner.py, it seems like Django's parallel test runner relies on the processes being forked, but I can figure out a way to make it work for 'spawn' if that is a valid use case.
comment:9 by , 5 years ago
Just to clarify post my PR comments: yes the test runner relies on fork right now. There's a Google Summer of Code (GSoC) project being implemented by Ahmad Abdallah to move it to use spawn.
comment:10 by , 5 years ago
Needs documentation: | set |
---|---|
Needs tests: | set |
Patch needs improvement: | set |
comment:11 by , 5 years ago
Needs documentation: | unset |
---|---|
Needs tests: | unset |
Patch needs improvement: | unset |
Triage Stage: | Accepted → Ready for checkin |
comment:12 by , 5 years ago
Triage Stage: | Ready for checkin → Accepted |
---|
Please don't mark your own patches as ready for checkin.
comment:14 by , 5 years ago
Patch needs improvement: | set |
---|
comment:15 by , 5 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
Practical example from the mailing list — good work there Adam!
I think we should investigate a PR here.
Thanks for the report.