KeyboardInterrupt raises an exception which results in a zero exit code
amarckal opened this issue
Bug description
During training, whenever there is a keyboard interrupt, the fit loop raises a SIGTERMException (pytorch-lightning/src/lightning/pytorch/loops/fit_loop.py, lines 397 to 398 in 98005bb), which results in a 0 exit code. Other scripts that rely on the exit code of the training script treat this as if the training script had exited normally.
The issue comes from here: pytorch-lightning/src/lightning/pytorch/utilities/exceptions.py, lines 19 to 28 in 98005bb. Raising a SystemExit in Python without specifying the exit code leaves the code set to None, which gets converted to 0. The fix would be to have:
class SIGTERMException(SystemExit):
    """Exception used when a :class:`signal.SIGTERM` is sent to a process.

    This exception is raised by the loops at specific points. It can be used to write custom logic in the
    :meth:`lightning.pytorch.callbacks.callback.Callback.on_exception` method.

    For example, you could use the :class:`lightning.pytorch.callbacks.fault_tolerance.OnExceptionCheckpoint` callback
    that saves a checkpoint for you when this exception is raised.

    """

    code = 128 + 15  # see https://tldp.org/LDP/abs/html/exitcodes.html
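To illustrate the exit-code behaviour outside of Lightning, here is a minimal sketch that runs two tiny programs in fresh interpreters and compares their exit statuses. The class names `PlainExit` and `CodedExit` and the helper `exit_code_of` are hypothetical and exist only for this demonstration. The sketch assumes the interpreter reads the `code` attribute of the uncaught exception at shutdown, so a class-level `code` is picked up when the constructor is called without arguments.

```python
import subprocess
import sys
import textwrap


def exit_code_of(snippet: str) -> int:
    """Run `snippet` in a fresh Python interpreter and return its exit status."""
    result = subprocess.run([sys.executable, "-c", textwrap.dedent(snippet)])
    return result.returncode


# Hypothetical stand-in for the current behaviour: no exit code is given,
# so `code` stays None and the process exits with status 0.
PLAIN = """
class PlainExit(SystemExit):
    pass

raise PlainExit()
"""

# Hypothetical stand-in for the proposed fix: a class-level `code` attribute
# shadows the default and is used as the process exit status.
CODED = """
class CodedExit(SystemExit):
    code = 128 + 15

raise CodedExit()
"""

plain_status = exit_code_of(PLAIN)
coded_status = exit_code_of(CODED)
print(f"plain: {plain_status}, coded: {coded_status}")
```

With the class-level attribute in place, a calling script can tell a terminated run apart from a normal one by checking for a non-zero status.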
What version are you seeing the problem on?
v2.0, v2.1, v2.2, master
How to reproduce the bug
Start a training run, send a keyboard interrupt signal to it, and then run echo $? to see the exit code.
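The reproduction can also be approximated without Lightning. The following is a rough sketch, assuming a POSIX system: a child process installs a SIGTERM handler that raises a bare SystemExit subclass, and the parent sends SIGTERM and inspects the exit status, as a wrapper script would. `FakeSigtermException`, `CHILD`, and `run_and_terminate` are hypothetical names. The handler raises directly for brevity, which is not necessarily how the fit loop raises its exception.

```python
import signal
import subprocess
import sys

# Source of the child program: it waits in a loop until SIGTERM arrives,
# at which point the handler raises a SystemExit subclass with no exit code.
CHILD = r"""
import signal
import time


class FakeSigtermException(SystemExit):
    pass


def handler(signum, frame):
    raise FakeSigtermException()


signal.signal(signal.SIGTERM, handler)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def run_and_terminate() -> int:
    """Start the child, send SIGTERM once its handler is installed, return its exit status."""
    with subprocess.Popen(
        [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, text=True
    ) as proc:
        proc.stdout.readline()  # block until the child prints "ready"
        proc.send_signal(signal.SIGTERM)
        return proc.wait(timeout=10)


status = run_and_terminate()
print(f"exit status after SIGTERM: {status}")
```

A status of 0 here is the same value a successful run would report, so the parent cannot tell the two cases apart.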