handle missing controller gracefully
bertsky opened this issue · comments
ATM, if the admin forgot to activate the Controller service or to point to an external instance with CONTROLLER_HOST / CONTROLLER_PORT_SSH, both entry points (for_production/presentation.sh) show very unfortunate behaviour:
- Manager logs show the workflow is started, nothing more (no error that Controller cannot be reached)
- Monitor shows the job as terminated (because no connection can be established to Controller at all)
- script does not terminate
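
One way to fail early in this situation would be a pre-flight check before the workflow is started. The following is only a minimal sketch, not the Manager's actual code: the function names `controller_reachable` and `require_controller` and the 5-second default timeout are hypothetical, and it assumes bash (for `/dev/tcp`) and GNU coreutils `timeout` are available. It only tests that the TCP port can be opened, so SSH authentication problems would still go undetected.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; function names and defaults are assumptions.

# Return 0 if a TCP connection to host:port can be opened within the timeout.
controller_reachable() {
    local host="$1" port="$2" timeout_s="${3:-5}"
    # /dev/tcp is a bash feature; timeout(1) comes from GNU coreutils.
    # Host and port are passed as $0 and $1 of the inner shell.
    timeout "$timeout_s" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Print a visible error and return non-zero if the Controller is not usable:
# 2 if the variables are unset, 1 if the connection attempt fails.
require_controller() {
    local host="${CONTROLLER_HOST:-}" port="${CONTROLLER_PORT_SSH:-}"
    if [ -z "$host" ] || [ -z "$port" ]; then
        echo "CONTROLLER_HOST / CONTROLLER_PORT_SSH not set" >&2
        return 2
    fi
    if ! controller_reachable "$host" "$port"; then
        echo "Controller $host:$port cannot be reached" >&2
        return 1
    fi
}
```

An entry point could then call `require_controller || exit 1` before starting the workflow, so that the error is reported up front and the script terminates.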
> - Manager logs show the workflow is started, nothing more (no error that Controller cannot be reached)
That's actually not true: the Manager logs will "unwind" from the correct error message:
ocrd-manager process_images.sh: ssh: Could not resolve hostname ocrd-controller: Temporary failure in name resolution#015
That log is directly accessible as ocrd.log in the inactive job list.
> - script does not terminate
Also not true.
@SvenMarcus is there something I am missing? (I dimly recall you found the original problem...)