WorkQueue missing task file and segfault in CI
benclifford opened this issue
Describe the bug
I just saw this exception in CI. It's unfamiliar to me:
```
ERROR parsl.executors.status_handling:status_handling.py:142 Setting bad state due to exception
Exception: STDOUT: Found cores : 2
Launching worker: 1
work_queue_worker: creating workspace /tmp/worker-1001-5654
work_queue_worker: using 2 cores, 6931 MB memory, 19754 MB disk, 0 gpus
connected to manager fv-az220-227:9000 via local address 10.1.0.36:47576
STDERR: Network function: connection from ('127.0.0.1', 36318)
Network function: recieved event: {'fn_kwargs': {}, 'fn_args': ['map', 'function', 'result'], 'remote_task_exec_method': 'direct'}
Network function: connection from ('127.0.0.1', 36334)
Network function: recieved event: {'fn_kwargs': {}, 'fn_args': ['map', 'function', 'result'], 'remote_task_exec_method': 'direct'}
Network function: connection from ('127.0.0.1', 36340)
Network function: recieved event: {'fn_kwargs': {}, 'fn_args': ['map', 'function', 'result'], 'remote_task_exec_method': 'direct'}
Network function: connection from ('127.0.0.1', 36346)
Network function: recieved event: {'fn_kwargs': {}, 'fn_args': ['map', 'function', 'result'], 'remote_task_exec_method': 'direct'}
Network function: connection from ('127.0.0.1', 44766)
Network function: recieved event: {'fn_kwargs': {}, 'fn_args': ['map', 'function', 'result'], 'remote_task_exec_method': 'direct'}
Network function: connection from ('127.0.0.1', 44776)
Network function: recieved event: {'fn_
..
'direct'}
Network function: connection from ('127.0.0.1', 51136)
Network function: recieved event: {'fn_kwargs': {}, 'fn_args': ['map', 'function', 'result'], 'remote_task_exec_method': 'direct'}
Network function: connection from ('127.0.0.1', 51148)
Network function: recieved event: {'fn_kwargs': {}, 'fn_args': ['map', 'function', 'result'], 'remote_task_exec_method': 'direct'}
Network function encountered exception [Errno 2] No such file or directory: 't.102'
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.10.12/x64/bin/parsl_coprocess.py", line 141, in <module>
    main()
  File "/opt/hostedtoolcache/Python/3.10.12/x64/bin/parsl_coprocess.py", line 69, in main
    task_id = int(input_spec[1])
IndexError: list index out of range
/home/runner/work/parsl/parsl/runinfo/003/submit_scripts/parsl.WorkQueueExecutor.block-0.1691882251.5515625.sh: line 10: 5654 Segmentation fault (core dumped) PARSL_WORKER_BLOCK_ID=0 work_queue_worker --coprocess parsl_coprocess.py fv-az220-227 9000
DEBUG parsl.dataflow.dflow:dflow.py:304 Task 132 try 0 failed
```
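The `IndexError` comes from `task_id = int(input_spec[1])` at line 69 of `parsl_coprocess.py`: the spec the coprocess read had fewer than two fields. The sketch below only illustrates that failure mode and a guarded alternative. It assumes (not confirmed by this log) that `input_spec` is a whitespace-split input line whose second field is the task id; `parse_task_id` and `unguarded_parse` are hypothetical helpers, not part of Parsl.

```python
from typing import Optional


def unguarded_parse(line: str) -> int:
    # Mirrors line 69 of the traceback, under the ASSUMPTION that
    # input_spec comes from splitting one input line on whitespace.
    input_spec = line.split()
    return int(input_spec[1])  # IndexError if fewer than two fields


def parse_task_id(line: str) -> Optional[int]:
    """Hypothetical guarded version: return None for a malformed spec."""
    input_spec = line.split()
    if len(input_spec) < 2:
        # Empty or truncated line: the case that raised IndexError in CI.
        return None
    try:
        return int(input_spec[1])
    except ValueError:
        # Second field present but not an integer.
        return None


if __name__ == "__main__":
    print(parse_task_id("task 102"))  # well-formed spec
    print(parse_task_id(""))          # empty line, no IndexError
    try:
        unguarded_parse("")
    except IndexError as exc:
        print("unguarded:", exc)
```

Returning `None` here just makes the malformed input visible instead of crashing the coprocess; it does not explain why the spec was short or why `t.102` was missing, which is still the open question in this report.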
Environment
- CI
- Parsl 063033a
- Python 3.10