Parsl / parsl

Parsl - a Python parallel scripting library

Home Page: http://parsl-project.org

WorkQueue missing task file and segfault in CI

benclifford opened this issue

Describe the bug

I just saw this exception in CI. It's unfamiliar to me:

```
ERROR    parsl.executors.status_handling:status_handling.py:142 Setting bad state due to exception
Exception: 	STDOUT: Found cores : 2
Launching worker: 1
work_queue_worker: creating workspace /tmp/worker-1001-5654
work_queue_worker: using 2 cores, 6931 MB memory, 19754 MB disk, 0 gpus
connected to manager fv-az220-227:9000 via local address 10.1.0.36:47576
	STDERR: Network function: connection from ('127.0.0.1', 36318)
Network function: recieved event: {'fn_kwargs': {}, 'fn_args': ['map', 'function', 'result'], 'remote_task_exec_method': 'direct'}
Network function: connection from ('127.0.0.1', 36334)
Network function: recieved event: {'fn_kwargs': {}, 'fn_args': ['map', 'function', 'result'], 'remote_task_exec_method': 'direct'}
Network function: connection from ('127.0.0.1', 36340)
Network function: recieved event: {'fn_kwargs': {}, 'fn_args': ['map', 'function', 'result'], 'remote_task_exec_method': 'direct'}
Network function: connection from ('127.0.0.1', 36346)
Network function: recieved event: {'fn_kwargs': {}, 'fn_args': ['map', 'function', 'result'], 'remote_task_exec_method': 'direct'}
Network function: connection from ('127.0.0.1', 44766)
Network function: recieved event: {'fn_kwargs': {}, 'fn_args': ['map', 'function', 'result'], 'remote_task_exec_method': 'direct'}
Network function: connection from ('127.0.0.1', 44776)
Network function: recieved event: {'fn_
..
 'direct'}
Network function: connection from ('127.0.0.1', 51136)
Network function: recieved event: {'fn_kwargs': {}, 'fn_args': ['map', 'function', 'result'], 'remote_task_exec_method': 'direct'}
Network function: connection from ('127.0.0.1', 51148)
Network function: recieved event: {'fn_kwargs': {}, 'fn_args': ['map', 'function', 'result'], 'remote_task_exec_method': 'direct'}
Network function encountered exception  [Errno 2] No such file or directory: 't.102'
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.10.12/x64/bin/parsl_coprocess.py", line 141, in <module>
    main()
  File "/opt/hostedtoolcache/Python/3.10.12/x64/bin/parsl_coprocess.py", line 69, in main
    task_id = int(input_spec[1])
IndexError: list index out of range
/home/runner/work/parsl/parsl/runinfo/003/submit_scripts/parsl.WorkQueueExecutor.block-0.1691882251.5515625.sh: line 10:  5654 Segmentation fault      (core dumped) PARSL_WORKER_BLOCK_ID=0 work_queue_worker --coprocess parsl_coprocess.py fv-az220-227 9000


DEBUG    parsl.dataflow.dflow:dflow.py:304 Task 132 try 0 failed
```
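
For reference, the `IndexError` in the traceback is the kind of failure that occurs when a list with fewer than two elements is indexed at position 1. The sketch below reproduces that class of failure and shows one possible guard. It is not the actual `parsl_coprocess.py` code: the function name `parse_task_header` and the whitespace-separated `<keyword> <task_id>` line format are assumptions made purely for illustration, and the real input format may differ.

```python
def parse_task_header(line: str) -> int:
    """Hypothetical parser, NOT the real parsl_coprocess.py logic.

    Assumes a header line of the form '<keyword> <task_id> ...'.
    """
    input_spec = line.split()
    # An empty or truncated line yields fewer than two fields, so
    # input_spec[1] would raise "IndexError: list index out of range".
    if len(input_spec) < 2:
        raise ValueError(f"malformed task header: {line!r}")
    return int(input_spec[1])


if __name__ == "__main__":
    print(parse_task_header("task 102"))
    try:
        parse_task_header("")  # e.g. an empty read after an earlier failure
    except ValueError as exc:
        print("rejected:", exc)
```

Raising a descriptive error here would not fix the underlying missing-file problem, but it might make the coprocess failure easier to diagnose than a bare `IndexError`.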

Environment
CI
Parsl 063033a
Python 3.10