htex queue management thread exits prematurely if a ManagerLost and a BadStateException collide in a Future
benclifford opened this issue
Ben Clifford commented
Describe the bug
During shutdown triggered by a BadStateException, if a ManagerLost exception is subsequently raised, the queue management thread fails with the following InvalidStateError:
1700042886.365965 2023-11-15 02:08:06 MainProcess-103369 HTEX-Queue-Management-Thread-35184772379056 parsl.app.errors:118 reraise DEBUG: Reraising exception of type <class 'parsl.executors.high_throughput.interchange.ManagerLost'>
1700042886.366492 2023-11-15 02:08:06 MainProcess-103369 HTEX-Queue-Management-Thread-35184772379056 parsl.process_loggers:31 wrapped ERROR: Exceptional ending for _queue_management_worker on thread HTEX-Queue-Management-Thread
Traceback (most recent call last):
File "/g/g92/conti3/opt/venv/lassen-sys/lib/python3.8/site-packages/parsl/executors/high_throughput/executor.py", line 443, in _queue_management_worker
s.reraise()
File "/g/g92/conti3/opt/venv/lassen-sys/lib/python3.8/site-packages/parsl/app/errors.py", line 122, in reraise
reraise(t, v, v.__traceback__)
File "/usr/tce/packages/python/python-3.8.2/lib/python3.8/site-packages/six.py", line 693, in reraise
raise value
File "/g/g92/conti3/opt/venv/lassen-sys/lib/python3.8/site-packages/parsl/executors/high_throughput/interchange.py", line 569, in expire_bad_managers
raise ManagerLost(manager_id, m['hostname'])
parsl.executors.high_throughput.interchange.ManagerLost: Task failure due to loss of manager 093f904e283d on host lassen566
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/g/g92/conti3/opt/venv/lassen-sys/lib/python3.8/site-packages/parsl/executors/high_throughput/executor.py", line 445, in _queue_management_worker
task_fut.set_exception(e)
File "/usr/tce/packages/python/python-3.8.2/lib/python3.8/concurrent/futures/_base.py", line 539, in set_exception
raise InvalidStateError('{}: {!r}'.format(self._state, self))
concurrent.futures._base.InvalidStateError: FINISHED: <Future at 0x200017e49580 state=finished raised BadStateException>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/g/g92/conti3/opt/venv/lassen-sys/lib/python3.8/site-packages/parsl/process_loggers.py", line 27, in wrapped
r = func(*args, **kwargs)
File "/g/g92/conti3/opt/venv/lassen-sys/lib/python3.8/site-packages/parsl/executors/high_throughput/executor.py", line 452, in _queue_management_worker
task_fut.set_exception(
File "/usr/tce/packages/python/python-3.8.2/lib/python3.8/concurrent/futures/_base.py", line 539, in set_exception
raise InvalidStateError('{}: {!r}'.format(self._state, self))
concurrent.futures._base.InvalidStateError: FINISHED: <Future at 0x200017e49580 state=finished raised BadStateException>
This causes the queue management thread to end with an unhandled exception. I'm not sure whether this causes further problems.
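The collision in the traceback can be reproduced outside Parsl with a plain `concurrent.futures.Future`. This is only a minimal sketch: the two exception classes are local stand-ins for Parsl's `BadStateException` and `ManagerLost`, and `set_exception_if_pending` is a hypothetical helper showing one possible way to tolerate the second exception, not the fix adopted in Parsl.

```python
from concurrent.futures import Future, InvalidStateError


class BadStateException(Exception):
    """Local stand-in for parsl's BadStateException (assumption)."""


class ManagerLost(Exception):
    """Local stand-in for parsl's ManagerLost (assumption)."""


def set_exception_if_pending(fut, exc):
    """Hypothetical helper: set exc on fut, tolerating an already-finished Future.

    Returns True if the exception was stored, False if the Future had
    already been completed by something else.
    """
    try:
        fut.set_exception(exc)
    except InvalidStateError:
        return False
    return True


# The Future is first completed during shutdown...
task_fut = Future()
task_fut.set_exception(BadStateException("executor in bad state"))

# ...and a later ManagerLost for the same task then collides with it.
collided = False
try:
    task_fut.set_exception(ManagerLost("manager lost"))
except InvalidStateError:
    collided = True

# With the guard, the second exception is dropped instead of propagating.
guarded_result = set_exception_if_pending(task_fut, ManagerLost("manager lost"))
```

On Python 3.8 and later, the unguarded second `set_exception` call raises `InvalidStateError`, which matches the traceback above. The guarded call leaves the original `BadStateException` in the Future.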
To Reproduce
Expected behavior
Environment
- Python 3.8
- Parsl 2023.10.23
Distributed Environment
htex, LSFProvider
Ben Clifford commented
crossref #2473