Parsl / parsl

Parsl - a Python parallel scripting library

Home Page: http://parsl-project.org

htex queue management thread exits prematurely if a ManagerLost and a BadStateException collide in a Future

benclifford opened this issue

Describe the bug

During shutdown caused by a BadStateException, a ManagerLost exception raised afterwards cannot be set on a task Future that already holds the BadStateException. The following is logged:

1700042886.365965 2023-11-15 02:08:06 MainProcess-103369 HTEX-Queue-Management-Thread-35184772379056 parsl.app.errors:118 reraise DEBUG: Reraising exception of type <class 'parsl.executors.high_throughput.interchange.ManagerLost'>
1700042886.366492 2023-11-15 02:08:06 MainProcess-103369 HTEX-Queue-Management-Thread-35184772379056 parsl.process_loggers:31 wrapped ERROR: Exceptional ending for _queue_management_worker on thread HTEX-Queue-Management-Thread
Traceback (most recent call last):
  File "/g/g92/conti3/opt/venv/lassen-sys/lib/python3.8/site-packages/parsl/executors/high_throughput/executor.py", line 443, in _queue_management_worker
    s.reraise()
  File "/g/g92/conti3/opt/venv/lassen-sys/lib/python3.8/site-packages/parsl/app/errors.py", line 122, in reraise
    reraise(t, v, v.__traceback__)
  File "/usr/tce/packages/python/python-3.8.2/lib/python3.8/site-packages/six.py", line 693, in reraise
    raise value
  File "/g/g92/conti3/opt/venv/lassen-sys/lib/python3.8/site-packages/parsl/executors/high_throughput/interchange.py", line 569, in expire_bad_managers
    raise ManagerLost(manager_id, m['hostname'])
parsl.executors.high_throughput.interchange.ManagerLost: Task failure due to loss of manager 093f904e283d on host lassen566

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/g/g92/conti3/opt/venv/lassen-sys/lib/python3.8/site-packages/parsl/executors/high_throughput/executor.py", line 445, in _queue_management_worker
    task_fut.set_exception(e)
  File "/usr/tce/packages/python/python-3.8.2/lib/python3.8/concurrent/futures/_base.py", line 539, in set_exception
    raise InvalidStateError('{}: {!r}'.format(self._state, self))
concurrent.futures._base.InvalidStateError: FINISHED: <Future at 0x200017e49580 state=finished raised BadStateException>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/g/g92/conti3/opt/venv/lassen-sys/lib/python3.8/site-packages/parsl/process_loggers.py", line 27, in wrapped
    r = func(*args, **kwargs)
  File "/g/g92/conti3/opt/venv/lassen-sys/lib/python3.8/site-packages/parsl/executors/high_throughput/executor.py", line 452, in _queue_management_worker
    task_fut.set_exception(
  File "/usr/tce/packages/python/python-3.8.2/lib/python3.8/concurrent/futures/_base.py", line 539, in set_exception
    raise InvalidStateError('{}: {!r}'.format(self._state, self))
concurrent.futures._base.InvalidStateError: FINISHED: <Future at 0x200017e49580 state=finished raised BadStateException>

This causes the queue management thread to exit with an exception. It is unclear to me whether this causes further problems.
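
The InvalidStateError in the traceback can be shown in isolation with a plain concurrent.futures.Future, independent of Parsl. The sketch below is illustrative only: the two exception classes are local stand-ins rather than the real Parsl classes, and set_exception_if_pending is a hypothetical helper, not something that exists in Parsl or a proposed fix. It assumes Python 3.8 or later, where set_exception on a finished Future raises InvalidStateError.

```python
from concurrent.futures import Future, InvalidStateError


class BadStateException(Exception):
    """Local stand-in for Parsl's BadStateException (not the real class)."""


class ManagerLost(Exception):
    """Local stand-in for Parsl's ManagerLost (not the real class)."""


def set_exception_if_pending(fut, exc):
    """Hypothetical helper: store exc on fut unless fut is already finished.

    Returns True if the exception was stored, False otherwise.
    """
    try:
        fut.set_exception(exc)
        return True
    except InvalidStateError:
        # The future already carries a result or exception; keep the first one.
        return False


task_fut = Future()

# First failure: the future is completed with BadStateException.
task_fut.set_exception(BadStateException("executor is in a bad state"))

# Second failure on the same future: an unguarded call raises InvalidStateError,
# matching the second and third tracebacks above.
try:
    task_fut.set_exception(ManagerLost("manager lost"))
    second_call_raised = False
except InvalidStateError:
    second_call_raised = True

# Guarded call: the second exception is dropped instead of propagating.
stored = set_exception_if_pending(task_fut, ManagerLost("manager lost"))
```

With a guard of this kind the first exception stays on the Future and the calling thread keeps running. Whether silently dropping the ManagerLost is the right behaviour for htex is a separate question.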

To Reproduce

Expected behavior

Environment

  • Python 3.8
  • Parsl 2023.10.23

Distributed Environment
htex, LSFProvider

crossref #2473