Load model failed: drawn_humanoid_detector, error: Worker died.

Question

Tigran01 opened this issue 2 months ago

I get the following error when running python image_to_animation.py drawings/garlic.png garlic_out, even though the Docker container started successfully and reports a Healthy status:

Traceback (most recent call last):
  File "image_to_animation.py", line 41, in <module>
    image_to_animation(img_fn, char_anno_dir, motion_cfg_fn, retarget_cfg_fn)
  File "image_to_animation.py", line 19, in image_to_animation
    image_to_annotations(img_fn, char_anno_dir)
  File "/Users/tigran/AnimatedDrawings/examples/image_to_annotations.py", line 53, in image_to_annotations
    raise Exception(f"Failed to get bounding box, please check if the 'docker_torchserve' is running and healthy, resp: {resp}")
Exception: Failed to get bounding box, please check if the 'docker_torchserve' is running and healthy, resp: <Response [507]>

Here are some Docker logs:

2024-04-02 05:39:46 2024-04-02T01:39:46,630 [INFO ] W-9007-drawn_humanoid_detector_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9007-drawn_humanoid_detector_1.0-stdout
2024-04-02 05:39:46 2024-04-02T01:39:46,630 [INFO ] W-9007-drawn_humanoid_detector_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9007-drawn_humanoid_detector_1.0-stderr
2024-04-02 05:39:46 2024-04-02T01:39:46,632 [INFO ] epollEventLoopGroup-5-3 org.pytorch.serve.wlm.WorkerThread - 9007 Worker disconnected. WORKER_STARTED
2024-04-02 05:39:46 2024-04-02T01:39:46,637 [DEBUG] W-9007-drawn_humanoid_detector_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2024-04-02 05:39:46 2024-04-02T01:39:46,638 [DEBUG] W-9007-drawn_humanoid_detector_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died., responseTimeout:120sec
2024-04-02 05:39:46 java.lang.InterruptedException: null
2024-04-02 05:39:46     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056) ~[?:?]
2024-04-02 05:39:46     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133) ~[?:?]
2024-04-02 05:39:46     at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432) ~[?:?]
2024-04-02 05:39:46     at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:229) [model-server.jar:?]
2024-04-02 05:39:46     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
2024-04-02 05:39:46     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
2024-04-02 05:39:46     at java.lang.Thread.run(Thread.java:829) [?:?]
2024-04-02 05:39:46 2024-04-02T01:39:46,643 [WARN ] W-9007-drawn_humanoid_detector_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: drawn_humanoid_detector, error: Worker died.
2024-04-02 05:39:46 2024-04-02T01:39:46,643 [DEBUG] W-9007-drawn_humanoid_detector_1.0 org.pytorch.serve.wlm.WorkerThread - W-9007-drawn_humanoid_detector_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2024-04-02 05:39:46 2024-04-02T01:39:46,643 [WARN ] W-9007-drawn_humanoid_detector_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
2024-04-02 05:39:46 2024-04-02T01:39:46,646 [WARN ] W-9007-drawn_humanoid_detector_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9007-drawn_humanoid_detector_1.0-stderr
2024-04-02 05:39:46 2024-04-02T01:39:46,646 [WARN ] W-9007-drawn_humanoid_detector_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9007-drawn_humanoid_detector_1.0-stdout
2024-04-02 05:39:46 2024-04-02T01:39:46,648 [INFO ] W-9007-drawn_humanoid_detector_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9007 in 144 seconds.

hjessmith · Answer 1 · Wed Apr 03 2024 03:37:08 GMT+0800 (China Standard Time)

Try upping the RAM available to Docker.

There are also a number of closed issues that describe troubleshooting similar problems. A quick issue search might surface some helpful information for you.
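
To see how much memory Docker currently has before and after changing the setting, something like the following sketch may help. It assumes the docker CLI is on your PATH and reads the MemTotal field from docker info, which reports the memory visible to the Docker daemon in bytes. The helper names and the threshold you pass in are hypothetical placeholders, not values taken from the project.

```python
import subprocess

GIB = 1024 ** 3  # bytes per GiB


def docker_mem_gib(mem_total_bytes: str) -> float:
    """Convert the byte count printed by `docker info` into GiB."""
    return int(mem_total_bytes.strip()) / GIB


def has_enough_memory(mem_total_bytes: str, min_gib: float) -> bool:
    """Return True if Docker's memory meets the caller-chosen threshold.

    min_gib is an assumption supplied by the caller, not a documented
    requirement of the model server.
    """
    return docker_mem_gib(mem_total_bytes) >= min_gib


def query_docker_mem_total() -> str:
    """Ask the Docker daemon for its total memory, in bytes, as text."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{.MemTotal}}"],
        capture_output=True,
        text=True,
        check=True,  # raise if the docker command itself fails
    )
    return result.stdout
```

For example, has_enough_memory(query_docker_mem_total(), min_gib=8.0) returns False on a Docker install limited to 2 GiB; the 8.0 here is only an illustrative number, so pick whatever limit you are testing against.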

Tigran · Answer 2 · Wed Apr 03 2024 06:24:52 GMT+0800 (China Standard Time)

That worked! Thanks!