Runtime CPU load balance strategy for local-task

Question

bhbruce opened this issue 2 months ago

Background

I compiled my models for deployment on a platform with 2 CPUs and ran the vmfb with --task_topology_group_count=2.

Observation

The two workers split the dispatch tasks equally.
When one worker finishes its share, it waits for the other worker to finish.
In other words, one worker sits idle until the other completes its tasks.

For instance, I have a matmul op that is divided into 48 dispatch tasks. Each worker is responsible for 24 of them, and the workers do not help each other.

In the image, worker-1 is idle while worker-2 still has 5 dispatch tasks remaining.
This makes CPU usage drop to 55%.
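
To make the observation concrete, here is a toy model of the two behaviors I am comparing. It is only a sketch and not IREE's actual scheduler: the function names are hypothetical, and the task durations are made up (24 cheap tasks followed by 24 tasks that cost twice as much). It compares a fixed 24/24 split against a shared queue where whichever worker is free takes the next task.

```python
import heapq

def static_split(durations, workers=2):
    """Fixed partition: each worker runs its own contiguous chunk and never helps."""
    n = len(durations)
    busy = [sum(durations[i * n // workers:(i + 1) * n // workers])
            for i in range(workers)]
    makespan = max(busy)  # everyone waits for the slowest worker
    return makespan, sum(busy) / (workers * makespan)

def shared_queue(durations, workers=2):
    """Shared queue: the worker that becomes free first takes the next task."""
    finish = [0] * workers  # min-heap of per-worker finish times
    for d in durations:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + d)
    makespan = max(finish)
    return makespan, sum(durations) / (workers * makespan)

# Hypothetical durations: 48 dispatch tasks, the second half twice as expensive.
durations = [1] * 24 + [2] * 24

print(static_split(durations))  # (makespan, utilization) with a fixed 24/24 split
print(shared_queue(durations))  # (makespan, utilization) when idle workers pull tasks
```

With these made-up numbers the fixed split ends at 48 time units with 75% utilization, while the shared queue ends at 36 time units with both workers fully busy. The second behavior is what I expected to see.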

Question

Is this behavior expected?
Why doesn't worker-1 assist worker-2 in completing the remaining 5 tasks?

FYR. @rednoah91

Hong-Rong Hsu · Answer 1 · Mon May 27 2024 16:19:22 GMT+0800 (China Standard Time)

@benvanik Do you know if this case makes sense?