Runtime CPU load balance strategy for local-task
bhbruce opened this issue
Background
I compiled my models to deploy on platforms with 2 CPUs. The vmfb ran with --task_topology_group_count=2.
Observation
Two workers equally share the dispatch tasks.
If one worker completes its tasks first, it sits idle until the other worker completes its own tasks.
For instance, I have a matmul op that is divided into 48 dispatch tasks. Each worker is responsible for 24 tasks, and they do not help each other.
Based on the image, it is evident that worker-1 is idle, while worker-2 still has 5 remaining dispatch tasks.
This drops CPU usage to 55%.
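To make the effect concrete, here is a small Python sketch comparing a fixed 24/24 split with a shared queue where whichever worker is free takes the next task. This is only a toy model, not the actual local-task scheduler: the function names `static_split` and `shared_queue` are made up for illustration, the contiguous split is an assumption, and the task durations are hypothetical rather than measured from my trace.

```python
import heapq

def static_split(durations, workers=2):
    """Fixed contiguous share per worker; no worker takes another's tasks.

    Assumes len(durations) is divisible by workers (48 tasks, 2 workers here).
    """
    share = len(durations) // workers
    busy = [sum(durations[i * share:(i + 1) * share]) for i in range(workers)]
    makespan = max(busy)  # the slowest worker decides the total time
    utilization = sum(busy) / (workers * makespan)
    return makespan, utilization

def shared_queue(durations, workers=2):
    """Whichever worker becomes free first takes the next pending task."""
    free_at = [0.0] * workers  # time at which each worker becomes free
    heapq.heapify(free_at)
    for d in durations:
        t = heapq.heappop(free_at)  # earliest-free worker
        heapq.heappush(free_at, t + d)
    makespan = max(free_at)
    utilization = sum(durations) / (workers * makespan)
    return makespan, utilization

# Hypothetical durations: the second half of the 48 tasks is 1.5x slower.
durations = [1.0] * 24 + [1.5] * 24

print("static split:", static_split(durations))
print("shared queue:", shared_queue(durations))
```

With these made-up durations the fixed split leaves one worker waiting while the other finishes its slower half, whereas the shared queue keeps both busy until the end. That is the behavior I was expecting to see.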
Question
Is this the expected behavior?
Why doesn't worker-1 assist worker-2 in completing the remaining 5 tasks?
FYR. @rednoah91
@benvanik Do you know if this case makes sense?