yuki-koyama / parallel-util

Simple header-only implementation of "parallel_for" and "parallel_map" for C++11

Unbalanced inner loops

yuki-koyama opened this issue

In the parallel_for function, the current algorithm for assigning inner loops to threads is not well designed and can produce unbalanced assignments. For example, consider the following case:

  • 1050 loops
  • 100 threads

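Threads no. 1 to no. 99 will each be assigned 10 inner loops (1050 / 100, rounded down), but the last thread, no. 100, will be assigned the remaining 60 inner loops (1050 - 99 * 10). If each inner loop costs about the same, the last thread has six times the work of any other thread and becomes the bottleneck.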
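To make the arithmetic concrete, here is a minimal sketch of the assignment as described above. It is not the library's actual code: `naive_chunk_sizes` is a hypothetical helper, and it assumes every thread receives `n / num_threads` inner loops (integer division) while the last thread also takes the remainder.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of parallel-util). Returns how many inner
// loops each thread receives under the assumed scheme: every thread gets
// floor(n / num_threads), and the last thread also takes the remainder.
std::vector<std::size_t> naive_chunk_sizes(std::size_t n, std::size_t num_threads)
{
    assert(num_threads > 0);

    const std::size_t base = n / num_threads;  // rounded down
    std::vector<std::size_t> sizes(num_threads, base);

    // All leftover inner loops are added to the last thread.
    sizes.back() += n % num_threads;

    return sizes;
}
```

With `n = 1050` and `num_threads = 100`, the first 99 entries are 10 and the last entry is 60, matching the numbers above.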

This should be fixed somehow.
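One possible direction, sketched here under assumptions rather than as the fix the library adopted: spread the remainder over the first `n % num_threads` threads, one extra inner loop each, so that no two threads differ by more than one inner loop. `balanced_ranges` is a hypothetical name, and the half-open `[begin, end)` index ranges are an illustrative choice.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of parallel-util). Splits the indices
// 0 .. n-1 into num_threads contiguous half-open ranges [begin, end).
// The first (n % num_threads) ranges get one extra index, so range sizes
// differ by at most one.
std::vector<std::pair<std::size_t, std::size_t>>
balanced_ranges(std::size_t n, std::size_t num_threads)
{
    assert(num_threads > 0);

    const std::size_t base  = n / num_threads;  // minimum size of each range
    const std::size_t extra = n % num_threads;  // number of ranges that get +1

    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    ranges.reserve(num_threads);

    std::size_t begin = 0;
    for (std::size_t t = 0; t < num_threads; ++t)
    {
        const std::size_t size = base + (t < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + size);
        begin += size;
    }

    return ranges;
}
```

For the case of 1050 inner loops and 100 threads, this gives the first 50 threads 11 inner loops and the remaining 50 threads 10 inner loops, so the heaviest thread does 11 instead of 60. A dynamic scheme, where threads pull the next index from a shared atomic counter, would be another option when the cost per inner loop varies.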