yuki-koyama / parallel-util

Simple header-only implementation of "parallel_for" and "parallel_map" for C++11

Added a new task queue-based parallel_for -- which should be the default?

yuki-koyama opened this issue

Recently I added a new parallel_for function:

template<typename Callable>
void queue_based_parallel_for(int n, Callable function, int target_concurrency = 0);

This function uses a task queue: each thread takes the next task from the queue whenever it finishes its current one.
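To make the discussion concrete, here is a minimal sketch of how such a queue-based loop could look. It is not the code in the repository: the name `queue_based_parallel_for_sketch`, the helper `squares_demo`, and the fallback of 4 threads when the hardware hint is unavailable are all assumptions made for illustration. It only shows the general pattern of worker threads pulling indices from a mutex-protected queue.

```cpp
#include <algorithm>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical sketch of a task queue-based parallel for (C++11).
// Not the library's actual implementation.
template <typename Callable>
void queue_based_parallel_for_sketch(int n, Callable function, int target_concurrency = 0)
{
    if (n <= 0) { return; }

    // Assumption: 0 means "use the hardware hint"; fall back to 4 threads
    // if std::thread::hardware_concurrency() cannot determine a value.
    const int hint = (target_concurrency == 0)
        ? static_cast<int>(std::thread::hardware_concurrency())
        : target_concurrency;
    const int n_threads = std::min(n, (hint <= 0) ? 4 : hint);

    // One task (index) per local process, all stored in a shared queue.
    std::queue<int> tasks;
    for (int i = 0; i < n; ++i) { tasks.push(i); }
    std::mutex mutex;

    // Each worker repeatedly takes the next index until the queue is empty.
    auto worker = [&]() {
        while (true) {
            int index;
            {
                // The lock is held only while touching the queue,
                // not while running the user's function.
                std::lock_guard<std::mutex> lock(mutex);
                if (tasks.empty()) { return; }
                index = tasks.front();
                tasks.pop();
            }
            function(index);
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t) { threads.emplace_back(worker); }
    for (auto& thread : threads) { thread.join(); }
}

// Hypothetical usage helper: each index writes to its own slot,
// so the callable itself needs no synchronization.
inline std::vector<int> squares_demo(int n)
{
    std::vector<int> out(n > 0 ? n : 0);
    queue_based_parallel_for_sketch(n, [&](int i) { out[i] = i * i; });
    return out;
}
```

In this sketch every task acquires the mutex once, so the locking overhead grows with `n` regardless of how cheap each local process is.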

Compared to the original parallel_for, this function is likely to achieve better CPU occupancy, especially when the cost of the local processes is heterogeneous (i.e., some processes are light and others are heavy). However, it could be slower than the original parallel_for in some cases because of

  1. cache inefficiency (each thread works on fewer local processes) and
  2. mutex locking for the task queue.

The question is, which approach should be the default parallel_for?