kgorking / ecs

A header-only/importable c++20 implementation of an entity-component-system (ecs), with focus on a simple interface and speed.

Implement custom thread pool

kgorking opened this issue

Using the standard library's parallel algorithms works "fine I guess", but because they are so generic, a lot of the meta-information I have available goes unused and potential performance is wasted. There is also absolutely no point in dynamically scheduling work when the layout and execution order are known in advance.

My custom thread pool would use static scheduling: each thread has a pre-determined list of jobs to complete, where each job runs a system on a batch of components. The jobs for each thread will be arranged according to the data they access and the dependencies between systems.
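A minimal sketch of the static-scheduling idea, assuming a job is just a callable and a schedule is one ordered job list per thread. The names `job`, `static_schedule` and `run_static_schedule` are hypothetical and not part of the library. For brevity the sketch spawns its threads on every run, whereas a real pool would presumably keep them alive between runs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical names throughout; this is not the library's API.
using job = std::function<void()>;

// One pre-determined, ordered job list per worker thread.
using static_schedule = std::vector<std::vector<job>>;

// Runs every thread's list front to back. There is no shared queue and no
// work stealing: each worker only ever touches its own list.
inline void run_static_schedule(static_schedule const& schedule) {
    std::vector<std::thread> workers;
    workers.reserve(schedule.size());

    for (std::vector<job> const& jobs : schedule) {
        workers.emplace_back([&jobs] {
            for (job const& j : jobs)
                j();
        });
    }

    // Wait for all workers; 'schedule' outlives them because we join here.
    for (std::thread& t : workers)
        t.join();
}
```

Because the lists are fixed before the threads start, the workers need no synchronisation with each other while running, as long as the jobs placed on different threads do not touch the same data.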

Consider 2 systems on an 8-thread machine, one writing to component A and the other to component B. The work for each system is split into 5 jobs:

1 2 3 4 5 6 7 8
A A A A A B B B
B B

Because the two systems are independent, they can be executed in parallel. If I add another system that reads from A and B and writes to C, a naive implementation would schedule the work as follows:

1 2 3 4 5 6 7 8
A A A A A B B B
B B C C C C C

The C job on thread 3 now has to pull data from A and B into its cache, even though that data is already in the caches of threads 1 and 6. If I instead use cache-aware ordering, the work could be rearranged as follows to exploit temporal cache locality:

1 2 3 4 5 6 7 8
A A A A A
B B B B B
C C C C C
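The two placements can be compared with a small sketch. It assumes that batch `b` of every system covers the same slice of components, and that the naive scheduler simply hands each new job to the next thread in turn. `job_id`, `layout`, `naive_layout` and `cache_aware_layout` are hypothetical names used only for illustration, and threads are numbered from 0 here rather than from 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical job description: which system runs on which batch.
struct job_id {
    char system;        // 'A', 'B' or 'C'
    std::size_t batch;  // index of the component slice the job works on
};

// layout[t] is the ordered list of jobs for thread t (0-based).
using layout = std::vector<std::vector<job_id>>;

// Naive placement: every job goes to the next thread in turn,
// regardless of which data it touches.
inline layout naive_layout(std::vector<char> const& systems,
                           std::size_t batches, std::size_t threads) {
    assert(threads > 0);
    layout result(threads);
    std::size_t next = 0;
    for (char s : systems)
        for (std::size_t b = 0; b < batches; ++b)
            result[next++ % threads].push_back({s, b});
    return result;
}

// Cache-aware placement: batch b of every system goes to the same thread,
// so the thread that wrote batch b of A and B also runs batch b of C.
// 'systems' is assumed to be listed in dependency order.
inline layout cache_aware_layout(std::vector<char> const& systems,
                                 std::size_t batches, std::size_t threads) {
    assert(threads > 0);
    layout result(threads);
    for (char s : systems)
        for (std::size_t b = 0; b < batches; ++b)
            result[b % threads].push_back({s, b});
    return result;
}
```

With systems A, B, C, 5 batches and 8 threads, `naive_layout` reproduces the first table (the first C job lands on the third thread), while `cache_aware_layout` puts A, B and C for a given batch on the same thread and leaves the last three threads idle.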

This opens up another optimisation: collapsing jobs. Because the 3 jobs on each thread operate on the same range of components, they can be merged into one job, saving 2 passes over the data:

1   2   3   4   5   6   7   8
ABC ABC ABC ABC ABC
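To illustrate what collapsing buys, here is a sketch that runs three toy systems over the same range, first as three separate jobs and then as one merged job. The `components` struct and the `system_a`, `system_b` and `system_c` functions are hypothetical stand-ins. The sketch also assumes that system C only reads A and B of the entity it is currently writing, which is what makes the merge valid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical component storage: one array per component type,
// all assumed to have the same length.
struct components {
    std::vector<int> a, b, c;
};

// Hypothetical per-entity systems.
inline void system_a(int& a) { a = 1; }
inline void system_b(int& b) { b = 2; }
inline void system_c(int a, int b, int& c) { c = a + b; }

// Three separate jobs on one thread: three passes over [first, last).
inline void run_separate(components& d, std::size_t first, std::size_t last) {
    assert(first <= last && last <= d.a.size());
    for (std::size_t i = first; i < last; ++i) system_a(d.a[i]);
    for (std::size_t i = first; i < last; ++i) system_b(d.b[i]);
    for (std::size_t i = first; i < last; ++i) system_c(d.a[i], d.b[i], d.c[i]);
}

// One collapsed job: a single pass that runs A, B and C per entity.
inline void run_collapsed(components& d, std::size_t first, std::size_t last) {
    assert(first <= last && last <= d.a.size());
    for (std::size_t i = first; i < last; ++i) {
        system_a(d.a[i]);
        system_b(d.b[i]);
        system_c(d.a[i], d.b[i], d.c[i]);
    }
}
```

Both functions leave the data in the same state; the collapsed version simply visits each entity once instead of three times.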

If I time each job on a thread, I get the thread's total runtime for all its jobs. If one thread runs longer than the others, I can use these totals to shift jobs between threads and minimise the makespan (the time until the slowest thread finishes).
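One possible way to use the timings is a simple greedy pass, sketched below. It is only a heuristic and not necessarily what the final design will use: it repeatedly moves the cheapest job from the longest-running thread to the shortest-running one for as long as that helps. Durations are plain `double` values in arbitrary units, and `durations`, `total`, `makespan` and `rebalance` are hypothetical names. The sketch ignores system dependencies and cache affinity, both of which a real rebalancer would have to respect.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Measured runtimes of the jobs assigned to one thread (non-negative).
using durations = std::vector<double>;

// Total runtime of one thread.
inline double total(durations const& jobs) {
    return std::accumulate(jobs.begin(), jobs.end(), 0.0);
}

// Makespan: the total runtime of the slowest thread.
inline double makespan(std::vector<durations> const& threads) {
    double longest = 0.0;
    for (durations const& t : threads)
        longest = std::max(longest, total(t));
    return longest;
}

// Greedy heuristic: move the cheapest job from the longest thread to the
// shortest thread while the receiving thread stays below the old maximum.
inline void rebalance(std::vector<durations>& threads) {
    assert(!threads.empty());
    auto by_total = [](durations const& l, durations const& r) {
        return total(l) < total(r);
    };
    for (;;) {
        auto [shortest, longest] =
            std::minmax_element(threads.begin(), threads.end(), by_total);
        if (longest->empty())
            return;
        auto cheapest = std::min_element(longest->begin(), longest->end());
        if (total(*shortest) + *cheapest >= total(*longest))
            return;  // moving the job would not help any more
        shortest->push_back(*cheapest);
        longest->erase(cheapest);
    }
}
```

For example, three threads with job timings `{4, 3, 2}`, `{1}` and `{}` start with a makespan of 9; after `rebalance` the slowest thread only holds the job that takes 4.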

Skip-lists for the work in each thread should simplify the design.