geru-scotland / ThreadPoolLib

A simple but efficient C++ thread/worker pool library for asynchronous task management.

What about having task-id values, binding outputs/completions of a task to another task, and the ability to create graphs of tasks?

tugrul512bit opened this issue

I mean, you could do things like this:

task 1 <--- (task 2 + task 3)
 |
 |
 V 
task 4 + task 5 ----> task 6

Task 2, task 3 and task 5 start running in parallel, while task 1 waits for 2 & 3 before it can start. Then task 1 starts & completes and task 4 starts. Then task 6 starts.

This can make things easier for GPGPU work, e.g. overlapping PCIe copies for multiple GPUs while concurrently doing computations. Also, this kind of work scheduling could be repeatable, like re-running the whole graph without inserting anything new. This is especially useful for simulations with thousands of kernel repeats.

Can we also create new tasks within other tasks? Something like reduction:

start task 1
task 1 finds a particle, computes its force, and finds 2 atoms within range
task 1 spawns 1 task per atom
tasks 2 and 3 compute those atoms; this continues until all atoms are computed

doable?
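
To make the pattern concrete, here is a rough sketch of the recursive spawning I mean, using plain std::async rather than this library's API (the pool version would need addTask to be safely callable from inside a running task):

#include <future>
#include <iostream>
#include <vector>

// Hypothetical per-atom work.
void computeAtom(int atomId) {
    std::cout << "computing atom " << atomId << "\n";
}

// Task 1: find the atoms in range, then spawn one child task per atom.
void computeParticle(int particleId) {
    std::cout << "particle " << particleId << " found\n";
    std::vector<int> atomsInRange = {42, 43};   // pretend range-query result
    std::vector<std::future<void>> children;
    for (int atom : atomsInRange)
        children.push_back(std::async(std::launch::async, computeAtom, atom));
    for (auto& c : children)
        c.get();                                // wait for the spawned children
}

int main() {
    auto root = std::async(std::launch::async, computeParticle, 7);
    root.get();
}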

commented

Hello!

Your suggestions are really intriguing and have certainly got me thinking. Thank you for bringing them up. I have added them to my TODO list as they seem to provide very valuable improvements to the library.

As for the task dependencies, I have a question. Are you envisioning a system where the user defines these dependencies upon adding tasks to the pool? The user would then be able to specify a list of dependencies for each task, and the task would only be executed once all its dependencies have completed.

I'm considering an approach where this is handled by using std::future and std::promise to signal when a task has completed, and using a condition variable to make dependent tasks wait for their dependencies. The condition variable would wake up the waiting tasks when their dependencies are ready.

Users would pass a Task object to the addTask method, containing all the necessary information such as the task function, callback function (if any), and a list of dependencies.
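
A very rough sketch of what I have in mind; nothing here is final API, and a per-task std::shared_future would already provide the condition-variable-style blocking mentioned above:

#include <functional>
#include <future>
#include <vector>

struct Task {
    std::function<void()> work;                  // the task function
    std::function<void()> callback;              // optional completion callback
    std::vector<std::shared_future<void>> deps;  // completion futures of dependencies
    std::promise<void> done;                     // fulfilled when this task finishes
};

// What a worker thread would do with a single task.
void runTask(Task& t) {
    for (auto& d : t.deps)
        d.wait();            // block until every dependency has completed
    t.work();
    if (t.callback)
        t.callback();
    t.done.set_value();      // wakes anything waiting on this task's future
}

One caveat with this naive version is that a worker thread blocks while waiting on dependencies; a smarter scheduler would instead defer the task until all its dependencies have signalled completion.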

The idea of executing tasks within tasks is also very interesting. It's something that would need some more thought, but it could potentially add another level of flexibility to the task handling.

I'm still figuring out some of the specifics and it would be great to hear your thoughts on this approach. Does it align with your vision, or were you thinking about a system that would somehow automatically create these dependencies?

Thank you once again for your valuable input!

I meant user-defined dependencies. It is already doable with callbacks as you provide them, but then a graph becomes callback hell.

Just randomizing names here:

lib.addTask(task, 3, {1,2}, T_COMPLETE);

so this task with id=3 would wait for the tasks with ids 1 and 2 to complete.
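
For example, the graph at the top of this issue could then be spelled out like this (everything below, including the stub interface, is invented just to show the shape of the calls):

#include <functional>
#include <initializer_list>

enum Trigger { T_COMPLETE, T_STARTED, T_CHECKPOINT };

struct TaskGraphLib {                           // stub, invented for illustration
    void addTask(std::function<void()>, int /*id*/,
                 std::initializer_list<int> /*deps*/, Trigger) {}
};

int main() {
    TaskGraphLib lib;
    auto t = [] { /* work */ };
    lib.addTask(t, 2, {},     T_COMPLETE);      // tasks 2, 3 and 5 have no deps,
    lib.addTask(t, 3, {},     T_COMPLETE);      // so they start in parallel
    lib.addTask(t, 5, {},     T_COMPLETE);
    lib.addTask(t, 1, {2, 3}, T_COMPLETE);      // task 1 waits on 2 and 3
    lib.addTask(t, 4, {1},    T_COMPLETE);      // task 4 starts after task 1
    lib.addTask(t, 6, {4, 5}, T_COMPLETE);      // task 6 starts after 4 and 5
}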

Or, this:

lib.addTask(task, {isTask1Ready, isTask2Ready}); // these are lambda functions

Then, the graph could be shaped differently with these:

T_COMPLETE - only after the task has returned

T_STARTED - right after the other task has started

T_CHECKPOINT - when a task reports progress of x percent

T_COROUTINE - with futures, etc., to be able to launch within the same thread as another task. Maybe useful for sharing data quickly with another task through the L1 cache?

I don't have a sharp vision. It is blurred currently :)

What about timed tasks for creating graphs with fail-safe tasks? If a task does not complete within 3 seconds, try another task instead before starting the dependents. Something like building a cluster-computing app where some server nodes fail.
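
Rough sketch of what I mean, using plain std::async and wait_for rather than anything the library currently provides:

#include <chrono>
#include <future>
#include <iostream>
#include <thread>

int primaryWork() {                              // e.g. a remote node that may be slow
    std::this_thread::sleep_for(std::chrono::seconds(5));
    return 1;
}
int fallbackWork() { return 2; }                 // local / backup path

int main() {
    auto primary = std::async(std::launch::async, primaryWork);
    int result;
    if (primary.wait_for(std::chrono::seconds(3)) == std::future_status::ready) {
        result = primary.get();
    } else {
        // Deadline missed: run the fallback task instead. Note the abandoned
        // primary future still blocks in its destructor here; real fail-over
        // would need some form of cancellation.
        result = std::async(std::launch::async, fallbackWork).get();
    }
    std::cout << "result: " << result << "\n";
}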

Conditional dependency, maybe like this:

addTask(..., computeIds); // a lambda function that dynamically returns a vector of ids, with different values depending on some conditions, or maybe on time too. For example, if task1's progress is greater than task2's progress, then pick task1's completion for starting. Maybe a task could conditionally depend on itself too; this way the user does not need to add it N times manually.
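
As a sketch, the selector could just be a callable returning the ids to wait on; getProgress and the numbers below are made up:

#include <functional>
#include <vector>

// Hypothetical progress query (0.0 .. 1.0) that the scheduler would keep.
double getProgress(int taskId) { return taskId == 1 ? 0.7 : 0.4; }  // dummy values

// Dynamic dependency selector: called by the scheduler when deciding what
// this task should wait on.
std::function<std::vector<int>()> computeIds = [] {
    // depend on whichever of task 1 / task 2 is further along right now
    return getProgress(1) > getProgress(2) ? std::vector<int>{1}
                                           : std::vector<int>{2};
};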

Yes, a Task object would be just as good as a lambda, which is already an object, just implicitly.

Also, those flags like T_COMPLETE could be combined with other flags:

S_ONCE, S_ALWAYS, S_SINGLE_ALWAYS, etc., to help some algorithms by having multiple copies of a task running independently, or by forcing a single copy to run at a time as a means of implicit synchronization beyond mutexes (and probably Amazon-Lambda-like services?). That way users could implement their own cloud-sync functions between two tasks.
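
One possible encoding would be plain bit flags that get OR-ed together; the names below are just the ones thrown around in this thread, not anything the library defines today:

#include <cstdint>

enum TaskFlags : std::uint32_t {
    T_COMPLETE      = 1u << 0,   // fire after the dependency has returned
    T_STARTED       = 1u << 1,   // fire as soon as the dependency starts
    T_CHECKPOINT    = 1u << 2,   // fire when the dependency reports x% progress
    S_ONCE          = 1u << 8,   // run a single time
    S_ALWAYS        = 1u << 9,   // re-run on every repeat of the graph
    S_SINGLE_ALWAYS = 1u << 10,  // re-run, but never two copies at once
};

// e.g. addTask(task, 3, {1, 2}, T_COMPLETE | S_SINGLE_ALWAYS);
constexpr std::uint32_t flags = T_COMPLETE | S_SINGLE_ALWAYS;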

Also, it would be fun to see a render of such a graph, even in ASCII art.

Sorry for so many comments; here's my last question:

If you have dedicated threads for processing tasks, do you pin them to specific CPU cores? That would enable optimization techniques for graphs of tasks. For example, Intel has P-cores and E-cores, while AMD has different frequencies per core. You could then measure task completion times per core and move tasks to the cores that run them faster, between repetitions of a graph, so that the graph runs faster and faster on every repeat. This would require solving a constrained global optimization problem that minimizes total run-time by mapping N resources to P tasks.

Even if all cores are equal, the output of one task may be required by another task, and it would be better if the second task ran on the same physical core since the data is already in its cache. The same applies if both tasks share data with each other while running concurrently on the SMT siblings of the same core.
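
For reference, pinning a worker to a core is straightforward on Linux via pthread_setaffinity_np; a minimal sketch (the measure-and-remap optimization described above would sit on top of something like this, and other platforms need their own affinity calls):

#include <pthread.h>
#include <sched.h>
#include <iostream>
#include <thread>

int main() {
    const int core = 2;                       // arbitrary example core
    std::thread worker([core] {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);                  // restrict this thread to one core
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        std::cout << "worker pinned to core " << sched_getcpu() << "\n";
    });
    worker.join();
}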

Multi-GPU systems would require balancing work across GPUs instead of CPUs. It would just be a bunch of integers to optimize globally per graph, and the solver would know nothing about the GPU at all; it just needs parameters to optimize and timing/energy/error to minimize. So it could help if users were able to get a "resource id" that they interpret as CPUs, GPUs, or even internet connections in the cloud.

commented

Thank you very much for your suggestions and ideas! They have sparked my interest and I will further explore these concepts. While some features may extend beyond the initial scope of the library, I would love to investigate the possibility of incorporating certain functionalities. I'm working on this project in my spare time, and I believe handling dependencies could be a feasible addition in the near future.