Can't some operations run in parallel?

Question

ita9naiwa opened this issue 8 months ago

Hyunsung Lee commented 8 months ago

Hi. I see that some operations, like tensor addition, are implemented with multiple threads.

But very similar operations, like subtraction and division, run on a single thread.

Is this intended, or is there some reason behind implementing these operations differently?

John · Answer 1 · Thu Oct 05 2023 03:23:01 GMT+0800 (China Standard Time)

The bottleneck for simple operations is memory bandwidth, so more threads would likely not help and could even reduce performance on them.
At least I assume that's the reason. An addition or subtraction is a very fast computation; reading a large tensor from memory and writing the result back is the slow part.
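
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The function name `elementwise_cost` and the hardware figures (20 GB/s of memory bandwidth, 10 GFLOP/s for one core) are illustrative assumptions, not measurements of this library or of any particular machine.

```python
def elementwise_cost(n_elems, bytes_per_elem=4,
                     bandwidth_gbs=20.0, gflops_per_core=10.0):
    """Rough time estimate (seconds) for c = a + b on one core.

    The bandwidth and GFLOP/s defaults are made-up round numbers,
    chosen only to illustrate the ratio between the two costs.
    """
    # Each output element needs two reads (a, b) and one write (c).
    bytes_moved = 3 * n_elems * bytes_per_elem
    # One floating-point operation per output element.
    flops = n_elems
    mem_time = bytes_moved / (bandwidth_gbs * 1e9)
    compute_time = flops / (gflops_per_core * 1e9)
    return mem_time, compute_time


# 100 million f32 elements, about 400 MB per tensor.
mem_time, compute_time = elementwise_cost(100_000_000)
print(f"memory traffic: {mem_time * 1e3:.1f} ms")
print(f"arithmetic:     {compute_time * 1e3:.1f} ms")
```

Under these assumed numbers the memory traffic takes several times longer than the arithmetic, so splitting the arithmetic across threads would only shrink the smaller term. Whether one core can actually saturate memory bandwidth depends on the hardware, so a real benchmark is the only way to confirm this for a given machine.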