Can't some operations run in parallel?
ita9naiwa opened this issue
Hyunsung Lee commented
Hi. I see that some operations, such as tensor addition, are implemented with multiple threads,
but very similar operations, such as subtraction and division, run on a single thread.
Is this intentional, or is there a reason these operations are implemented differently?
John commented
The bottleneck for simple operations is memory bandwidth, so more threads would likely reduce performance on them.
At least, I assume that's the reason. An addition or subtraction is a very fast computation, whereas reading a large tensor from memory and writing the result back is slow.
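
To put rough numbers on the bandwidth argument, here is a back-of-the-envelope sketch. It is not taken from this library: the function name `elementwise_time_bounds` is hypothetical, and the hardware figures (20 GB/s of memory bandwidth shared by all threads, 10 billion operations per second per core) are illustrative assumptions, not measurements. It compares the minimum time needed to move the data with the minimum time needed to do the arithmetic for `c[i] = a[i] + b[i]` on float32 tensors.

```python
# Rough lower-bound estimate for an elementwise op c[i] = a[i] + b[i].
# All hardware figures are illustrative assumptions, not measurements.

def elementwise_time_bounds(n_elements, bytes_per_element=4,
                            mem_bandwidth_gbs=20.0, ops_per_core_g=10.0,
                            n_threads=1):
    """Return (memory_seconds, compute_seconds) lower bounds."""
    # Two reads (a, b) and one write (c) per output element.
    bytes_moved = 3 * n_elements * bytes_per_element
    # One arithmetic operation per output element.
    ops = n_elements
    # Assumption: memory bandwidth is shared, so threads do not increase it.
    memory_seconds = bytes_moved / (mem_bandwidth_gbs * 1e9)
    # Assumption: arithmetic scales perfectly with the number of threads.
    compute_seconds = ops / (ops_per_core_g * 1e9 * n_threads)
    return memory_seconds, compute_seconds


if __name__ == "__main__":
    n = 100_000_000  # 100 million float32 elements per tensor
    for threads in (1, 4):
        mem_s, cpu_s = elementwise_time_bounds(n, n_threads=threads)
        bound = "memory" if mem_s >= cpu_s else "compute"
        print(f"{threads} thread(s): memory >= {mem_s * 1e3:.1f} ms, "
              f"compute >= {cpu_s * 1e3:.1f} ms ({bound}-bound)")
```

Under these assumed figures the memory bound is larger than the compute bound even with one thread, and adding threads only shrinks the compute bound. The sketch ignores caches, thread start-up cost, and the fact that a single core may not saturate the memory bus, so treat it as an intuition aid rather than a prediction for any particular machine.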