ggerganov / ggml

Tensor library for machine learning

Can't some operations run in parallel?

ita9naiwa opened this issue · comments

Hi. I see that some operations like tensor addition are implemented with threads.

But I see that very similar operations, like subtraction and division, run on a single thread.

Is this intended, or is there a reason these operations are implemented this way?

commented

The bottleneck for simple operations is memory bandwidth, so more threads would likely reduce performance on them.
At least I assume that's the reason. An addition or subtraction is a very fast computation, whereas reading a large tensor from memory and writing the result back is slow.
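
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It is not ggml code: the function name `elementwise_time_bounds` and the default bandwidth and throughput figures are hypothetical placeholders, not measurements of any real machine. It only compares how long moving the data would take against how long the arithmetic would take for an element-wise operation like `c = a - b` on float32 tensors.

```python
def elementwise_time_bounds(n_elements,
                            bytes_per_element=4,      # float32
                            mem_bandwidth_gbs=50.0,   # assumed memory bandwidth, GB/s
                            core_gflops=10.0):        # assumed single-core throughput, GFLOP/s
    """Rough lower bounds (in seconds) for an element-wise binary op.

    Hypothetical model: every element of both inputs is read once,
    every element of the output is written once, and each output
    element costs exactly one floating-point operation.
    """
    bytes_moved = 3 * n_elements * bytes_per_element   # 2 reads + 1 write per element
    flops = n_elements                                 # 1 subtraction per element
    memory_time_s = bytes_moved / (mem_bandwidth_gbs * 1e9)
    compute_time_s = flops / (core_gflops * 1e9)
    return memory_time_s, compute_time_s


mem_t, cpu_t = elementwise_time_bounds(100_000_000)
print(f"memory-bound time : {mem_t * 1e3:.1f} ms")
print(f"compute-bound time: {cpu_t * 1e3:.1f} ms")
```

With these assumed numbers, the time spent moving data is larger than the time spent on arithmetic even for a single core, so splitting the arithmetic across threads cannot shorten the memory part. Real hardware is more complicated (one core often cannot saturate the memory bus on its own, and caches help with small tensors), so the only reliable answer is to benchmark on the target machine.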