- efficient tiling-based large matrix multiplication
- warp-reduce-based matrix-vector multiplication
- warp-reduce-based vector dot product
- warp reduce
- FlashAttention module 1: fused QKV attention (tiling-based)
- softmax(Q^T K / scale) fuses easily within a tile
- the extra multiply by V is the painful part: the softmax denominator is only known after the whole row of scores, so partial outputs have to be rescaled on the fly
- coalesced memory access benchmarking
- thread pool (condition variable and simple multi-threading)
- double buffer (std::timed_mutex and simple multi-threading) with simple benchmarking
- cache update algorithms:
- LRU (least recently used)
- LFU (least frequently used)
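The tiling-based matrix multiplication stages tiles of A and B so they stay resident in fast memory while a block of C is accumulated. The same blocking idea, sketched on the CPU (function and tile names are mine; a CUDA kernel would stage the tiles into shared memory instead):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Multiply two N x N row-major matrices in TILE x TILE blocks so each
// block of A and B stays hot in cache -- the CPU analogue of staging
// tiles into CUDA shared memory.
constexpr std::size_t TILE = 32;

std::vector<float> tiled_matmul(const std::vector<float>& A,
                                const std::vector<float>& B,
                                std::size_t N) {
    std::vector<float> C(N * N, 0.0f);
    for (std::size_t ii = 0; ii < N; ii += TILE)
        for (std::size_t kk = 0; kk < N; kk += TILE)
            for (std::size_t jj = 0; jj < N; jj += TILE)
                // accumulate one TILE x TILE block of C
                for (std::size_t i = ii; i < std::min(ii + TILE, N); ++i)
                    for (std::size_t k = kk; k < std::min(kk + TILE, N); ++k) {
                        float a = A[i * N + k];
                        for (std::size_t j = jj; j < std::min(jj + TILE, N); ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
    return C;
}
```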
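In CUDA the warp reduce is typically a loop of `__shfl_down_sync` calls with the offset halved each step. The same tree pattern, simulated on the CPU for one 32-lane warp (a sketch with my own names, not the kernel itself):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Tree-sum across one warp of 32 lanes. Each step, lanes [0, offset)
// pull in the value offset lanes above them -- the pattern that
// __shfl_down_sync implements in hardware. Lane 0 ends with the total.
float warp_reduce_sum(std::array<float, 32> lane) {
    for (std::size_t offset = 16; offset > 0; offset /= 2)
        for (std::size_t i = 0; i < offset; ++i)
            lane[i] += lane[i + offset];
    return lane[0];
}
```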
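The rescaling trick behind the fused V multiply (the online-softmax idea FlashAttention builds on) can be sketched for a single query row in plain C++ (illustrative only; names are mine). The running max and denominator change as keys stream in, so the un-normalized V accumulator must be corrected at each step:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// One query row: given precomputed scores s[j] = q . k_j / scale, compute
// sum_j softmax(s)_j * v_j in a single pass. Keep a running max m,
// denominator d, and un-normalized output acc; whenever m grows, the old
// contributions to d and acc are rescaled by exp(m_old - m_new).
std::vector<float> online_attention_row(const std::vector<float>& scores,
                                        const std::vector<std::vector<float>>& V) {
    std::size_t dim = V[0].size();
    float m = -std::numeric_limits<float>::infinity();
    float d = 0.0f;
    std::vector<float> acc(dim, 0.0f);
    for (std::size_t j = 0; j < scores.size(); ++j) {
        float m_new = std::max(m, scores[j]);
        float correction = std::exp(m - m_new);  // rescale old contributions
        float p = std::exp(scores[j] - m_new);
        d = d * correction + p;
        for (std::size_t t = 0; t < dim; ++t)
            acc[t] = acc[t] * correction + p * V[j][t];
        m = m_new;
    }
    for (float& x : acc) x /= d;  // final normalization
    return acc;
}
```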
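On the GPU, coalescing means consecutive threads in a warp touch consecutive addresses. The closest CPU analogue for a quick benchmark is unit-stride versus large-stride traversal of the same array (a sketch; absolute timings vary by machine, and both loops compute the same sum):

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <utility>
#include <vector>

// Sum an R x C row-major array twice: row-wise (unit stride, the
// "coalesced" analogue) and column-wise (stride C, the "uncoalesced"
// analogue). Prints the elapsed time of each and returns both sums.
std::pair<double, double> sum_row_vs_col(std::size_t R, std::size_t C) {
    std::vector<float> a(R * C, 1.0f);

    auto t0 = std::chrono::steady_clock::now();
    double row = 0.0;
    for (std::size_t i = 0; i < R; ++i)
        for (std::size_t j = 0; j < C; ++j) row += a[i * C + j];
    auto t1 = std::chrono::steady_clock::now();

    double col = 0.0;
    for (std::size_t j = 0; j < C; ++j)
        for (std::size_t i = 0; i < R; ++i) col += a[i * C + j];
    auto t2 = std::chrono::steady_clock::now();

    using us = std::chrono::microseconds;
    std::printf("row-wise: %lld us, column-wise: %lld us\n",
                (long long)std::chrono::duration_cast<us>(t1 - t0).count(),
                (long long)std::chrono::duration_cast<us>(t2 - t1).count());
    return {row, col};
}
```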
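A minimal sketch of the condition-variable thread pool pattern (assuming the usual mutex + task-queue shape; class and member names are mine, not the repo's):

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Fixed-size thread pool: workers sleep on a condition variable and wake
// when a task is enqueued or the pool shuts down. The destructor drains
// the remaining tasks before joining the workers.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] {
                for (;;) {
                    std::function<void()> task;
                    {
                        std::unique_lock<std::mutex> lk(m_);
                        cv_.wait(lk, [this] { return stop_ || !tasks_.empty(); });
                        if (stop_ && tasks_.empty()) return;
                        task = std::move(tasks_.front());
                        tasks_.pop();
                    }
                    task();  // run outside the lock
                }
            });
    }
    void submit(std::function<void()> f) {
        {
            std::lock_guard<std::mutex> lk(m_);
            tasks_.push(std::move(f));
        }
        cv_.notify_one();
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lk(m_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }

private:
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_ = false;
};
```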
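For the double-buffer item, one way std::timed_mutex fits in (a sketch under my own names, not the repo's implementation): the writer fills the back buffer and swaps it in under the lock, while a reader uses try_lock_for so it can skip a frame within a deadline instead of blocking indefinitely:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <vector>

// Double buffer: the writer fills back() while readers snapshot front;
// publish() swaps the two under a std::timed_mutex, and try_read() gives
// up after its deadline so a slow frame never stalls the reader.
class DoubleBuffer {
public:
    explicit DoubleBuffer(std::size_t n) : front_(n), back_(n) {}

    std::vector<int>& back() { return back_; }  // writer-owned staging area

    void publish() {                            // writer: make back visible
        std::lock_guard<std::timed_mutex> lk(m_);
        front_.swap(back_);
    }

    bool try_read(std::vector<int>& out, std::chrono::milliseconds deadline) {
        std::unique_lock<std::timed_mutex> lk(m_, std::defer_lock);
        if (!lk.try_lock_for(deadline)) return false;  // skip, keep old data
        out = front_;
        return true;
    }

private:
    std::vector<int> front_, back_;
    std::timed_mutex m_;
};
```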
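The LRU policy above is commonly implemented as a linked list in recency order plus a hash map pointing into it, giving O(1) get and put; a sketch with my own class names:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

// LRU cache: the list keeps (key, value) pairs in recency order (front =
// most recently used); the map gives O(1) access to each list node.
// Eviction removes the list's back, i.e. the least recently used key.
class LRUCache {
public:
    explicit LRUCache(std::size_t capacity) : cap_(capacity) {}

    bool get(int key, int& value) {
        auto it = map_.find(key);
        if (it == map_.end()) return false;
        order_.splice(order_.begin(), order_, it->second);  // mark most recent
        value = it->second->second;
        return true;
    }

    void put(int key, int value) {
        auto it = map_.find(key);
        if (it != map_.end()) {                 // update + mark most recent
            it->second->second = value;
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (map_.size() == cap_) {              // evict least recently used
            map_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, value);
        map_[key] = order_.begin();
    }

private:
    std::size_t cap_;
    std::list<std::pair<int, int>> order_;
    std::unordered_map<int, std::list<std::pair<int, int>>::iterator> map_;
};
```

LFU keeps the same map-into-nodes shape but groups keys by access count and evicts from the lowest non-empty count bucket instead of the recency tail.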