Enigmatisms / culina

CUDA-accelerated linear algebra / CPU acceleration algorithms.

CUDA

  • efficient tiling-based large matrix multiplication
  • warp-reduce-based matrix-vector multiplication
  • warp-reduce-based vector dot product
  • warp-reduce primitives
  • Flash attention module 1: fused QKV attention (tiling-based)
    • softmax(Q^T K / scale) can be fused easily
    • fusing the extra multiplication by V is the painful part
  • coalesced memory access benchmarking
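The warp-reduce items above can be sketched roughly as below; `warp_reduce_sum` and `dot_kernel` are illustrative names for this sketch, not necessarily the identifiers used in this repo.

```cuda
#include <cuda_runtime.h>

// Warp-level reduction via shuffle intrinsics: each of the 32 lanes
// holds a partial sum; after 5 halving shuffle steps, lane 0 holds
// the sum of the whole warp.
__device__ float warp_reduce_sum(float v) {
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffff, v, offset);
    return v;
}

// Dot product: each thread accumulates a grid-strided partial sum,
// each warp reduces locally, and lane 0 of every warp atomically
// adds its warp's result to *out (assumed zero-initialized).
__global__ void dot_kernel(const float* a, const float* b, int n, float* out) {
    float partial = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        partial += a[i] * b[i];
    partial = warp_reduce_sum(partial);
    if ((threadIdx.x & 31) == 0)  // one atomic per warp, not per thread
        atomicAdd(out, partial);
}
```

The same shuffle-based reduction is the building block for the matrix-vector kernel: one warp (or block) per output row, reducing that row's partial products.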

CPU

  • thread pool (condition-variable based, simple multi-threading)
  • double buffering (std::timed_mutex, simple multi-threading) with simple benchmarking
  • cache replacement algorithms:
    • LRU (least recently used)
    • LFU (least frequently used)

About

CUDA-accelerated linear algebra / CPU acceleration algorithms.

License: MIT License


Languages

  • Cuda: 81.4%
  • C++: 15.7%
  • CMake: 2.8%