This is a series on GPU optimization that explains in detail how to optimize CUDA kernels. I will introduce several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
Standard library strided math functions.
Strided array math operations.
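As a rough illustration of what a strided array math operation looks like, the sketch below applies an absolute value over a flat buffer using a stride and a starting offset. It is written in plain Python under stated assumptions: the function name `strided_abs` and its parameter order are hypothetical and are not taken from any listed package.

```python
from typing import List, Sequence


def strided_abs(
    n: int,
    x: Sequence[float],
    stride_x: int,
    offset_x: int,
    y: List[float],
    stride_y: int,
    offset_y: int,
) -> List[float]:
    """Write abs(x[...]) into y for n elements (hypothetical helper).

    Each array is walked from its own offset with its own stride, so the
    same routine can visit every element, every k-th element, or (with a
    negative stride) elements in reverse order.
    """
    ix = offset_x
    iy = offset_y
    for _ in range(n):
        y[iy] = abs(x[ix])
        ix += stride_x
        iy += stride_y
    return y


x = [-1.0, 2.0, -3.0, 4.0, -5.0, 6.0]

# Every second element of x, written contiguously into y.
y = strided_abs(3, x, 2, 0, [0.0] * 3, 1, 0)

# Same elements, read in reverse by using a negative stride.
y_rev = strided_abs(3, x, -2, 4, [0.0] * 3, 1, 0)
```

Passing the stride and offset explicitly lets a caller operate on a view of a larger buffer without copying it first.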
Base strided.
Apply a function to elements in two input arrays and assign the results to an output array.
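A minimal sketch of this binary elementwise pattern, in plain Python. The name `map2` and its signature are assumptions made for illustration, not the API of the package described above.

```python
from typing import Callable, List, Sequence


def map2(
    x: Sequence[float],
    y: Sequence[float],
    out: List[float],
    fcn: Callable[[float, float], float],
) -> List[float]:
    """Apply fcn to paired elements of x and y and store results in out.

    Hypothetical helper: all three arrays are assumed to have equal length.
    """
    if not (len(x) == len(y) == len(out)):
        raise ValueError("x, y, and out must have the same length")
    for i in range(len(x)):
        out[i] = fcn(x[i], y[i])
    return out


out = [0.0, 0.0, 0.0]
map2([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], out, lambda a, b: a + b)
```

Writing into a caller-supplied output array, rather than returning a new one, avoids an allocation when the operation runs repeatedly.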
Standard library special math functions.
Compute the absolute value.
Standard library strided array special math functions.
Apply a function to each element in an array and assign the result to an element in an output array, iterating from right to left.
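The right-to-left variant differs from an ordinary map only in the order in which the callback is invoked, which matters when the callback has side effects. The sketch below assumes a hypothetical helper named `map_right`; it is not the listed package's API.

```python
from typing import Callable, List, Sequence


def map_right(
    x: Sequence[float],
    out: List[float],
    fcn: Callable[[float], float],
) -> List[float]:
    """Apply fcn to each element of x, last index first (hypothetical helper).

    Results land at the same index they came from, so out has the same
    layout as a left-to-right map; only the call order changes.
    """
    if len(x) != len(out):
        raise ValueError("x and out must have the same length")
    for i in range(len(x) - 1, -1, -1):
        out[i] = fcn(x[i])
    return out


visited: List[float] = []


def double_and_record(v: float) -> float:
    # Record the visiting order so the iteration direction is observable.
    visited.append(v)
    return v * 2.0


result = map_right([1.0, 2.0, 3.0], [0.0] * 3, double_and_record)
```

Here `result` holds the doubled values in their original positions, while `visited` shows that the last element was processed first.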