Repositories under the rmsnorm topic:
📚Modern CUDA Learn Notes with PyTorch: Tensor/CUDA Cores, 📖150+ CUDA Kernels with PyTorch bindings, 📖HGEMM/SGEMM (95%~99% cuBLAS performance), 📖100+ LLM/CUDA Blogs.
An efficient kernel for RMS normalization with fused operations. It includes both forward and backward passes and is compatible with PyTorch.
A simple character-level Transformer.
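
For readers unfamiliar with the operation behind this topic, here is a minimal reference sketch of the RMS normalization forward and backward passes for a single vector, in plain Python. It is not taken from any of the repositories listed above. The function names `rmsnorm_forward` and `rmsnorm_backward`, the `eps` default, and the cache layout are assumptions made for illustration. A real fused kernel would compute the same quantities on the GPU in one pass over each row.

```python
import math

def rmsnorm_forward(x, weight, eps=1e-6):
    """y_i = weight_i * x_i / sqrt(mean(x^2) + eps) for one vector x."""
    n = len(x)
    rms = math.sqrt(sum(v * v for v in x) / n + eps)
    x_hat = [v / rms for v in x]              # normalized input
    y = [w * h for w, h in zip(weight, x_hat)]
    return y, (x_hat, rms)                    # cache is reused by the backward pass

def rmsnorm_backward(dy, weight, cache):
    """Return (dx, dw) given the upstream gradient dy."""
    x_hat, rms = cache
    n = len(x_hat)
    g = [d * w for d, w in zip(dy, weight)]   # gradient w.r.t. x_hat
    mean_gx = sum(gi * hi for gi, hi in zip(g, x_hat)) / n
    # dx_j = (g_j - x_hat_j * mean(g * x_hat)) / rms
    dx = [(gi - hi * mean_gx) / rms for gi, hi in zip(g, x_hat)]
    dw = [d * h for d, h in zip(dy, x_hat)]   # per-element weight gradient
    return dx, dw

if __name__ == "__main__":
    x = [1.0, -2.0, 3.0, 0.5]
    w = [0.5, 1.0, 1.5, 2.0]
    y, cache = rmsnorm_forward(x, w)
    dx, dw = rmsnorm_backward([0.1, -0.2, 0.3, 0.4], w, cache)
    print(y, dx, dw)
```

Caching `x_hat` and `rms` in the forward pass avoids recomputing the reduction in the backward pass, which is one common reason such kernels are written in fused form.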