umiswing's repositories
NiuTrans.NMT
A fast neural machine translation system. It is developed in C++ and relies on NiuTensor for fast tensor APIs.
emacs-abyss-theme
A dark theme for Emacs
emacs-catppuccin
🍄 Soothing pastel theme for Emacs
Paddle
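PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)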
flash-attention
Fast and memory-efficient exact attention
How_to_optimize_in_GPU
A series of GPU optimization topics that explains in detail how to optimize programs on the GPU. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm, among others. The performance of these kernels is at or near the theoretical limit.
maxas
Assembler for NVIDIA Maxwell architecture
YHs_Sample
Yinghan's Code Sample