Sheng Qin's starred repositories
latent-diffusion
High-Resolution Image Synthesis with Latent Diffusion Models
DeepSeek-V2
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
awesome-rdma
A curated list of awesome RDMA resources
MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial
out-of-the-box-fp8-training
Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.
Model-References
Reference models for Intel(R) Gaudi(R) AI Accelerator
microxcaling
PyTorch emulation library for Microscaling (MX)-compatible data formats
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
mini-rv32ima
A tiny C header-only RISC-V emulator.
WebGL-Fluid-Simulation
Play with fluids in your browser (works even on mobile)