李金梁's repositories
MIT_6.5940
MIT open course, efficient ML
CMU-10-714
CMU 10-714 Deep-Learning-Systems
Compass_Optimizer
Compass Optimizer (OPT for short) is part of the Zhouyi Compass Neural Network Compiler. OPT converts the floating-point Intermediate Representation (IR) generated by the Compass Unified Parser into an optimized quantized or mixed IR suited to Zhouyi NPU hardware platforms.
Compass_Unified_Parser
Arm China NPU parser
Competitive_Programming
WPLF template
cs-self-learning
A self-learning guide to computer science
how-to-optimize-gemm
row-major matmul optimization
How_to_optimize_in_GPU
A series on GPU optimization that explains in detail how to optimize CUDA kernels. It covers several basic kernels, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
MIT-6.031-Software-Construction
A record of studying MIT 6.031
OI-wiki
:star2: Wiki of OI / ICPC for everyone. (An online strategy guide for a certain large-scale game, featuring dazzling arithmetic magic)
onnx
Open standard for machine learning interoperability
Megatron-LM
Ongoing research on training transformer models at scale
tinyflow
Tutorial code on how to build your own Deep Learning System in 2k Lines
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.