yuguo's starred repositories
llm-inference-benchmark
LLM Inference benchmark
HIP-Performance-Optmization-on-VEGA64
14 basic topics for VEGA64 performance optimization
dlsys_solution
Homework solutions for CMU 10-414/714 – Deep Learning Systems: Algorithms and Implementation
amd-lab-notes
AMD lab notes with code examples to demonstrate use of AMD GPUs
Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
ChatGLM2-6B
ChatGLM2-6B: An Open Bilingual Chat LLM
ChatGLM-Efficient-Tuning
Fine-tuning ChatGLM-6B with PEFT
how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
ColossalAI
Making large AI models cheaper, faster and more accessible
gearshifft
Benchmark Suite for Heterogeneous FFT Implementations
cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
DeepLearningC
Simple program to learn CNN (LeNet-5) in pure C