Xiaoyu Zhang's repositories
tvm_mlir_learn
A collection of compiler learning resources.
how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
how-to-learn-deep-learning-framework
How to learn PyTorch and OneFlow.
giantpandacv.com
www.giantpandacv.com
mlc-llm-code-analysis
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
opencompass
OpenCompass is an LLM evaluation platform supporting a wide range of models (LLaMA, LLaMA2, ChatGLM2, ChatGPT, Claude, etc.) on 50+ datasets.
How_to_optimize_in_GPU
This is a series of GPU optimization topics explaining in detail how to optimize CUDA kernels. It covers several basic kernels, including elementwise, reduce, sgemv, and sgemm; the performance of these kernels is at or near the theoretical limit.
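For context on the elementwise topic mentioned above, a minimal CUDA elementwise kernel (a hedged sketch for illustration, not code from the repository) typically uses a grid-stride loop so one launch covers arrays larger than the grid:

```cuda
// Sketch of a naive elementwise add kernel with a grid-stride loop.
// Names (elementwise_add) are illustrative, not from the repo.
__global__ void elementwise_add(const float* a, const float* b,
                                float* c, int n) {
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        c[i] = a[i] + b[i];  // each thread handles multiple elements
    }
}
```

Optimization then proceeds by measuring achieved bandwidth against the device's theoretical peak, since elementwise kernels are memory-bound.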
tokenizers-cpp
Universal cross-platform tokenizers binding to HF and sentencepiece
FasterTransformer
Transformer-related optimizations, including BERT and GPT.
LLaMA-Factory
Easy-to-use LLM fine-tuning framework (LLaMA, BLOOM, Mistral, Baichuan, Qwen, ChatGLM)
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
tvm_gpu_gemm
Playing with GEMM in TVM.