Weigao Sun's starred repositories
google-research
Google Research
stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
pytorch_geometric
Graph Neural Network Library for PyTorch
flash-attention
Fast and memory-efficient exact attention
Awesome-Multimodal-Large-Language-Models
Latest Advances on Multimodal Large Language Models
cuda-samples
Samples for CUDA developers demonstrating features in the CUDA Toolkit
opencompass
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) over 100+ datasets.
mamba-minimal
Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
torchmetrics
TorchMetrics - Machine learning metrics for distributed, scalable PyTorch applications.
GenerativeAIExamples
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
bigscience
Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.
DeepSeek-MoE
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
Awesome-Mixture-of-Experts-Papers
A curated reading list of research in Mixture-of-Experts (MoE).
NeMo-Megatron-Launcher
NeMo Megatron launcher and tools
zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
lightning-attention
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models