Pengyu Wang's starred repositories
cosmopolitan
build-once run-anywhere C library
llama-recipes
Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A, and a number of candidate inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Meta Llama 3 for WhatsApp & Messenger.
gperftools
Main gperftools repository
lm-evaluation-harness
A framework for few-shot evaluation of language models.
transformers_tasks
⭐️ NLP algorithms built on the transformers library. Supports text classification, text generation, information extraction, text matching, RLHF, SFT, etc.
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
Awesome-Efficient-LLM
A curated list of resources on efficient large language models
ring-attention-pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch
infini-transformer
PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" (https://arxiv.org/abs/2404.07143)
LM-Infinite
Implementation of the paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"
gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture