Pengyu Wang's starred repositories

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda · License: MIT · Stargazers: 21933 · Issues: 217 · Issues: 120

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python · License: Apache-2.0 · Stargazers: 20694 · Issues: 178 · Issues: 389

cosmopolitan

build-once run-anywhere C library

llamafile

Distribute and run LLMs with a single file.

Language: C++ · License: NOASSERTION · Stargazers: 17059 · Issues: 155 · Issues: 361

ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Language: Python · License: Apache-2.0 · Stargazers: 12203 · Issues: 77 · Issues: 759

llama-recipes

Scripts for fine-tuning Meta Llama 3 with composable FSDP and PEFT methods, covering single- and multi-node GPU setups. Supports default and custom datasets for applications such as summarization and Q&A, and a number of inference solutions (e.g., HF TGI, vLLM) for local or cloud deployment. Includes demo apps showcasing Meta Llama 3 for WhatsApp and Messenger.

Language: Jupyter Notebook · License: NOASSERTION · Stargazers: 10493 · Issues: 85 · Issues: 295

gperftools

Main gperftools repository

Language: C++ · License: BSD-3-Clause · Stargazers: 8287 · Issues: 363 · Issues: 1304

lm-evaluation-harness

A framework for few-shot evaluation of language models.

Language: Python · License: MIT · Stargazers: 5865 · Issues: 36 · Issues: 941

transformers_tasks

⭐️ NLP algorithms built on the transformers library, supporting text classification, text generation, information extraction, text matching, RLHF, SFT, etc.

Language: Jupyter Notebook · Stargazers: 2055 · Issues: 16 · Issues: 86

Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 2031 · Issues: 34 · Issues: 78
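Medusa's core idea — extra decoding heads draft several future tokens at once, which the base model then verifies in a single pass, accepting the longest matching prefix — can be illustrated with a toy sketch. Everything below (the toy "model", the head rules, the function names) is hypothetical illustration, not the repository's API:

```python
# Toy sketch of Medusa-style draft-and-verify decoding (hypothetical).

def base_model(ctx):
    # Deterministic toy next-token rule: next = (last + 1) mod 10.
    return (ctx[-1] + 1) % 10

def medusa_heads(ctx):
    # Toy extra heads guessing tokens at offsets +2 and +3 in one shot
    # (the second guess is deliberately wrong to show rejection).
    return [(ctx[-1] + 2) % 10, (ctx[-1] + 4) % 10]

def medusa_step(ctx):
    # Draft = base model's next token plus the heads' guesses.
    draft = [base_model(ctx)] + medusa_heads(ctx)
    accepted = []
    for tok in draft:
        # Verify each drafted token against the base model; stop at the
        # first mismatch. The first draft token is the base model's own
        # prediction, so every step accepts at least one token.
        if tok == base_model(ctx + accepted):
            accepted.append(tok)
        else:
            break
    return ctx + accepted
```

Here `medusa_step([0])` accepts two of the three drafted tokens in one pass, which is the source of the speedup: fewer sequential model calls per generated token.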

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Language: Python · License: Apache-2.0 · Stargazers: 1653 · Issues: 37 · Issues: 268

yarn

YaRN: Efficient Context Window Extension of Large Language Models

Language: Python · License: MIT · Stargazers: 1260 · Issues: 14 · Issues: 55
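As a rough illustration of the RoPE-frequency-scaling family that YaRN belongs to, here is a minimal sketch of standard RoPE inverse frequencies next to the simpler NTK-aware base rescaling. The function names are made up, and YaRN's actual "NTK-by-parts" interpolation treats different wavelength bands differently; this only shows the basic mechanism of stretching the rotary wavelengths:

```python
import numpy as np

def rope_freqs(dim, base=10000.0):
    # Standard RoPE inverse frequencies: theta_i = base^(-2i/dim).
    return base ** (-np.arange(0, dim, 2) / dim)

def ntk_scaled_freqs(dim, scale, base=10000.0):
    # NTK-aware scaling: raise the base so the lowest frequency is
    # stretched by exactly `scale`, while the highest (i = 0) is untouched.
    new_base = base * scale ** (dim / (dim - 2))
    return new_base ** (-np.arange(0, dim, 2) / dim)
```

With `scale=8`, the longest wavelength grows 8x (extending the usable context), while short-range positional resolution is mostly preserved.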

Awesome-Efficient-LLM

A curated list for Efficient Large Language Models

tutel

Tutel MoE: An Optimized Mixture-of-Experts Implementation

Language: Python · License: MIT · Stargazers: 683 · Issues: 15 · Issues: 58

cpufp

A CPU tool for benchmarking peak floating-point performance

Language: Assembly · License: GPL-3.0 · Stargazers: 452 · Issues: 16 · Issues: 12

nvbench

CUDA Kernel Benchmarking Library

Language: Cuda · License: Apache-2.0 · Stargazers: 450 · Issues: 18 · Issues: 89

ring-attention-pytorch

Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch

Language: Python · License: MIT · Stargazers: 410 · Issues: 9 · Issues: 11

H2O

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
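The heavy-hitter idea — evict KV-cache entries whose accumulated attention mass is low, while always keeping the most recent tokens — can be sketched as a cache-selection policy. This is a toy illustration with made-up names, not H2O's implementation:

```python
import numpy as np

def h2o_keep_indices(attn_weights, budget, recent=2):
    # attn_weights: (num_queries, num_keys) attention weights seen so far.
    # Score each key by its accumulated attention mass ("heavy hitters"),
    # always keep the `recent` newest keys, and fill the remaining budget
    # with the highest-scoring older keys.
    n_keys = attn_weights.shape[1]
    scores = attn_weights.sum(axis=0)
    recent_idx = list(range(n_keys - recent, n_keys))
    older = [int(i) for i in np.argsort(-scores) if int(i) not in recent_idx]
    return sorted(older[: max(0, budget - recent)] + recent_idx)
```

The point is that the KV cache shrinks to a fixed `budget` regardless of sequence length, at the cost of dropping tokens the model rarely attends to.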

NBCE

Naive Bayes-based Context Extension
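NBCE combines next-token distributions computed independently on each context chunk via a Naive Bayes independence assumption. Below is a minimal sketch of one published variant (greedy min-entropy pooling plus extrapolation away from the context-free prior); the function names are illustrative, not the repository's API:

```python
import numpy as np

def nbce_combine(context_logps, prior_logp, beta=0.25):
    # context_logps: (n_contexts, vocab) next-token log-probs, one row per
    # context chunk fed to the model separately.
    # prior_logp: (vocab,) next-token log-probs with no context.
    # Greedy pooling: take the lowest-entropy (most confident) context
    # distribution, then extrapolate it away from the prior.
    entropies = [-(np.exp(lp) * lp).sum() for lp in context_logps]
    pooled = context_logps[int(np.argmin(entropies))]
    combined = (1.0 + beta) * pooled - beta * prior_logp
    return combined - np.log(np.exp(combined).sum())  # renormalize
```

Because each chunk is processed separately, the effective context length scales with the number of chunks rather than the model's native window.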

infini-transformer

PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" (https://arxiv.org/abs/2404.07143)

Language: Python · License: MIT · Stargazers: 247 · Issues: 6 · Issues: 14

turingas

Assembler for NVIDIA Volta and Turing GPUs

Language: Python · License: MIT · Stargazers: 188 · Issues: 11 · Issues: 10

LM-Infinite

Implementation of paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"

Language: Python · License: MIT · Stargazers: 97 · Issues: 4 · Issues: 10
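The Λ-shaped attention pattern the paper describes — each query attends to a few leading "sink" tokens plus a sliding window of recent tokens — can be sketched as a boolean mask. This is a toy illustration with assumed parameter names, not the repository's code:

```python
import numpy as np

def lambda_mask(seq_len, n_start=2, window=4):
    # Λ-shaped causal mask: every query sees the first `n_start` tokens
    # ("attention sinks") plus the `window` most recent tokens before it.
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for q in range(seq_len):
        for k in range(q + 1):  # causal: only keys up to the query position
            if k < n_start or q - k < window:
                mask[q, k] = True
    return mask
```

Because each query attends to at most `n_start + window` keys, attention cost stays O(n) in sequence length, which is what enables on-the-fly length generalization.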

pyaskit

AskIt: Unified programming interface for programming with LLMs (GPT-3.5, GPT-4, Gemini, Claude, Cohere, Llama 2)

Language: Python · License: MIT · Stargazers: 70 · Issues: 2 · Issues: 2

gpu-arch-microbenchmark

Dissecting NVIDIA GPU Architecture

Language: C++ · Stargazers: 36 · Issues: 3 · Issues: 0