Pengyu Wang's starred repositories

llm.c

LLM training in simple, raw C/CUDA

Language: Cuda · License: MIT · Stargazers: 21933 · Issues: 217 · Issues: 120

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python · License: Apache-2.0 · Stargazers: 20694 · Issues: 178 · Issues: 389

cosmopolitan

build-once run-anywhere C library

llamafile

Distribute and run LLMs with a single file.

Language: C++ · License: NOASSERTION · Stargazers: 17059 · Issues: 155 · Issues: 361

ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Language: Python · License: Apache-2.0 · Stargazers: 12203 · Issues: 77 · Issues: 759

llama-recipes

Scripts for fine-tuning Meta Llama 3 with composable FSDP and PEFT methods, covering single- and multi-node GPU setups. Supports default and custom datasets for applications such as summarization and Q&A, and a number of inference solutions (e.g., HF TGI, vLLM) for local or cloud deployment. Includes demo apps showcasing Meta Llama 3 for WhatsApp and Messenger.

Language: Jupyter Notebook · License: NOASSERTION · Stargazers: 10493 · Issues: 85 · Issues: 295

gperftools

Main gperftools repository

Language: C++ · License: BSD-3-Clause · Stargazers: 8287 · Issues: 363 · Issues: 1304

lm-evaluation-harness

A framework for few-shot evaluation of language models.

Language: Python · License: MIT · Stargazers: 5865 · Issues: 36 · Issues: 941

transformers_tasks

⭐️ NLP algorithms built on the transformers library, supporting text classification, text generation, information extraction, text matching, RLHF, SFT, etc.

Language: Jupyter Notebook · Stargazers: 2055 · Issues: 16 · Issues: 86

Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 2031 · Issues: 34 · Issues: 78
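Medusa's core idea — extra decoding heads draft several future tokens at once, which the base model then verifies in a single pass, accepting the longest matching prefix — can be illustrated with a toy sketch. Everything below (the toy "model", the head rules, the function names) is hypothetical illustration, not the repository's API:

```python
# Toy sketch of Medusa-style draft-and-verify decoding (hypothetical).

def base_model(ctx):
    # Deterministic toy next-token rule: next = (last + 1) mod 10.
    return (ctx[-1] + 1) % 10

def medusa_heads(ctx):
    # Toy extra heads guessing tokens at offsets +2 and +3 in one shot
    # (the second guess is deliberately wrong to show rejection).
    return [(ctx[-1] + 2) % 10, (ctx[-1] + 4) % 10]

def medusa_step(ctx):
    # Draft = base model's next token plus the heads' guesses.
    draft = [base_model(ctx)] + medusa_heads(ctx)
    accepted = []
    for tok in draft:
        # Verify each drafted token against the base model; stop at the
        # first mismatch. The first draft token is the base model's own
        # prediction, so every step accepts at least one token.
        if tok == base_model(ctx + accepted):
            accepted.append(tok)
        else:
            break
    return ctx + accepted
```

Here `medusa_step([0])` accepts two of the three drafted tokens in one pass, which is the source of the speedup: fewer sequential model calls per generated token.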

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Language: Python · License: Apache-2.0 · Stargazers: 1653 · Issues: 37 · Issues: 268

yarn

YaRN: Efficient Context Window Extension of Large Language Models

Language: Python · License: MIT · Stargazers: 1260 · Issues: 14 · Issues: 55
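As a rough illustration of the RoPE-frequency-scaling family that YaRN belongs to, here is a minimal sketch of standard RoPE inverse frequencies next to the simpler NTK-aware base rescaling. The function names are made up, and YaRN's actual "NTK-by-parts" interpolation treats different wavelength bands differently; this only shows the basic mechanism of stretching the rotary wavelengths:

```python
import numpy as np

def rope_freqs(dim, base=10000.0):
    # Standard RoPE inverse frequencies: theta_i = base^(-2i/dim).
    return base ** (-np.arange(0, dim, 2) / dim)

def ntk_scaled_freqs(dim, scale, base=10000.0):
    # NTK-aware scaling: raise the base so the lowest frequency is
    # stretched by exactly `scale`, while the highest (i = 0) is untouched.
    new_base = base * scale ** (dim / (dim - 2))
    return new_base ** (-np.arange(0, dim, 2) / dim)
```

With `scale=8`, the longest wavelength grows 8x (extending the usable context), while short-range positional resolution is mostly preserved.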

Awesome-Efficient-LLM

A curated list for Efficient Large Language Models

tutel

Tutel MoE: An Optimized Mixture-of-Experts Implementation

Language: Python · License: MIT · Stargazers: 683 · Issues: 15 · Issues: 58

cpufp

A CPU tool for benchmarking peak floating-point performance

Language: Assembly · License: GPL-3.0 · Stargazers: 452 · Issues: 16 · Issues: 12

nvbench

CUDA Kernel Benchmarking Library

Language: Cuda · License: Apache-2.0 · Stargazers: 450 · Issues: 18 · Issues: 89

ring-attention-pytorch

Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch

Language: Python · License: MIT · Stargazers: 410 · Issues: 9 · Issues: 11

H2O

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
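The heavy-hitter idea — evict KV-cache entries whose accumulated attention mass is low, while always keeping the most recent tokens — can be sketched as a cache-selection policy. This is a toy illustration with made-up names, not H2O's implementation:

```python
import numpy as np

def h2o_keep_indices(attn_weights, budget, recent=2):
    # attn_weights: (num_queries, num_keys) attention weights seen so far.
    # Score each key by its accumulated attention mass ("heavy hitters"),
    # always keep the `recent` newest keys, and fill the remaining budget
    # with the highest-scoring older keys.
    n_keys = attn_weights.shape[1]
    scores = attn_weights.sum(axis=0)
    recent_idx = list(range(n_keys - recent, n_keys))
    older = [int(i) for i in np.argsort(-scores) if int(i) not in recent_idx]
    return sorted(older[: max(0, budget - recent)] + recent_idx)
```

The point is that the KV cache shrinks to a fixed `budget` regardless of sequence length, at the cost of dropping tokens the model rarely attends to.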

NBCE

Naive Bayes-based Context Extension
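NBCE combines next-token distributions computed independently on each context chunk via a Naive Bayes independence assumption. Below is a minimal sketch of one published variant (greedy min-entropy pooling plus extrapolation away from the context-free prior); the function names are illustrative, not the repository's API:

```python
import numpy as np

def nbce_combine(context_logps, prior_logp, beta=0.25):
    # context_logps: (n_contexts, vocab) next-token log-probs, one row per
    # context chunk fed to the model separately.
    # prior_logp: (vocab,) next-token log-probs with no context.
    # Greedy pooling: take the lowest-entropy (most confident) context
    # distribution, then extrapolate it away from the prior.
    entropies = [-(np.exp(lp) * lp).sum() for lp in context_logps]
    pooled = context_logps[int(np.argmin(entropies))]
    combined = (1.0 + beta) * pooled - beta * prior_logp
    return combined - np.log(np.exp(combined).sum())  # renormalize
```

Because each chunk is processed separately, the effective context length scales with the number of chunks rather than the model's native window.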

infini-transformer

PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" (https://arxiv.org/abs/2404.07143)

Language: Python · License: MIT · Stargazers: 247 · Issues: 6 · Issues: 14

turingas

Assembler for NVIDIA Volta and Turing GPUs

Language: Python · License: MIT · Stargazers: 188 · Issues: 11 · Issues: 10

LM-Infinite

Implementation of paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"

Language: Python · License: MIT · Stargazers: 97 · Issues: 4 · Issues: 10
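The Λ-shaped attention pattern the paper describes — each query attends to a few leading "sink" tokens plus a sliding window of recent tokens — can be sketched as a boolean mask. This is a toy illustration with assumed parameter names, not the repository's code:

```python
import numpy as np

def lambda_mask(seq_len, n_start=2, window=4):
    # Λ-shaped causal mask: every query sees the first `n_start` tokens
    # ("attention sinks") plus the `window` most recent tokens before it.
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for q in range(seq_len):
        for k in range(q + 1):  # causal: only keys up to the query position
            if k < n_start or q - k < window:
                mask[q, k] = True
    return mask
```

Because each query attends to at most `n_start + window` keys, attention cost stays O(n) in sequence length, which is what enables on-the-fly length generalization.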

pyaskit

AskIt: Unified programming interface for programming with LLMs (GPT-3.5, GPT-4, Gemini, Claude, Cohere, Llama 2)

Language: Python · License: MIT · Stargazers: 70 · Issues: 2 · Issues: 2

gpu-arch-microbenchmark

Dissecting NVIDIA GPU Architecture

Language: C++ · Stargazers: 36 · Issues: 3 · Issues: 0