YudiZh

Yudi Zhang's starred repositories

LLM101n

LLM101n: Let's build a Storyteller

25996 1774 28

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language:PythonApache-2.023842 221 3653

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language:PythonMIT19263 297 1340

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

Language:PythonApache-2.012050 135 197

llama-recipes

Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a number of candid inference solutions such as HF TGI, VLLM for local or cloud deployment. Demo apps to showcase Meta Llama3 for WhatsApp & Messenger.

Language:Jupyter NotebookNOASSERTION10922 88 300

ml-engineering

Machine Learning Engineering Open Book

Language:PythonCC-BY-SA-4.010309 107 18

Yi

A series of large language models trained from scratch by developers @01-ai

Language:Jupyter NotebookApache-2.07520 111 289

OpenAgents

[COLM 2024] OpenAgents: An Open Platform for Language Agents in the Wild

Language:PythonApache-2.03807 42 98

weak-to-strong

Language:PythonMIT2469 33 18

Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Language:Jupyter NotebookApache-2.02058 34 79

MotionGPT

[NeurIPS 2023] MotionGPT: Human Motion as a Foreign Language, a unified motion-language generation model using LLMs

Language:PythonMIT1398 48 92

MirrorSite

镜像网站合集

1344 15 5

MAP-NEO

Language:Python789 10 32

the-art-of-debugging

The Art of Debugging

Language:CCC-BY-SA-4.0764 160

EAGLE

Official Implementation of EAGLE-1 and EAGLE-2

Language:PythonApache-2.0682 12 96

PiPPy

Pipeline Parallelism for PyTorch

Language:PythonBSD-3-Clause677 37 255

fairseq2

FAIR Sequence Modeling Toolkit 2

Language:PythonMIT638 18 97

DoLa

Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models"

Language:Python383 3 15

ALMA

State-of-the-art LLM-based translation models.

Language:RubyMIT366 12 51

ChunkLlama

[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"

Language:PythonApache-2.0304 7 20

Transformer-M

[ICLR 2023] One Transformer Can Understand Both 2D & 3D Molecular Data (official implementation)

Language:PythonMIT197 6 21

Chinese-Tiny-LLM

Language:PythonNOASSERTION192 4 6

CapsFusion

[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale

Language:Python185 21 6

Spec-Bench

Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)

Language:PythonApache-2.0127 1 11

aligner

Achieving Efficient Alignment through Learned Correction

Language:Python97 1 6

DiJiang

[ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear attention mechanism.

Language:Python90 5 6

mixinglaws

Language:Jupyter Notebook79 1 4

linear_open_lm

A repository for research on medium sized language models.

Language:PythonMIT6900

HGRN

[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Sequence Modeling

Language:Python60 2 2

csl

[Preprint] Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts

Language:PythonNOASSERTION14 20