Qinghao Hu (Tonyhao96)


Company: Nanyang Technological University

Location: Singapore

Home Page: tonyhao.xyz



Organizations: S-Lab-System-Group

Qinghao Hu's starred repositories

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python · License: MIT · Stargazers: 19249 · Watchers: 297 · Issues: 1339

unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Language: Python · License: Apache-2.0 · Stargazers: 13145 · Watchers: 91 · Issues: 627

triton

Development repository for the Triton language and compiler

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python · License: Apache-2.0 · Stargazers: 11068 · Watchers: 202 · Issues: 2165

skypilot

SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.

Language: Python · License: Apache-2.0 · Stargazers: 6323 · Watchers: 71 · Issues: 1651

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python · License: NOASSERTION · Stargazers: 5772 · Watchers: 46 · Issues: 75

llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Language: Python · License: MIT · Stargazers: 2173 · Watchers: 24 · Issues: 159

Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 2050 · Watchers: 34 · Issues: 79
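The idea named in the Medusa description — extra decoding heads draft several future tokens, which the base model then verifies in a single pass — can be sketched with a toy acceptance rule. This is illustrative only, not the repo's API; the heads themselves and Medusa's tree-structured candidates are omitted:

```python
def accept_prefix(proposed, verified):
    """Length of the longest prefix where drafted tokens match the
    base model's own greedy choices; only that prefix is kept, so the
    output stays identical to plain greedy decoding."""
    n = 0
    for drafted, greedy in zip(proposed, verified):
        if drafted != greedy:
            break
        n += 1
    return n

# Heads drafted [42, 7, 13]; the base model's greedy tokens at those
# positions were [42, 7, 99] -> accept two tokens, redraft from there.
print(accept_prefix([42, 7, 13], [42, 7, 99]))  # 2
```

The speedup comes from verifying all drafted positions in one forward pass instead of one pass per token.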

Latte

Latte: Latent Diffusion Transformer for Video Generation.

Language: Python · License: Apache-2.0 · Stargazers: 1536 · Watchers: 28 · Issues: 87

awesome_lists

Awesome Lists for Tenure-Track Assistant Professors and PhD Students. (Survival guides for assistant professors and PhD students.)

Language: Python · License: MIT · Stargazers: 1398 · Watchers: 33 · Issues: 1

gdrive

Google Drive CLI Client

Language: Rust · License: MIT · Stargazers: 1353 · Watchers: 16 · Issues: 110

GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Language: Python · License: Apache-2.0 · Stargazers: 1283 · Watchers: 17 · Issues: 48
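A minimal NumPy sketch of the core idea in the GaLore description — take the optimizer update in a low-rank subspace of the gradient, which is where the optimizer state is kept — assuming plain SGD instead of Adam for brevity. Function and parameter names here are made up, not GaLore's API:

```python
import numpy as np

def galore_step(weight, grad, lr=0.1, rank=4):
    """One gradient step taken in a low-rank subspace.

    The top-`rank` left singular vectors of the gradient form a
    projection; the update lives in that (rank, n) space, then is
    projected back onto the full (m, n) weight.
    """
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    p = u[:, :rank]                 # (m, rank) orthonormal basis
    low_rank_grad = p.T @ grad      # (rank, n): the compact state
    update = p @ low_rank_grad      # back to (m, n)
    return weight - lr * update

rng = np.random.default_rng(0)
w = np.zeros((64, 32))
g = rng.normal(size=(64, 32))
w_new = galore_step(w, g)
# Optimizer statistics need rank*(m+n) numbers instead of m*n.
```

The memory saving comes from storing Adam's moment estimates for the small `(rank, n)` projected gradient rather than the full weight-shaped gradient.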

nanotron

Minimalistic large language model 3D-parallelism training

Language: Python · License: Apache-2.0 · Stargazers: 1000 · Watchers: 40 · Issues: 66

skyplane

🔥 Blazing fast bulk data transfers between any cloud 🔥

Language: Python · License: Apache-2.0 · Stargazers: 991 · Watchers: 24 · Issues: 378

VILA

VILA - a multi-image visual language model with training, inference and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops)

Language: Python · License: Apache-2.0 · Stargazers: 882 · Watchers: 19 · Issues: 68

flash-linear-attention

Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton

Language: Python · License: MIT · Stargazers: 781 · Watchers: 20 · Issues: 29

EasyContext

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.

Language: Python · License: Apache-2.0 · Stargazers: 555 · Watchers: 9 · Issues: 36

MS-AMP

Microsoft Automatic Mixed Precision Library

Language: Python · License: MIT · Stargazers: 482 · Watchers: 11 · Issues: 60

ring-flash-attention

Ring attention implementation with flash attention
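The mechanism this description refers to — ring attention circulates K/V blocks between devices while each host accumulates flash-attention-style online softmax — can be sketched single-process in NumPy. The chunks stand in for per-device shards; all names are illustrative, not the repo's API:

```python
import numpy as np

def reference_attention(q, k, v):
    """Plain softmax attention, for comparison."""
    s = (q @ k.T) / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    return (p / p.sum(axis=-1, keepdims=True)) @ v

def ring_attention(q, k_chunks, v_chunks):
    """Visit K/V one chunk at a time (as if received from ring
    neighbors), keeping a running row max `m`, softmax normalizer `l`,
    and unnormalized output `o` -- the online-softmax recurrence."""
    d = q.shape[-1]
    m = np.full((q.shape[0], 1), -np.inf)
    l = np.zeros((q.shape[0], 1))
    o = np.zeros((q.shape[0], v_chunks[0].shape[-1]))
    for k, v in zip(k_chunks, v_chunks):
        s = (q @ k.T) / np.sqrt(d)
        m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
        scale = np.exp(m - m_new)       # rescale old accumulators
        p = np.exp(s - m_new)
        l = l * scale + p.sum(axis=-1, keepdims=True)
        o = o * scale + p @ v
        m = m_new
    return o / l

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(12, 8))
v = rng.normal(size=(12, 8))
out = ring_attention(q, np.split(k, 3), np.split(v, 3))
# Matches monolithic softmax attention to floating-point precision.
```

Because the recurrence never materializes the full score matrix, each device only ever holds one K/V chunk, which is what lets context length scale with the number of devices in the ring.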

Autonomous-Agents

Autonomous Agents (LLMs) research papers. Updated Daily.

License: MIT · Stargazers: 289 · Watchers: 25 · Issues: 0

doremi

PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets

Language: HTML · License: MIT · Stargazers: 277 · Watchers: 5 · Issues: 28
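DoReMi's mixture-weight optimization can be illustrated with a small sketch of its exponentiated-gradient step: domains where a proxy model's loss most exceeds a reference model's get upweighted. The hyperparameter names `step` and `smooth` are mine, not the repo's:

```python
import numpy as np

def doremi_update(weights, proxy_loss, ref_loss, step=0.1, smooth=1e-3):
    """One multiplicative-weights update on domain mixture weights:
    exponentiate the per-domain excess loss, renormalize, then mix
    with the uniform distribution for smoothing."""
    excess = np.maximum(proxy_loss - ref_loss, 0.0)
    w = weights * np.exp(step * excess)
    w = w / w.sum()
    return (1 - smooth) * w + smooth / len(w)

# Domain 0 has the largest excess loss, so it gains mixture weight.
w = doremi_update(np.full(3, 1 / 3),
                  proxy_loss=np.array([2.0, 1.2, 1.0]),
                  ref_loss=np.array([1.0, 1.0, 1.0]))
```

Iterating this update while training the proxy model yields the final domain weights, which are then used to sample data for the full-size training run.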

BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Language: Python · License: MIT · Stargazers: 238 · Watchers: 11 · Issues: 18

long-context-attention

Sequence Parallel Attention for Long Context LLM Model Training and Inference

superbenchmark

A validation and profiling tool for AI infrastructure

Language: Python · License: MIT · Stargazers: 215 · Watchers: 16 · Issues: 67

LightSeq

Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers

torchsnapshot

A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind.

Language: Python · License: NOASSERTION · Stargazers: 136 · Watchers: 21 · Issues: 12

orion

An interference-aware scheduler for fine-grained GPU sharing

Language: Python · License: MIT · Stargazers: 77 · Watchers: 2 · Issues: 16