Qinghao Hu (Tonyhao96)


Company: Nanyang Technological University

Location: Singapore

Home Page: tonyhao.xyz



Organizations: S-Lab-System-Group

Qinghao Hu's starred repositories

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python · License: MIT · Stargazers: 19249 · Watchers: 297 · Issues: 1339

unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Language: Python · License: Apache-2.0 · Stargazers: 13145 · Watchers: 91 · Issues: 627

triton

Development repository for the Triton language and compiler

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python · License: Apache-2.0 · Stargazers: 11068 · Watchers: 202 · Issues: 2165

skypilot

SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.

Language: Python · License: Apache-2.0 · Stargazers: 6323 · Watchers: 71 · Issues: 1651

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python · License: NOASSERTION · Stargazers: 5772 · Watchers: 46 · Issues: 75

llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Language: Python · License: MIT · Stargazers: 2173 · Watchers: 24 · Issues: 159

Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 2050 · Watchers: 34 · Issues: 79
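The idea named in the Medusa description — extra decoding heads draft several future tokens, which the base model then verifies in a single pass — can be sketched with a toy acceptance rule. This is illustrative only, not the repo's API; the heads themselves and Medusa's tree-structured candidates are omitted:

```python
def accept_prefix(proposed, verified):
    """Length of the longest prefix where drafted tokens match the
    base model's own greedy choices; only that prefix is kept, so the
    output stays identical to plain greedy decoding."""
    n = 0
    for drafted, greedy in zip(proposed, verified):
        if drafted != greedy:
            break
        n += 1
    return n

# Heads drafted [42, 7, 13]; the base model's greedy tokens at those
# positions were [42, 7, 99] -> accept two tokens, redraft from there.
print(accept_prefix([42, 7, 13], [42, 7, 99]))  # 2
```

The speedup comes from verifying all drafted positions in one forward pass instead of one pass per token.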

Latte

Latte: Latent Diffusion Transformer for Video Generation.

Language: Python · License: Apache-2.0 · Stargazers: 1536 · Watchers: 28 · Issues: 87

awesome_lists

Awesome Lists for Tenure-Track Assistant Professors and PhD Students. (Survival guides for assistant professors and PhD students.)

Language: Python · License: MIT · Stargazers: 1398 · Watchers: 33 · Issues: 1

gdrive

Google Drive CLI Client

Language: Rust · License: MIT · Stargazers: 1353 · Watchers: 16 · Issues: 110

GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Language: Python · License: Apache-2.0 · Stargazers: 1283 · Watchers: 17 · Issues: 48
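A minimal NumPy sketch of the core idea in the GaLore description — take the optimizer update in a low-rank subspace of the gradient, which is where the optimizer state is kept — assuming plain SGD instead of Adam for brevity. Function and parameter names here are made up, not GaLore's API:

```python
import numpy as np

def galore_step(weight, grad, lr=0.1, rank=4):
    """One gradient step taken in a low-rank subspace.

    The top-`rank` left singular vectors of the gradient form a
    projection; the update lives in that (rank, n) space, then is
    projected back onto the full (m, n) weight.
    """
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    p = u[:, :rank]                 # (m, rank) orthonormal basis
    low_rank_grad = p.T @ grad      # (rank, n): the compact state
    update = p @ low_rank_grad      # back to (m, n)
    return weight - lr * update

rng = np.random.default_rng(0)
w = np.zeros((64, 32))
g = rng.normal(size=(64, 32))
w_new = galore_step(w, g)
# Optimizer statistics need rank*(m+n) numbers instead of m*n.
```

The memory saving comes from storing Adam's moment estimates for the small `(rank, n)` projected gradient rather than the full weight-shaped gradient.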

nanotron

Minimalistic large language model 3D-parallelism training

Language: Python · License: Apache-2.0 · Stargazers: 1000 · Watchers: 40 · Issues: 66

skyplane

🔥 Blazing fast bulk data transfers between any cloud 🔥

Language: Python · License: Apache-2.0 · Stargazers: 991 · Watchers: 24 · Issues: 378

VILA

VILA - a multi-image visual language model with training, inference and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops)

Language: Python · License: Apache-2.0 · Stargazers: 882 · Watchers: 19 · Issues: 68

flash-linear-attention

Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton

Language: Python · License: MIT · Stargazers: 781 · Watchers: 20 · Issues: 29

EasyContext

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.

Language: Python · License: Apache-2.0 · Stargazers: 555 · Watchers: 9 · Issues: 36

MS-AMP

Microsoft Automatic Mixed Precision Library

Language: Python · License: MIT · Stargazers: 482 · Watchers: 11 · Issues: 60

ring-flash-attention

Ring attention implementation with flash attention
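The mechanism this description refers to — ring attention circulates K/V blocks between devices while each host accumulates flash-attention-style online softmax — can be sketched single-process in NumPy. The chunks stand in for per-device shards; all names are illustrative, not the repo's API:

```python
import numpy as np

def reference_attention(q, k, v):
    """Plain softmax attention, for comparison."""
    s = (q @ k.T) / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    return (p / p.sum(axis=-1, keepdims=True)) @ v

def ring_attention(q, k_chunks, v_chunks):
    """Visit K/V one chunk at a time (as if received from ring
    neighbors), keeping a running row max `m`, softmax normalizer `l`,
    and unnormalized output `o` -- the online-softmax recurrence."""
    d = q.shape[-1]
    m = np.full((q.shape[0], 1), -np.inf)
    l = np.zeros((q.shape[0], 1))
    o = np.zeros((q.shape[0], v_chunks[0].shape[-1]))
    for k, v in zip(k_chunks, v_chunks):
        s = (q @ k.T) / np.sqrt(d)
        m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
        scale = np.exp(m - m_new)       # rescale old accumulators
        p = np.exp(s - m_new)
        l = l * scale + p.sum(axis=-1, keepdims=True)
        o = o * scale + p @ v
        m = m_new
    return o / l

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(12, 8))
v = rng.normal(size=(12, 8))
out = ring_attention(q, np.split(k, 3), np.split(v, 3))
# Matches monolithic softmax attention to floating-point precision.
```

Because the recurrence never materializes the full score matrix, each device only ever holds one K/V chunk, which is what lets context length scale with the number of devices in the ring.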

Autonomous-Agents

Autonomous Agents (LLMs) research papers. Updated Daily.

License: MIT · Stargazers: 289 · Watchers: 25 · Issues: 0

doremi

PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets

Language: HTML · License: MIT · Stargazers: 277 · Watchers: 5 · Issues: 28
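DoReMi's mixture-weight optimization can be illustrated with a small sketch of its exponentiated-gradient step: domains where a proxy model's loss most exceeds a reference model's get upweighted. The hyperparameter names `step` and `smooth` are mine, not the repo's:

```python
import numpy as np

def doremi_update(weights, proxy_loss, ref_loss, step=0.1, smooth=1e-3):
    """One multiplicative-weights update on domain mixture weights:
    exponentiate the per-domain excess loss, renormalize, then mix
    with the uniform distribution for smoothing."""
    excess = np.maximum(proxy_loss - ref_loss, 0.0)
    w = weights * np.exp(step * excess)
    w = w / w.sum()
    return (1 - smooth) * w + smooth / len(w)

# Domain 0 has the largest excess loss, so it gains mixture weight.
w = doremi_update(np.full(3, 1 / 3),
                  proxy_loss=np.array([2.0, 1.2, 1.0]),
                  ref_loss=np.array([1.0, 1.0, 1.0]))
```

Iterating this update while training the proxy model yields the final domain weights, which are then used to sample data for the full-size training run.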

BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Language: Python · License: MIT · Stargazers: 238 · Watchers: 11 · Issues: 18

long-context-attention

Sequence Parallel Attention for Long Context LLM Model Training and Inference

superbenchmark

A validation and profiling tool for AI infrastructure

Language: Python · License: MIT · Stargazers: 215 · Watchers: 16 · Issues: 67

LightSeq

Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers

torchsnapshot

A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind.

Language: Python · License: NOASSERTION · Stargazers: 136 · Watchers: 21 · Issues: 12

orion

An interference-aware scheduler for fine-grained GPU sharing

Language: Python · License: MIT · Stargazers: 77 · Watchers: 2 · Issues: 16