Zehao Shi's starred repositories

ColossalAI

Making large AI models cheaper, faster and more accessible

Language: Python | License: Apache-2.0 | Stargazers: 38294 | Issues: 382 | Issues: 1600

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python | License: Apache-2.0 | Stargazers: 33651 | Issues: 340 | Issues: 2633
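
A rough sketch of how DeepSpeed is typically wired in, assuming its documented JSON config schema and the `deepspeed.initialize` entry point; the model, batch size, and hyperparameters below are placeholders, not recommendations.

```python
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for a real network

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},  # ZeRO-2: shard optimizer state + grads
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# initialize() returns (engine, optimizer, dataloader, lr_scheduler)
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

x = torch.randn(32, 1024, device=engine.device)
loss = engine(x).pow(2).mean()
engine.backward(loss)  # handles loss scaling and gradient partitioning
engine.step()          # optimizer step + gradient zeroing
```

In practice this is launched with the `deepspeed` CLI so the distributed environment exists before `initialize` runs.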

mojo

The Mojo Programming Language

Language: Mojo | License: NOASSERTION | Stargazers: 22059 | Issues: 267 | Issues: 1887

flash-attention

Fast and memory-efficient exact attention

Language: Python | License: BSD-3-Clause | Stargazers: 11820 | Issues: 104 | Issues: 860
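
A minimal call into the package's fused kernel; `flash_attn_func` is the library's public functional entry point, and the fp16-on-CUDA tensors reflect its usual requirements (the shapes here are arbitrary examples).

```python
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Exact (not approximate) attention, computed tile-by-tile in on-chip SRAM so
# the full seqlen x seqlen score matrix is never materialized in HBM.
out = flash_attn_func(q, k, v, causal=True)  # -> (batch, seqlen, nheads, headdim)
```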

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to it.

Language: Python | License: Apache-2.0 | Stargazers: 10867 | Issues: 160 | Issues: 192

qlora

QLoRA: Efficient Finetuning of Quantized LLMs

Language: Jupyter Notebook | License: MIT | Stargazers: 9681 | Issues: 84 | Issues: 246
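
A sketch of the QLoRA recipe as commonly run on the Hugging Face stack (transformers + bitsandbytes + peft): the frozen base model is loaded in 4-bit NF4, and only small LoRA adapters are trained. The base model name, rank, and target modules below are illustrative placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NormalFloat4 data type from the paper
    bnb_4bit_use_double_quant=True,       # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",                # placeholder base model
    quantization_config=bnb_config,
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # only the adapters are trainable
```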

DeepSpeedExamples

Example models using DeepSpeed

Language: Python | License: Apache-2.0 | Stargazers: 5829 | Issues: 76 | Issues: 526

UniAD

[CVPR 2023 Best Paper] Planning-oriented Autonomous Driving

Language: Python | License: Apache-2.0 | Stargazers: 3065 | Issues: 34 | Issues: 169

alpa

Training and serving large-scale neural networks with auto parallelization.

Language: Python | License: Apache-2.0 | Stargazers: 3004 | Issues: 45 | Issues: 296

optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools

Language: Python | License: Apache-2.0 | Stargazers: 2302 | Issues: 59 | Issues: 695
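
One representative optimum workflow, sketched under the assumption of its ONNX Runtime backend: export a Transformers checkpoint to ONNX and serve it through the familiar pipeline API. The checkpoint name is just an example.

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

clf = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(clf("Same pipeline API, ONNX Runtime underneath"))
```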

lion-pytorch

🦁 Lion, a new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(W), in PyTorch

Language: Python | License: MIT | Stargazers: 1959 | Issues: 15 | Issues: 23
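
The repo ships Lion as a drop-in torch optimizer; a minimal usage sketch, with the paper's update rule summarized in comments. The learning rate and weight decay are illustrative.

```python
import torch
from lion_pytorch import Lion

model = torch.nn.Linear(10, 10)  # stand-in model
opt = Lion(model.parameters(), lr=1e-4, weight_decay=1e-2)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
opt.step()       # applies the sign-based update sketched below
opt.zero_grad()

# Conceptually, for each parameter p with gradient g and momentum m:
#   update = sign(beta1 * m + (1 - beta1) * g)   # only the sign is used
#   p     -= lr * (update + weight_decay * p)    # decoupled weight decay
#   m      = beta2 * m + (1 - beta2) * g         # momentum trails g more slowly
```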

kernl

Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 1484 | Issues: 27 | Issues: 174
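
The "single line of code" the description refers to is kernl's model-optimization hook; the import path below follows the project README (treat it as an assumption), and the model and input are placeholders.

```python
import torch
from transformers import AutoModel
from kernl.model_optimization import optimize_model  # assumed per the README

model = AutoModel.from_pretrained("bert-base-uncased").eval().cuda()
optimize_model(model)  # swaps eligible submodules for fused Triton kernels

with torch.inference_mode(), torch.autocast("cuda"):
    input_ids = torch.randint(0, 1000, (1, 128), device="cuda")
    out = model(input_ids)
```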

SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch implementation of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"

Language: Python | License: MIT | Stargazers: 1399 | Issues: 26 | Issues: 80

how-to-optim-algorithm-in-cuda

How to optimize algorithms in CUDA.

python3-source-code-analysis

Python 3 Source Code Analysis (《Python 3 源码剖析》)

Language: Makefile | License: NOASSERTION | Stargazers: 944 | Issues: 64 | Issues: 10

eRPC

Efficient RPCs for datacenter networks

Language: C++ | License: NOASSERTION | Stargazers: 835 | Issues: 34 | Issues: 100

PatrickStar

PatrickStar enables larger, faster, and greener pretrained models for NLP and democratizes AI for everyone.

Language: Python | License: BSD-3-Clause | Stargazers: 744 | Issues: 16 | Issues: 56

BMTrain

Efficient Training (including pre-training and fine-tuning) for Big Models

Language: Python | License: Apache-2.0 | Stargazers: 533 | Issues: 11 | Issues: 85

attorch

A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.

Language: Python | License: MIT | Stargazers: 412 | Issues: 8 | Issues: 2
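
attorch's modules are written directly in Triton; as a standalone taste of that style (not attorch's own code), here is a tiny Triton kernel for an element-wise ReLU.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def relu_kernel(x_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements                 # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, tl.maximum(x, 0.0), mask=mask)

def relu(x: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)              # one program per 1024 elements
    relu_kernel[grid](x, out, n, BLOCK=1024)
    return out

print(relu(torch.randn(4096, device="cuda")).min())  # >= 0
```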

dino-vit-features

Official implementation for the paper "Deep ViT Features as Dense Visual Descriptors".

Language: Python | License: MIT | Stargazers: 349 | Issues: 4 | Issues: 21

algorithmic-efficiency

MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.

Language: Python | License: Apache-2.0 | Stargazers: 312 | Issues: 24 | Issues: 208

Pytorch-PCGrad

PyTorch reimplementation of "Gradient Surgery for Multi-Task Learning"

Language: Python | License: BSD-3-Clause | Stargazers: 281 | Issues: 5 | Issues: 17
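
The PCGrad projection is compact enough to sketch straight from the paper: when two task gradients conflict (negative inner product), one is projected onto the normal plane of the other. This is an illustrative re-implementation, not the repo's exact API.

```python
import random
import torch

def pcgrad(task_grads):
    """task_grads: list of flattened per-task gradient tensors."""
    projected = [g.clone() for g in task_grads]
    for i, g_i in enumerate(projected):
        others = [g for j, g in enumerate(task_grads) if j != i]
        random.shuffle(others)               # paper iterates tasks in random order
        for g_j in others:
            dot = torch.dot(g_i, g_j)
            if dot < 0:                      # conflicting directions
                g_i -= (dot / g_j.norm().pow(2)) * g_j  # remove conflicting component
    return torch.stack(projected).sum(dim=0)  # combine the surgered gradients
```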

infinity

A lightweight C++ RDMA library for InfiniBand networks.

Language: C++ | License: MIT | Stargazers: 171 | Issues: 6 | Issues: 8

sagemaker-debugger

Amazon SageMaker Debugger provides functionality to save tensors during training of machine learning jobs and analyze those tensors

Language: Python | License: Apache-2.0 | Stargazers: 158 | Issues: 25 | Issues: 92

mtm

MTM: Masked Trajectory Models for Prediction, Representation, and Control.

Language: Python | License: MIT | Stargazers: 143 | Issues: 12 | Issues: 2

slapo

A schedule language for large model training

Language: Python | License: Apache-2.0 | Stargazers: 135 | Issues: 12 | Issues: 17

SHARK-Turbine

Unified compiler/runtime for interfacing with PyTorch Dynamo.

Language: Python | License: Apache-2.0 | Stargazers: 80 | Issues: 29 | Issues: 484

rotograd

Official PyTorch implementation of RotoGrad

rdmapp

C++ interfaces for RDMA access

Language: C++ | License: Apache-2.0 | Stargazers: 44 | Issues: 4 | Issues: 1