Zehao Shi's starred repositories

ColossalAI

Making large AI models cheaper, faster and more accessible

Language: Python | License: Apache-2.0 | Stargazers: 38294 | Issues: 382 | Issues: 1600

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python | License: Apache-2.0 | Stargazers: 33651 | Issues: 340 | Issues: 2633
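
A rough sketch of how DeepSpeed is typically wired in, assuming its documented JSON config schema and the `deepspeed.initialize` entry point; the model, batch size, and hyperparameters below are placeholders, not recommendations.

```python
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for a real network

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},  # ZeRO-2: shard optimizer state + grads
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# initialize() returns (engine, optimizer, dataloader, lr_scheduler)
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

x = torch.randn(32, 1024, device=engine.device)
loss = engine(x).pow(2).mean()
engine.backward(loss)  # handles loss scaling and gradient partitioning
engine.step()          # optimizer step + gradient zeroing
```

In practice this is launched with the `deepspeed` CLI so the distributed environment exists before `initialize` runs.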

mojo

The Mojo Programming Language

Language: Mojo | License: NOASSERTION | Stargazers: 22059 | Issues: 267 | Issues: 1887

flash-attention

Fast and memory-efficient exact attention

Language: Python | License: BSD-3-Clause | Stargazers: 11820 | Issues: 104 | Issues: 860
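
A minimal call into the package's fused kernel; `flash_attn_func` is the library's public functional entry point, and the fp16-on-CUDA tensors reflect its usual requirements (the shapes here are arbitrary examples).

```python
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Exact (not approximate) attention, computed tile-by-tile in on-chip SRAM so
# the full seqlen x seqlen score matrix is never materialized in HBM.
out = flash_attn_func(q, k, v, causal=True)  # -> (batch, seqlen, nheads, headdim)
```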

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to it.

Language: Python | License: Apache-2.0 | Stargazers: 10867 | Issues: 160 | Issues: 192

qlora

QLoRA: Efficient Finetuning of Quantized LLMs

Language: Jupyter Notebook | License: MIT | Stargazers: 9681 | Issues: 84 | Issues: 246
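
A sketch of the QLoRA recipe as commonly run on the Hugging Face stack (transformers + bitsandbytes + peft): the frozen base model is loaded in 4-bit NF4, and only small LoRA adapters are trained. The base model name, rank, and target modules below are illustrative placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NormalFloat4 data type from the paper
    bnb_4bit_use_double_quant=True,       # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",                # placeholder base model
    quantization_config=bnb_config,
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # only the adapters are trainable
```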

DeepSpeedExamples

Example models using DeepSpeed

Language: Python | License: Apache-2.0 | Stargazers: 5829 | Issues: 76 | Issues: 526

UniAD

[CVPR 2023 Best Paper] Planning-oriented Autonomous Driving

Language: Python | License: Apache-2.0 | Stargazers: 3065 | Issues: 34 | Issues: 169

alpa

Training and serving large-scale neural networks with auto parallelization.

Language: Python | License: Apache-2.0 | Stargazers: 3004 | Issues: 45 | Issues: 296

optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools

Language: Python | License: Apache-2.0 | Stargazers: 2302 | Issues: 59 | Issues: 695
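
One representative optimum workflow, sketched under the assumption of its ONNX Runtime backend: export a Transformers checkpoint to ONNX and serve it through the familiar pipeline API. The checkpoint name is just an example.

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example checkpoint
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

clf = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(clf("Same pipeline API, ONNX Runtime underneath"))
```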

lion-pytorch

🦁 Lion, a new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(W), in PyTorch

Language: Python | License: MIT | Stargazers: 1959 | Issues: 15 | Issues: 23
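
The repo ships Lion as a drop-in torch optimizer; a minimal usage sketch, with the paper's update rule summarized in comments. The learning rate and weight decay are illustrative.

```python
import torch
from lion_pytorch import Lion

model = torch.nn.Linear(10, 10)  # stand-in model
opt = Lion(model.parameters(), lr=1e-4, weight_decay=1e-2)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
opt.step()       # applies the sign-based update sketched below
opt.zero_grad()

# Conceptually, for each parameter p with gradient g and momentum m:
#   update = sign(beta1 * m + (1 - beta1) * g)   # only the sign is used
#   p     -= lr * (update + weight_decay * p)    # decoupled weight decay
#   m      = beta2 * m + (1 - beta2) * g         # momentum trails g more slowly
```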

kernl

Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 1484 | Issues: 27 | Issues: 174
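
The "single line of code" the description refers to is kernl's model-optimization hook; the import path below follows the project README (treat it as an assumption), and the model and input are placeholders.

```python
import torch
from transformers import AutoModel
from kernl.model_optimization import optimize_model  # assumed per the README

model = AutoModel.from_pretrained("bert-base-uncased").eval().cuda()
optimize_model(model)  # swaps eligible submodules for fused Triton kernels

with torch.inference_mode(), torch.autocast("cuda"):
    input_ids = torch.randint(0, 1000, (1, 128), device="cuda")
    out = model(input_ids)
```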

SparK

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch implementation of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"

Language: Python | License: MIT | Stargazers: 1399 | Issues: 26 | Issues: 80

how-to-optim-algorithm-in-cuda

How to optimize algorithms in CUDA.

python3-source-code-analysis

Python 3 Source Code Analysis (《Python 3 源码剖析》)

Language: Makefile | License: NOASSERTION | Stargazers: 944 | Issues: 64 | Issues: 10

eRPC

Efficient RPCs for datacenter networks

Language: C++ | License: NOASSERTION | Stargazers: 835 | Issues: 34 | Issues: 100

PatrickStar

PatrickStar enables larger, faster, and greener pretrained models for NLP and democratizes AI for everyone.

Language: Python | License: BSD-3-Clause | Stargazers: 744 | Issues: 16 | Issues: 56

BMTrain

Efficient Training (including pre-training and fine-tuning) for Big Models

Language: Python | License: Apache-2.0 | Stargazers: 533 | Issues: 11 | Issues: 85

attorch

A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.

Language: Python | License: MIT | Stargazers: 412 | Issues: 8 | Issues: 2
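
attorch's modules are written directly in Triton; as a standalone taste of that style (not attorch's own code), here is a tiny Triton kernel for an element-wise ReLU.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def relu_kernel(x_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements                 # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, tl.maximum(x, 0.0), mask=mask)

def relu(x: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)              # one program per 1024 elements
    relu_kernel[grid](x, out, n, BLOCK=1024)
    return out

print(relu(torch.randn(4096, device="cuda")).min())  # >= 0
```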

dino-vit-features

Official implementation for the paper "Deep ViT Features as Dense Visual Descriptors".

Language: Python | License: MIT | Stargazers: 349 | Issues: 4 | Issues: 21

algorithmic-efficiency

MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.

Language: Python | License: Apache-2.0 | Stargazers: 312 | Issues: 24 | Issues: 208

Pytorch-PCGrad

PyTorch reimplementation of "Gradient Surgery for Multi-Task Learning"

Language: Python | License: BSD-3-Clause | Stargazers: 281 | Issues: 5 | Issues: 17
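
The PCGrad projection is compact enough to sketch straight from the paper: when two task gradients conflict (negative inner product), one is projected onto the normal plane of the other. This is an illustrative re-implementation, not the repo's exact API.

```python
import random
import torch

def pcgrad(task_grads):
    """task_grads: list of flattened per-task gradient tensors."""
    projected = [g.clone() for g in task_grads]
    for i, g_i in enumerate(projected):
        others = [g for j, g in enumerate(task_grads) if j != i]
        random.shuffle(others)               # paper iterates tasks in random order
        for g_j in others:
            dot = torch.dot(g_i, g_j)
            if dot < 0:                      # conflicting directions
                g_i -= (dot / g_j.norm().pow(2)) * g_j  # remove conflicting component
    return torch.stack(projected).sum(dim=0)  # combine the surgered gradients
```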

infinity

A lightweight C++ RDMA library for InfiniBand networks.

Language: C++ | License: MIT | Stargazers: 171 | Issues: 6 | Issues: 8

sagemaker-debugger

Amazon SageMaker Debugger provides functionality to save tensors during training of machine learning jobs and analyze those tensors

Language: Python | License: Apache-2.0 | Stargazers: 158 | Issues: 25 | Issues: 92

mtm

MTM: Masked Trajectory Models for Prediction, Representation, and Control.

Language: Python | License: MIT | Stargazers: 143 | Issues: 12 | Issues: 2

slapo

A schedule language for large model training

Language: Python | License: Apache-2.0 | Stargazers: 135 | Issues: 12 | Issues: 17

SHARK-Turbine

Unified compiler/runtime for interfacing with PyTorch Dynamo.

Language: Python | License: Apache-2.0 | Stargazers: 80 | Issues: 29 | Issues: 484

rotograd

Official PyTorch implementation of RotoGrad

rdmapp

C++ interfaces for RDMA access

Language: C++ | License: Apache-2.0 | Stargazers: 44 | Issues: 4 | Issues: 1