Shigang Li's repositories
Ok-Topk
Ok-Topk is a scheme for distributed training with sparse gradients. It integrates a novel sparse allreduce algorithm (with less than 6k communication volume, which is asymptotically optimal) with the decentralized parallel Stochastic Gradient Descent (SGD) optimizer, and its convergence is proven both theoretically and empirically.
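The core idea behind Top-k gradient sparsification is that each worker keeps only the k largest-magnitude entries of its local gradient before communication. A minimal sketch in NumPy (the function name `topk_sparsify` is hypothetical and illustrative only, not the actual Ok-Topk implementation):

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep only the k largest-magnitude entries of a gradient tensor.

    Illustrative sketch of Top-k sparsification, not Ok-Topk's
    actual sparse allreduce algorithm.
    """
    flat = grad.ravel()
    # Indices of the k largest absolute values (unordered).
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.reshape(grad.shape), idx

# Each worker would sparsify its local gradient like this before the
# sparse allreduce step; only (index, value) pairs need to be sent.
g = np.array([0.1, -2.0, 0.05, 3.0, -0.3])
s, idx = topk_sparsify(g, 2)
# s keeps only the two largest-magnitude entries, -2.0 and 3.0
```

Since only k index/value pairs per worker are exchanged instead of the full dense gradient, the communication volume scales with k rather than with the model size.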
DNN-cpp-proxies
C++/MPI proxies for distributed training of deep neural networks.
bigbird-1
Google's BigBird (Jax/Flax & PyTorch) @ 🤗Transformers
bolt
10x faster matrix and vector operations.
ColossalAI
Making large AI models cheaper, faster and more accessible
CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
longformer
Longformer: The Long-Document Transformer
p4app-switchML
Switch ML Application
shigangli.github.io
Homepage of Shigang Li https://shigangli.github.io/
sparsegpt
Code for the paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".