Repositories under the model-parallelism topic:
Making large AI models cheaper, faster and more accessible
A GPipe implementation in PyTorch (micro-batch pipelining; see the sketch below)
飞桨 (PaddlePaddle) large-model development suite, providing an end-to-end toolchain for large language models, cross-modal large models, biocomputing large models, and other domains.
LiBai (李白): A Toolbox for Large-Scale Distributed Parallel Training
Slicing a PyTorch Tensor Into Parallel Shards (see the column-parallel sketch below)
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
A curated list of awesome projects and papers for distributed training or inference
Distributed training of DNNs • C++/MPI Proxies (GPT-2, GPT-3, CosmoFlow, DLRM)
Distributed training (multi-node) of a Transformer model (see the DDP skeleton below)
PyTorch implementation of a 3D U-Net with model parallelism across two GPUs for large models
Official implementation of DynPartition: Automatic Optimal Pipeline Parallelism of Dynamic Neural Networks over Heterogeneous GPU Systems for Inference Tasks
A decentralized and distributed framework for training DNNs
Model parallelism for NN architectures with skip connections (e.g., ResNets, U-Nets); see the cross-device residual sketch below
pipeDejavu: Hardware-aware Latency Predictable, Differentiable Search for Faster Config and Convergence of Distributed ML Pipeline Parallelism
Description of Legion (2021), a framework for efficient fused-layer cost estimation
A project focused on parallelizing pre-processing, measurement, and machine learning in the cloud, as well as evaluating and analyzing cloud performance.
Distributed TensorFlow (model parallelism) example repository (see the device-placement sketch below)
Torch Automatic Distributed Neural Network (TorchAD-NN) training library. Built on top of TorchMPI, this module automatically parallelizes neural network training.
A fully distributed hyperparameter optimization tool for PyTorch DNNs
Performance test on MNIST handwriting recognition using MXNet + TF
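
For the GPipe entry above, a minimal sketch of the core idea, micro-batch pipelining, in plain PyTorch. The two-stage split and the `chunks` count are illustrative choices, not that repository's API; a real GPipe places each stage on its own device and adds re-materialization so micro-batches overlap across stages.

```python
import torch
import torch.nn as nn

# Two pipeline stages; in a real GPipe setup each stage lives on its
# own device so micro-batches can overlap across stages.
stage1 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
stage2 = nn.Sequential(nn.Linear(64, 10))

def pipelined_forward(x, chunks=4):
    # Split the batch into micro-batches; on separate devices, stage2
    # can process micro-batch i while stage1 works on micro-batch i+1.
    outputs = []
    for mb in x.chunk(chunks):
        h = stage1(mb)
        outputs.append(stage2(h))
    return torch.cat(outputs)

y = pipelined_forward(torch.randn(16, 32))
print(y.shape)  # torch.Size([16, 10])
```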
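For the tensor-slicing entry, a sketch of column-parallel sharding of a linear layer's weight: each shard computes a slice of the output, and concatenation recovers the full result. The shard count and shapes are arbitrary, and both shards stay on CPU here; in a real deployment each shard would live on its own device.

```python
import torch

# A weight matrix split column-wise into shards, one per device in a
# real deployment (both shards stay on CPU in this sketch).
w = torch.randn(8, 1024)
shards = torch.tensor_split(w, 2, dim=1)  # two (8, 512) shards

x = torch.randn(4, 8)
partials = [x @ s for s in shards]        # each "device" computes a slice
out = torch.cat(partials, dim=1)          # concatenation recovers x @ w

assert torch.allclose(out, x @ w, atol=1e-5)
```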
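For the multi-node Transformer entry, a skeleton using PyTorch DistributedDataParallel; it assumes a launcher such as torchrun supplies the rendezvous environment variables and that each process sees a GPU. The model and optimizer settings are placeholders, not that repository's configuration.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Rendezvous variables (MASTER_ADDR, RANK, WORLD_SIZE, ...) come from a
# launcher, e.g.: torchrun --nnodes=2 --nproc_per_node=8 train.py
dist.init_process_group(backend="nccl")  # "gloo" for CPU-only tests

model = torch.nn.TransformerEncoderLayer(d_model=64, nhead=4)
model = DDP(model.cuda())  # gradients are all-reduced across nodes

opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
src = torch.randn(10, 32, 64).cuda()  # (seq, batch, d_model)
loss = model(src).pow(2).mean()       # dummy loss for illustration
loss.backward()
opt.step()
```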
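For the skip-connection entry, a sketch of why skip connections complicate model parallelism: when a residual block is split across devices, the block input must be copied to the downstream device before the addition. Device strings default to CPU so the snippet runs anywhere; swap in "cuda:0"/"cuda:1" on a two-GPU machine.

```python
import torch
import torch.nn as nn

class TwoDeviceResBlock(nn.Module):
    # A residual block split across two devices: the skip connection
    # forces the block input to cross the device boundary before the add.
    def __init__(self, dim, dev0="cpu", dev1="cpu"):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.f1 = nn.Linear(dim, dim).to(dev0)
        self.f2 = nn.Linear(dim, dim).to(dev1)

    def forward(self, x):
        h = torch.relu(self.f1(x.to(self.dev0)))
        h = self.f2(h.to(self.dev1))
        return h + x.to(self.dev1)  # skip connection crosses devices

y = TwoDeviceResBlock(16)(torch.randn(2, 16))
```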
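For the distributed TensorFlow entry, a sketch of layer-wise model parallelism via explicit device placement in TF 2.x. The GPU names are assumptions; TensorFlow's soft device placement falls back to CPU when they are absent.

```python
import tensorflow as tf

# Layer-wise model parallelism via explicit placement: each weight and
# its matmul are pinned to a device (soft placement falls back to CPU).
with tf.device("/GPU:0"):
    w1 = tf.Variable(tf.random.normal([32, 64]))
with tf.device("/GPU:1"):
    w2 = tf.Variable(tf.random.normal([64, 10]))

x = tf.random.normal([8, 32])
with tf.device("/GPU:0"):
    h = tf.nn.relu(tf.matmul(x, w1))  # first layer on GPU:0
with tf.device("/GPU:1"):
    y = tf.matmul(h, w2)              # second layer on GPU:1
```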