
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient

This repository contains the code to replicate the experiments of the anonymous paper "SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient".

Bottleneck experiments

Instructions to replicate the compression-aware architecture experiments can be found in bottleneck/README.md.

Large-scale experiments and throughput estimation

Instructions to replicate the experiments on large-scale language model pretraining and throughput estimation on multiple preemptible nodes, as well as the prototype implementation of mechanisms behind SWARM, are located in the swarm subfolder.
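To give a rough intuition for what the throughput estimation measures: a pipeline-parallel run is bound by its slowest stage, so the achievable rate depends on how many workers serve each stage and how fast each one processes a microbatch. The sketch below is a minimal, hypothetical back-of-the-envelope calculation of this idea, not the repository's actual estimation code; the stage timings and worker counts are made-up values.

```python
# Hypothetical back-of-the-envelope pipeline throughput estimate.
# Not the repository's code: stage timings and worker counts below
# are illustrative assumptions.

def stage_throughput(num_workers: int, seconds_per_microbatch: float) -> float:
    """Microbatches per second that one pipeline stage can process."""
    return num_workers / seconds_per_microbatch

def pipeline_throughput(stages: list[tuple[int, float]]) -> float:
    """The pipeline as a whole is bound by its slowest stage."""
    return min(stage_throughput(n, t) for n, t in stages)

# Three stages: (number of workers, seconds per microbatch on one worker).
stages = [(8, 0.5), (4, 0.3), (6, 0.4)]
print(f"Estimated throughput: {pipeline_throughput(stages):.1f} microbatches/s")
```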

Languages

- Python: 94.9%
- Shell: 3.5%
- Cuda: 0.8%
- C++: 0.5%
- Cython: 0.3%
- Lua: 0.1%
- Batchfile: 0.0%
- Makefile: 0.0%