Habvt's repositories
lbann
Livermore Big Artificial Neural Network Toolkit
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
DeepSpeedExamples
Example models using DeepSpeed
ComScribe
ComScribe is a tool to identify communication among all GPU-GPU and CPU-GPU pairs in a single-node multi-GPU system.
alpa
Training and serving large-scale neural networks
HeliosData
Helios Traces from SenseTime
EasyParallelLibrary
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
faas-profiler
A tool for testing and profiling FaaS platforms
EPL_examples
FastNN provides distributed training examples that use EPL.
clusterdata
Cluster data collected from production clusters at Alibaba for cluster management research
gavel
Code for "Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads", which appeared at OSDI 2020
INFless
The source code of INFless, a native serverless platform for AI inference.
volcano
A Kubernetes Native Batch System (Project under CNCF)
INFaaS
Model-less Inference Serving
client
🔥 A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API.
GPipe-Core
Core library of the new GPipe, encapsulating OpenGL and providing a minimal, type-safe library
torchgpipe
A GPipe implementation in PyTorch
dawn-bench-entries
DAWNBench: An End-to-End Deep Learning Benchmark and Competition
Tiresias
A GPU Cluster Manager for Distributed Deep Learning Training
image-resize
Image resizing benchmark for serverless platforms.