Exercises for ramping up in the deep learning training space. Each exercise covers a different technology we use; a minimal sketch of each technology follows the list.
- Familiarize yourself with CMake
- MPI Hello World
- Using CPU and GPU memory
- MPI All Reduce
- NCCL All Reduce
- Concurrent Hello World in C++
- SIMD
- Train a multi-layer perceptron using PyTorch
- Distributed training of neural networks using PyTorch collectives
- Custom DistributedDataParallel class in PyTorch
- Implement your own collective
- Slurm
- CUDA Streams
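
For the CMake exercise, a minimal `CMakeLists.txt` sketch for a single-executable project; the project name, target name, and source file name are placeholders, not the exercise's actual layout:

```cmake
cmake_minimum_required(VERSION 3.16)
project(exercises LANGUAGES CXX)

# Build a single-file executable; main.cpp is a placeholder name.
add_executable(hello main.cpp)
target_compile_features(hello PRIVATE cxx_std_17)
```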
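A minimal MPI hello world sketch in C++; file and binary names are illustrative. Each rank prints its id and the world size:

```cpp
// Compile with: mpicxx hello_mpi.cpp -o hello_mpi
// Run with:     mpirun -np 4 ./hello_mpi
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // this process's id
    MPI_Comm_size(MPI_COMM_WORLD, &size);  // total number of processes

    std::printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
```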
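For the memory exercise, one possible round trip between host (CPU) and device (GPU) memory using the CUDA runtime API; the buffer size is arbitrary and error checking is omitted:

```cpp
// Compile with: nvcc memcpy_demo.cu -o memcpy_demo
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main() {
    const size_t n = 1 << 20;
    std::vector<float> host(n, 1.0f);   // ordinary pageable CPU memory

    float* device = nullptr;
    cudaMalloc(&device, n * sizeof(float));            // GPU memory
    cudaMemcpy(device, host.data(), n * sizeof(float),
               cudaMemcpyHostToDevice);                // CPU -> GPU
    cudaMemcpy(host.data(), device, n * sizeof(float),
               cudaMemcpyDeviceToHost);                // GPU -> CPU
    cudaFree(device);

    std::printf("round-trip ok: %f\n", host[0]);
    return 0;
}
```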
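A sketch of `MPI_Allreduce` summing one integer per rank; unlike a plain reduce, every rank ends up holding the same total:

```cpp
// Compile with: mpicxx allreduce.cpp -o allreduce
// Run with:     mpirun -np 4 ./allreduce
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Every rank contributes its own rank id; the sum lands on all ranks.
    int local = rank, global = 0;
    MPI_Allreduce(&local, &global, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    std::printf("rank %d sees sum %d\n", rank, global);
    MPI_Finalize();
    return 0;
}
```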
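A single-process, multi-GPU NCCL all-reduce sketch using `ncclCommInitAll`; error checking is omitted for brevity and the buffers hold zeros rather than meaningful data:

```cpp
// Compile with nvcc and link against -lnccl.
#include <nccl.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);

    std::vector<ncclComm_t> comms(ndev);
    ncclCommInitAll(comms.data(), ndev, nullptr);  // one communicator per GPU

    const size_t count = 1 << 20;
    std::vector<float*> bufs(ndev);
    std::vector<cudaStream_t> streams(ndev);
    for (int i = 0; i < ndev; ++i) {
        cudaSetDevice(i);
        cudaMalloc(&bufs[i], count * sizeof(float));
        cudaMemset(bufs[i], 0, count * sizeof(float));
        cudaStreamCreate(&streams[i]);
    }

    // Sum the buffers across all GPUs, in place. Group calls are needed
    // when one thread drives several devices.
    ncclGroupStart();
    for (int i = 0; i < ndev; ++i)
        ncclAllReduce(bufs[i], bufs[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < ndev; ++i) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        ncclCommDestroy(comms[i]);
    }
    std::puts("all-reduce completed on every GPU");
    return 0;
}
```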
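A concurrent hello world sketch with `std::thread`; the thread count is arbitrary, and the output lines may appear in any order:

```cpp
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)
        workers.emplace_back([i] { std::printf("hello from thread %d\n", i); });
    for (auto& t : workers) t.join();  // wait for every thread to finish
    return 0;
}
```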
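A SIMD sketch using AVX intrinsics, adding eight floats with a single instruction; it assumes an x86 CPU with AVX and the corresponding compile flag:

```cpp
// Compile with: g++ -mavx simd_add.cpp -o simd_add
#include <immintrin.h>
#include <cstdio>

int main() {
    alignas(32) float a[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    alignas(32) float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    alignas(32) float c[8];

    __m256 va = _mm256_load_ps(a);       // load 8 floats into one register
    __m256 vb = _mm256_load_ps(b);
    __m256 vc = _mm256_add_ps(va, vb);   // 8 additions in one instruction
    _mm256_store_ps(c, vc);

    for (float x : c) std::printf("%.0f ", x);
    std::puts("");
    return 0;
}
```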
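A minimal PyTorch MLP training loop on synthetic data; the layer sizes, learning rate, and step count are arbitrary choices, not the exercise's specification:

```python
import torch
from torch import nn

# Two-layer perceptron on random data; shapes are illustrative.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(256, 32)
y = torch.randint(0, 10, (256,))

for step in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.4f}")
```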
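A sketch of data-parallel training with explicit PyTorch collectives instead of `DistributedDataParallel`: every rank computes gradients on its own batch, then the gradients are averaged with `all_reduce`. The launch command and backend choice are assumptions:

```python
# Launch with: torchrun --nproc_per_node=2 collectives_demo.py
import torch
import torch.distributed as dist
from torch import nn

dist.init_process_group(backend="gloo")  # or "nccl" on GPUs
world = dist.get_world_size()
torch.manual_seed(0)  # identical initial weights on every rank

model = nn.Linear(16, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Each rank trains on its own shard of (here, random) data.
x = torch.randn(64, 16)
y = torch.randn(64, 1)

loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Average gradients across ranks with an explicit collective.
for p in model.parameters():
    dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
    p.grad /= world

opt.step()
dist.destroy_process_group()
```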
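A toy stand-in for `DistributedDataParallel`: broadcast the weights once so all ranks start identical, then average gradients through autograd hooks. The class name is hypothetical, the process group must already be initialized, and `register_post_accumulate_grad_hook` requires PyTorch >= 2.1:

```python
import torch
import torch.distributed as dist
from torch import nn

class SimpleDDP(nn.Module):
    """Toy DDP: broadcast weights once, then average gradients
    after every backward pass via per-parameter autograd hooks."""

    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module
        self.world = dist.get_world_size()
        for p in self.module.parameters():
            dist.broadcast(p.data, src=0)  # same starting point everywhere
            p.register_post_accumulate_grad_hook(self._average)

    def _average(self, param: torch.Tensor) -> None:
        # Runs once per parameter after its gradient is accumulated.
        dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
        param.grad /= self.world

    def forward(self, *args, **kwargs):
        return self.module(*args, **kwargs)
```

The real `DistributedDataParallel` additionally buckets gradients and overlaps communication with the backward pass; this sketch only shows the correctness-level idea.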
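One way to build your own collective from point-to-point primitives: a naive all-reduce that funnels everything through rank 0. Production implementations use ring or tree algorithms; this only demonstrates the contract:

```python
import torch
import torch.distributed as dist

def naive_all_reduce(tensor: torch.Tensor) -> None:
    """All-reduce (sum) built only from send/recv:
    gather on rank 0, sum, then fan the result back out."""
    rank, world = dist.get_rank(), dist.get_world_size()
    if rank == 0:
        buf = torch.empty_like(tensor)
        for src in range(1, world):
            dist.recv(buf, src=src)   # collect each rank's contribution
            tensor += buf
        for dst in range(1, world):
            dist.send(tensor, dst=dst)  # distribute the final sum
    else:
        dist.send(tensor, dst=0)
        dist.recv(tensor, src=0)
```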
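A sketch of a Slurm batch script that could launch one of the MPI binaries above; the node, task, GPU, and time values are placeholders and the available flags depend on the cluster:

```bash
#!/bin/bash
#SBATCH --job-name=allreduce
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-node=4
#SBATCH --time=00:10:00

# srun launches one task per allocated slot across the nodes.
srun ./allreduce
```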
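A CUDA streams sketch: two independent kernels launched on separate streams, so the GPU is free to overlap them; sizes and launch configuration are arbitrary:

```cpp
// Compile with: nvcc streams_demo.cu -o streams_demo
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float* x, float s, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

int main() {
    const size_t n = 1 << 20;
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // The two kernels may run concurrently: separate streams, no dependency.
    scale<<<(n + 255) / 256, 256, 0, s1>>>(a, 2.0f, n);
    scale<<<(n + 255) / 256, 256, 0, s2>>>(b, 3.0f, n);

    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    std::puts("both streams done");
    return 0;
}
```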