microsoft / superbenchmark

A validation and profiling tool for AI infrastructure

Home Page: https://aka.ms/superbench

Repository on GitHub: https://github.com/microsoft/superbenchmark

V0.8.0 Test Plan

yukirora opened this issue

Test Cases

single-node test

| Machine Type | #Node * #GPU * GPU Type | PyTorch Version | Accelerated Computing Toolkit | Status |
|---|---|---|---|---|
| NDv5 SXM | 1 * 8 * H100 | PyTorch 1.x | CUDA 11.8 | Done |
| ND A100 v4/NDm A100 v4 | 1 * 8 * A100 80GB SXM | PyTorch 1.x | CUDA 11.8 | |
| ND A100 v4/NDm A100 v4 | 1 * 8 * A100 40GB SXM | PyTorch 1.8 | CUDA 11.1 | |

Hopper GPU and FP8-related benchmarks

  1. microbenchmark
  • Add distributed inference benchmark (#493)
  • Support Tensor Core precisions (e.g., FP8) and batch/shape ranges in cuBLASLt GEMM (#492 and #494)
  2. e2e benchmark
  • Support Transformer Engine (TE) FP8 in BERT/GPT2 models (#496, #499)

Improvements to existing SuperBench benchmarks

  1. microbenchmark improvement
  • Support flexible warmup and non-random data initialization in cublas-benchmark (#479)
  • Support error tolerance in the micro-benchmark for cuDNN functions (#490)
  2. e2e benchmark improvement
  • Fix torch.dist init issue with multiple models (#495)
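For readers unfamiliar with the terms in the cublas-benchmark item, the sketch below shows the general pattern: a configurable number of warmup iterations that run the workload but are excluded from timing, and a deterministic (non-random) data fill as an alternative to random inputs. It is a generic pure-Python illustration; the names `init_data`, `run_benchmark`, `num_warmup`, and `num_steps` are hypothetical, and this is not SuperBench's cublas-benchmark implementation.

```python
import random
import statistics
import time
from typing import Callable, List, Optional


def init_data(n: int, fill: Optional[float] = 1.0, seed: int = 0) -> List[float]:
    """Return n input values: a constant fill (non-random) by default.

    Hypothetical helper: pass fill=None to get seeded pseudo-random values instead.
    """
    if fill is not None:
        return [fill] * n
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def run_benchmark(fn: Callable[[], object], num_warmup: int, num_steps: int) -> List[float]:
    """Run fn num_warmup times untimed, then num_steps timed runs; return seconds per step."""
    if num_warmup < 0 or num_steps < 1:
        raise ValueError("num_warmup must be >= 0 and num_steps >= 1")
    for _ in range(num_warmup):
        fn()  # warmup runs are executed but never recorded
    durations = []
    for _ in range(num_steps):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    a = init_data(100_000)            # constant 1.0 inputs
    b = init_data(100_000, fill=2.0)  # constant 2.0 inputs
    step_times = run_benchmark(lambda: [x * y for x, y in zip(a, b)], num_warmup=5, num_steps=20)
    print(f"median step time: {statistics.median(step_times) * 1e3:.3f} ms")
```

Constant inputs make runs reproducible across machines, while the warmup count lets one-time costs (lazy initialization, cache population) stay out of the reported numbers.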

CPU benchmark

  • Add STREAM benchmark for sustainable memory bandwidth and the corresponding computation rate (#473)
  • Add HPL (High-Performance Linpack) benchmark (#482)
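The sketch below shows how bandwidth and computation rate are derived from a timed run of these CPU benchmarks. It assumes STREAM's usual accounting (8-byte doubles; Copy and Scale move 16 bytes per element, Add and Triad move 24, and write-allocate traffic is not counted) and the operation count HPL conventionally uses for an order-N solve, (2/3)N^3 + (3/2)N^2. The function names are hypothetical; this is not SuperBench's result parser for either benchmark.

```python
from typing import Dict, List, Tuple

# STREAM kernel -> (bytes moved per element, floating-point ops per element),
# assuming 8-byte doubles and STREAM's convention of not counting write-allocate traffic.
STREAM_KERNELS: Dict[str, Tuple[int, int]] = {
    "copy": (16, 0),   # a[i] = b[i]
    "scale": (16, 1),  # a[i] = q * b[i]
    "add": (24, 1),    # a[i] = b[i] + c[i]
    "triad": (24, 2),  # a[i] = b[i] + q * c[i]
}


def triad(b: List[float], c: List[float], q: float) -> List[float]:
    """Reference Triad kernel: a[i] = b[i] + q * c[i]."""
    return [bi + q * ci for bi, ci in zip(b, c)]


def stream_rates(kernel: str, n: int, seconds: float) -> Tuple[float, float]:
    """Return (GB/s, GFLOP/s) for one pass of `kernel` over n elements taking `seconds`."""
    bytes_per_elem, flops_per_elem = STREAM_KERNELS[kernel]
    return bytes_per_elem * n / seconds / 1e9, flops_per_elem * n / seconds / 1e9


def hpl_gflops(n: int, seconds: float) -> float:
    """GFLOP/s for an HPL run of order n, using the operation count (2/3)n^3 + (3/2)n^2."""
    return ((2.0 / 3.0) * n**3 + 1.5 * n**2) / seconds / 1e9


if __name__ == "__main__":
    print(triad([1.0, 2.0], [3.0, 4.0], q=2.0))               # [7.0, 10.0]
    print(stream_rates("triad", n=100_000_000, seconds=0.5))  # (4.8, 0.4)
```

Because Triad performs 2 FLOPs per 24 bytes, its computation rate is always one twelfth of its bandwidth figure (in GFLOP/s per GB/s), which is why STREAM results are usually read as a memory-bound measurement.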

SuperBench improvements

  1. install pipeline
  • Remove the pinned RCCL version in the ROCm 5.1.x Dockerfile (#476)
  • Upgrade the networkx version to fix an installation compatibility issue (#478)
  • Pin setuptools version to v65.7.0 (#483)
  • Limit ansible_runner version for Python 3.6 (#485)
  2. monitor
  • Support cgroup v2 when reading system metrics in Monitor
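As background for the Monitor item: under the cgroup v2 unified hierarchy, the cgroup root contains a `cgroup.controllers` file, and CPU and memory usage come from `cpu.stat` (`usage_usec`) and `memory.current` instead of the v1 `cpuacct` and `memory` controller directories. The sketch below is a hypothetical reader that assumes these standard kernel file names and a cgroup filesystem mounted at `/sys/fs/cgroup`; it is not SuperBench's Monitor code.

```python
import os

CGROUP_ROOT = "/sys/fs/cgroup"  # assumed mount point


def is_cgroup_v2(root: str = CGROUP_ROOT) -> bool:
    """cgroup v2 (unified hierarchy) exposes cgroup.controllers at its root; v1 does not."""
    return os.path.isfile(os.path.join(root, "cgroup.controllers"))


def _read_int(path: str) -> int:
    with open(path) as f:
        return int(f.read().strip())


def cpu_usage_ns(root: str = CGROUP_ROOT) -> int:
    """Cumulative CPU time of the cgroup in nanoseconds."""
    if is_cgroup_v2(root):
        # v2: cpu.stat holds "key value" lines such as "usage_usec 123456" (microseconds).
        with open(os.path.join(root, "cpu.stat")) as f:
            for line in f:
                parts = line.split()
                if len(parts) == 2 and parts[0] == "usage_usec":
                    return int(parts[1]) * 1000
        raise ValueError("usage_usec not found in cpu.stat")
    # v1: cpuacct.usage already holds nanoseconds.
    return _read_int(os.path.join(root, "cpuacct", "cpuacct.usage"))


def memory_usage_bytes(root: str = CGROUP_ROOT) -> int:
    """Current memory usage of the cgroup in bytes."""
    if is_cgroup_v2(root):
        return _read_int(os.path.join(root, "memory.current"))
    return _read_int(os.path.join(root, "memory", "memory.usage_in_bytes"))
```

Taking `root` as a parameter keeps the version check and the file parsing separate, so the same functions can be pointed at a container's cgroup directory or at a fake tree in tests.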

multi-node test

| Machine Type | #Node * #GPU * GPU Type | PyTorch Version | Accelerated Computing Toolkit | Status |
|---|---|---|---|---|
| NDv5 SXM | 2 * 8 * H100 | PyTorch 1.x | CUDA 11.8 | |

Hopper GPU and FP8-related benchmarks

  1. microbenchmark
  • Add distributed inference benchmark (#493)