microsoft / superbenchmark

A validation and profiling tool for AI infrastructure

Home Page: https://aka.ms/superbench

Repository from GitHub: https://github.com/microsoft/superbenchmark

v0.10.0 Test Plan

yukirora opened this issue

Test Cases

Single-node test

| Machine Type | #Node * #GPU * GPU Type | Accelerated Computing Toolkit | Status |
| --- | --- | --- | --- |
| NDv5 SXM | 1 * 8 * H100 | CUDA 12.2 | done |
| AMD MI200 | 1 * 16 * AMD MI200 | ROCm 5.7 | done |
| AMD MI300x | 1 * 8 * AMD MI300x | ROCm 6.0 | done |

A100 and H100 related

  1. Microbenchmark
  • Fix bug in GPU Burn test (#567)
  • Support INT8 in cuBLASLt function benchmark (#574)
  • Support cpu-gpu and gpu-cpu transfer directions in ib-validation (#581)
  • Support graph mode in NCCL/RCCL benchmarks for latency metrics (#583)
  • Add one-to-all, all-to-one, and all-to-all support to gpu_copy_bw_performance (#588)
  • Add C++ implementation of the dist-inference benchmark (#586)
  • Add MSCCL support (#584)
  • Support in-place operation in NCCL/RCCL benchmarks (#591)
  2. Model benchmark improvement
  • Change torch.distributed.launch to torchrun (#556)
  • Support Megatron-LM/Megatron-DeepSpeed GPT pretraining benchmark (#582)
  3. SuperBench improvement
  • Update Docker image for H100 support (#577)
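The one-to-all, all-to-one, and all-to-all modes added to gpu_copy_bw_performance in #588 differ only in which (src, dst) device pairs they sweep. A minimal sketch of the pair enumeration, with a hypothetical helper name and signature (the real benchmark launches a copy kernel per pair and reports bandwidth per pair):

```python
from itertools import product


def copy_pairs(mode: str, num_gpus: int, root: int = 0):
    """Enumerate (src, dst) GPU index pairs for a given topology mode.

    Hypothetical illustration of the three patterns; not SuperBench's
    actual implementation.
    """
    gpus = range(num_gpus)
    if mode == "one-to-all":
        # Root GPU copies to every other GPU.
        return [(root, dst) for dst in gpus if dst != root]
    if mode == "all-to-one":
        # Every other GPU copies to the root GPU.
        return [(src, root) for src in gpus if src != root]
    if mode == "all-to-all":
        # Every ordered pair of distinct GPUs.
        return [(s, d) for s, d in product(gpus, gpus) if s != d]
    raise ValueError(f"unknown mode: {mode}")
```

For 8 GPUs, one-to-all and all-to-one each yield 7 pairs, while all-to-all yields 56 ordered pairs.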

MI200 and MI300x

  1. Microbenchmark improvement
  • Add HPL random generator to gemm-flops with ROCm (#578)
  • Update MLC version to 3.10 in CUDA/ROCm Dockerfiles (#562)
  • Add hipBLASLt function benchmark (#576)
  • Support cpu-gpu and gpu-cpu transfer directions in ib-validation (#581)
  • Support graph mode in NCCL/RCCL benchmarks for latency metrics (#583)
  • Add one-to-all, all-to-one, and all-to-all support to gpu_copy_bw_performance (#588)
  • Add C++ implementation of the dist-inference benchmark (#586)
  • Support in-place operation in NCCL/RCCL benchmarks (#591)
  2. Model benchmark improvement
  • Change torch.distributed.launch to torchrun (#556)
  • Support Megatron-LM/Megatron-DeepSpeed GPT pretraining benchmark (#582)
  3. SuperBench improvement
  • Support monitoring for AMD GPUs (#580)
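"In-place" in the NCCL/RCCL benchmarks (#591) means the collective reads and writes the same device buffer (sendbuff == recvbuff in NCCL/RCCL terms), which changes the memory-traffic pattern being measured. A host-side sketch of the semantics only, not a GPU implementation:

```python
def allreduce_sum_inplace(buffers):
    """Simulate an in-place sum all-reduce across ranks.

    Each rank's buffer serves as both send and receive buffer,
    mirroring NCCL/RCCL in-place operation. Illustrative only;
    the real benchmarks run ncclAllReduce/rcclAllReduce on GPUs.
    """
    # Reduce: element-wise sum across all ranks' buffers.
    total = [sum(vals) for vals in zip(*buffers)]
    # Write the result back into each rank's original buffer (in place).
    for buf in buffers:
        buf[:] = total
    return buffers
```

After the call, every rank's buffer holds the element-wise sum, with no separate output allocation.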

Result analysis

  • Support baseline generation from multiple nodes (#575)
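Baseline generation from multiple nodes (#575) collapses per-node metric values into one reference value per metric. A minimal sketch under assumed names and a simple aggregation rule (the actual SuperBench rules and schema may differ):

```python
from statistics import mean


def generate_baseline(node_results, rule="mean"):
    """Aggregate per-node metric values into a single baseline.

    Hypothetical illustration: node_results maps node name ->
    {metric: value}; each metric is collapsed across nodes with
    the chosen aggregation rule.
    """
    metrics = {}
    for results in node_results.values():
        for name, value in results.items():
            metrics.setdefault(name, []).append(value)
    agg = {"mean": mean, "min": min, "max": max}[rule]
    return {name: agg(values) for name, values in metrics.items()}
```

For example, bandwidth readings of 100 and 110 from two nodes produce a mean baseline of 105 for that metric.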