V0.8.0 Test Plan

Question


yukirora opened this issue 2 years ago

Test Cases

single-node test

Machine Type	#Node * #GPU * GPU Type	PyTorch Version	Accelerated Computing Toolkit	Status
NDv5 SXM	1 * 8 * H100	PyTorch 1.x	CUDA 11.8	Done
ND A100 v4/NDm A100 v4	1 * 8 * A100 80GB SXM	PyTorch 1.x	CUDA 11.8
ND A100 v4/NDm A100 v4	1 * 8 * A100 40GB SXM	PyTorch 1.8	CUDA 11.1

Hopper GPU and FP8-related benchmarks

microbenchmark

Add distributed inference benchmark (#493)

Support Tensor Core precisions (e.g., FP8) and batch/shape ranges in the cuBLASLt GEMM benchmark (#492, #494)
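To make the batch/shape-range item concrete, here is a minimal Python sketch, not SuperBench's implementation, that expands a geometric range of GEMM sizes into (batch, m, n, k) configurations and converts a measured time into TFLOPS using the standard count of 2*m*n*k FLOPs per GEMM. The helper names `shape_range`, `gemm_configs`, and `gemm_tflops` and the 1024-to-4096 sweep are hypothetical; the real cuBLASLt benchmark arguments may differ.

```python
from itertools import product
from typing import List, Tuple


def shape_range(start: int, stop: int, factor: int = 2) -> List[int]:
    """Geometric range start, start*factor, ... up to and including stop (hypothetical helper)."""
    if start <= 0 or factor <= 1:
        raise ValueError("start must be > 0 and factor must be > 1")
    values = []
    value = start
    while value <= stop:
        values.append(value)
        value *= factor
    return values


def gemm_configs(batches: List[int], sizes: List[int]) -> List[Tuple[int, int, int, int]]:
    """Every (batch, m, n, k) combination built from the given batch counts and sizes."""
    return [(b, m, n, k) for b in batches for m, n, k in product(sizes, repeat=3)]


def gemm_tflops(batch: int, m: int, n: int, k: int, seconds: float) -> float:
    """An (m x k) @ (k x n) GEMM performs m*n*k multiply-adds, i.e. 2*m*n*k FLOPs."""
    return 2.0 * batch * m * n * k / seconds / 1e12


if __name__ == "__main__":
    sizes = shape_range(1024, 4096)        # [1024, 2048, 4096]
    configs = gemm_configs([1, 4], sizes)  # 2 batch counts * 3**3 shapes
    print(len(configs))                    # 54
    print(gemm_tflops(1, 4096, 4096, 4096, 0.001))
```

The FLOP count does not depend on precision, so the same formula applies whether the GEMM runs in FP16, BF16, or FP8; only the measured time changes.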

e2e benchmark

Support Transformer Engine (TE) FP8 in BERT/GPT-2 models (#496, #499)

SuperBench existing benchmark improvement

microbenchmark improvement

Support flexible warmup and non-random data initialization in cublas-benchmark (#479)

Support error tolerance in the cuDNN function micro-benchmark (#490)
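The flexible-warmup and non-random-initialization item can be illustrated with a small host-side harness. This is a sketch under assumptions, not the cublas-benchmark code: `num_warmup`, `num_steps`, `run_benchmark`, and the modulo fill pattern in `deterministic_matrix` are hypothetical names chosen for illustration. Warmup iterations run untimed so one-off costs (allocation, JIT, clock ramp-up) stay out of the measurement, and deterministic inputs make results reproducible across runs and machines.

```python
import time
from typing import Callable, List


def deterministic_matrix(rows: int, cols: int) -> List[List[float]]:
    """Fill with a fixed, repeatable pattern instead of random values (hypothetical pattern)."""
    return [[((r * cols + c) % 7) / 7.0 for c in range(cols)] for r in range(rows)]


def run_benchmark(fn: Callable[[], None], num_warmup: int, num_steps: int) -> List[float]:
    """Call fn num_warmup times untimed, then return num_steps per-call durations in ms."""
    if num_warmup < 0 or num_steps <= 0:
        raise ValueError("num_warmup must be >= 0 and num_steps must be > 0")
    for _ in range(num_warmup):
        fn()  # warmup: result and time are discarded
    durations = []
    for _ in range(num_steps):
        start = time.perf_counter()
        fn()
        durations.append((time.perf_counter() - start) * 1e3)
    return durations


if __name__ == "__main__":
    a = deterministic_matrix(64, 64)
    b = deterministic_matrix(64, 64)
    matmul = lambda: [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]
    times = run_benchmark(matmul, num_warmup=2, num_steps=5)
    print(f"mean {sum(times) / len(times):.3f} ms over {len(times)} steps")
```

This times a host callable only; timing GPU kernels this way would additionally require synchronizing the device before each clock read.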

e2e benchmark improvement

Fix torch.distributed initialization issue when running multiple models (#495)

CPU benchmark

Add STREAM benchmark for sustainable memory bandwidth and the corresponding computation rate (#473)

Add HPL (High-Performance Linpack) benchmark (#482)
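Both CPU benchmarks turn a measured time into a rate using fixed operation counts. The sketch below is an illustration rather than SuperBench's parser: it applies STREAM's per-element accounting (each array read or written counts once, ignoring write-allocate traffic) and the operation count HPL uses for its Gflops figure, (2/3)N^3 + (3/2)N^2. The function names and the choice of GB/s (10^9 bytes/s) as the unit are assumptions; STREAM itself reports MB/s.

```python
from typing import Tuple

# Per-element traffic (number of arrays touched) and FLOPs for the four STREAM kernels.
STREAM_KERNELS = {
    "copy": (2, 0),   # c[j] = a[j]
    "scale": (2, 1),  # b[j] = scalar * c[j]
    "add": (3, 1),    # c[j] = a[j] + b[j]
    "triad": (3, 2),  # a[j] = b[j] + scalar * c[j]
}


def stream_rates(kernel: str, n: int, seconds: float, bytes_per_word: int = 8) -> Tuple[float, float]:
    """Return (bandwidth in GB/s, compute rate in GFLOP/s) for one STREAM kernel pass."""
    arrays, flops_per_element = STREAM_KERNELS[kernel]
    bandwidth_gbs = arrays * bytes_per_word * n / seconds / 1e9
    gflops = flops_per_element * n / seconds / 1e9
    return bandwidth_gbs, gflops


def hpl_gflops(n: int, seconds: float) -> float:
    """GFLOP/s for solving an N x N dense system via LU factorization, as HPL counts it."""
    operations = (2.0 / 3.0) * n ** 3 + (3.0 / 2.0) * n ** 2
    return operations / seconds / 1e9


if __name__ == "__main__":
    print(stream_rates("triad", 10**8, 0.05))  # 100M doubles per array, 50 ms per pass
    print(hpl_gflops(100_000, 600.0))          # N = 100000 solved in 10 minutes
```

Because Triad moves 24 bytes and performs 2 FLOPs per element (with 8-byte doubles), its FLOP rate is a fixed fraction of its bandwidth; STREAM is bandwidth-bound by design, while HPL's N^3 work makes it compute-bound for large N.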

SuperBench Improvement

install pipeline

Remove pinned RCCL version in the ROCm 5.1.x Dockerfile (#476)

Upgrade networkx version to fix installation compatibility issue (#478)

Pin setuptools version to v65.7.0 (#483)

Limit ansible_runner version for Python 3.6 (#485)

monitor

Support cgroup v2 when reading system metrics in Monitor
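Supporting cgroup v2 mostly means reading different files: v2 uses a single unified hierarchy (marked by `cgroup.controllers` at its root) with `memory.current` and the `usage_usec` line of `cpu.stat`, whereas v1 splits controllers into `memory/memory.usage_in_bytes` and `cpuacct/cpuacct.usage` (nanoseconds). The sketch below is a hypothetical helper, not the Monitor's actual code. It assumes `root` points at the process's own cgroup directory (as `/sys/fs/cgroup` is inside a typical container); on a v2 host, the root cgroup itself has no `memory.current`.

```python
import os


def is_cgroup_v2(root: str = "/sys/fs/cgroup") -> bool:
    """The unified (v2) hierarchy exposes cgroup.controllers at its root."""
    return os.path.isfile(os.path.join(root, "cgroup.controllers"))


def read_memory_usage_bytes(root: str = "/sys/fs/cgroup") -> int:
    """Current memory usage of the cgroup in bytes (v2: memory.current, v1: memory.usage_in_bytes)."""
    if is_cgroup_v2(root):
        path = os.path.join(root, "memory.current")
    else:
        path = os.path.join(root, "memory", "memory.usage_in_bytes")
    with open(path) as f:
        return int(f.read().strip())


def read_cpu_usage_ns(root: str = "/sys/fs/cgroup") -> int:
    """Cumulative CPU time in nanoseconds (v2 reports microseconds in cpu.stat)."""
    if is_cgroup_v2(root):
        with open(os.path.join(root, "cpu.stat")) as f:
            for line in f:
                parts = line.split()
                if len(parts) == 2 and parts[0] == "usage_usec":
                    return int(parts[1]) * 1000
        raise ValueError("usage_usec not found in cpu.stat")
    with open(os.path.join(root, "cpuacct", "cpuacct.usage")) as f:
        return int(f.read().strip())
```

Sampling `read_cpu_usage_ns` twice and dividing the difference by the elapsed wall time (in ns) gives the average number of CPUs the cgroup used over that interval.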

multi-node test

Machine Type	#Node * #GPU * GPU Type	PyTorch Version	Accelerated Computing Toolkit	Status
NDv5 SXM	2 * 8 * H100	PyTorch 1.x	CUDA 11.8

Hopper GPU and FP8-related benchmarks

microbenchmark

Add distributed inference benchmark (#493)