microsoft / superbenchmark

A validation and profiling tool for AI infrastructure

Home Page: https://aka.ms/superbench

Repository from GitHub: https://github.com/microsoft/superbenchmark

v0.7.0 Release Plan

cp5555 opened this issue

Release Manager

@cp5555

Endgame

  • Code freeze: Jan. 3rd, 2023
  • Bug bash date: Jan. 13th, 2023
  • Release date: Jan. 20th, 2023

Main Features

SuperBench Improvement

    • Support non-zero return code when “sb deploy” or “sb run” fails in Ansible (Related to #410 and #411) (#425)
    • Support log flushing to the result file during runtime (Related to #390) (#445)
    • Update version to include revision hash and date (#427)
    • Support 'pattern' in 'mpi' mode to run tasks in parallel (#430, #458)
    • Support topo-aware, all-pair, and K-batch patterns in 'mpi' mode (#437, #447)
    • Fix Transformers version to avoid Tensorrt failure (#441)
    • Add CUDA 11.8 Docker image for Nvidia arch90 GPUs (#449)
    • Support “sb deploy” without pulling the Docker image (#466)

Micro-benchmark Improvement

    • Support lists of custom config strings in cudnn-functions and cublas-functions (#414)
    • Support correctness check in cublas-functions (#450, #452)
    • Support GEMM-FLOPS for Nvidia arch90 GPUs (#456)
    • Add a wait-time option to resolve the mem-bw instability issue (#438)
    • Fix incorrect data type detection in the cublas-function source code (#462)
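The correctness check added to cublas-functions (#450, #452) follows a common pattern: compare the accelerated GEMM result against a reference computed at higher precision. An illustrative NumPy sketch of that idea, not the benchmark's actual implementation; the tolerances are assumptions:

```python
import numpy as np

def check_gemm(a, b, result, rtol=1e-3, atol=1e-5):
    """Return True if `result` matches a float64 reference for a @ b."""
    reference = a.astype(np.float64) @ b.astype(np.float64)
    return np.allclose(result.astype(np.float64), reference, rtol=rtol, atol=atol)

a = np.random.rand(64, 64).astype(np.float32)
b = np.random.rand(64, 64).astype(np.float32)
result = a @ b  # stand-in for the cuBLAS output under test
print(check_gemm(a, b, result))  # expected: True for a correct GEMM
```

Comparing against a float64 reference keeps the check meaningful even for low-precision inputs, since rounding error in the reference itself stays negligible.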

Model-benchmark Improvement

    • Support FP8 in BERT model training (#446, #461)

Distributed Benchmark Improvement

    • Support pair-wise pattern in the IB validation benchmark (#453)
    • Support topo-aware, pair-wise, and K-batch patterns in the nccl-bw benchmark (#454)
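These patterns are selected in the SuperBench YAML configuration. A minimal sketch of how such a mode might be configured; the key names and nesting are assumptions drawn from the feature descriptions, not the released schema:

```yaml
# Hypothetical fragment: run nccl-bw under 'mpi' mode with a topo-aware pattern.
# Key names below are assumptions based on the feature descriptions above.
nccl-bw:
  modes:
    - name: mpi
      pattern:
        type: topo-aware
```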

Backlog

Inference Benchmark Improvement

  1. Support VGG, LSTM, and GPT-2 small in TensorRT Inference Backend
  2. Support VGG, LSTM, and GPT-2 small in ORT Inference Backend
  3. Support more TensorRT parameters (Related to #366)

Document

  1. Metric Reasoning Doc