
[Benchmark] Improve NLP Backbone Benchmark

sxjscience opened this issue

Description

In GluonNLP, we introduced the benchmarking script in https://github.com/dmlc/gluon-nlp/tree/master/scripts/benchmarks.

The goal is to track the training and inference latency of common NLP backbones so that we can choose the appropriate one for each task. This will help users train and deploy models on AWS.
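For reference, the core of such a latency measurement is a warmed-up timing loop around the forward pass. A minimal sketch of the idea (the backbone name, batch size, and sequence length below are placeholders, not the settings the benchmark script actually uses):

```python
import time
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder backbone and shapes; the real benchmark sweeps its own configs.
model = AutoModel.from_pretrained("bert-base-uncased").eval()
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

batch = tokenizer(["hello world"] * 8, padding="max_length",
                  max_length=128, return_tensors="pt").to(device)

with torch.no_grad():
    # Warm-up iterations so lazy initialization and autotuning don't skew timing.
    for _ in range(10):
        model(**batch)
    if device == "cuda":
        torch.cuda.synchronize()  # wait for queued kernels before timing
    start = time.perf_counter()
    for _ in range(50):
        model(**batch)
    if device == "cuda":
        torch.cuda.synchronize()
    latency = (time.perf_counter() - start) / 50
print(f"mean forward latency: {latency * 1e3:.2f} ms")
```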

Currently, we cover:

  • HuggingFace/Transformers-based backbones with FP32 + FP16 training / inference. For FP16 training, we are not profiling against the AMP-based solution, so this gives PyTorch an edge, which we need to fix.
  • MXNet 2.0 nightly build (only for community use) + GluonNLP 1.0 with FP32 + FP16 (AMP) training / inference.
  • TVM FP32 inference. Due to a recent upgrade of the codebase, this is currently broken. (A minimal compile-and-run sketch follows this list.)
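For the TVM path, the rough shape of the compile-and-run flow looks like the following. This is only a sketch: the backbone, input shapes, and target are placeholders, and the API names follow recent TVM releases (graph_executor was called graph_runtime in older versions).

```python
import torch
import tvm
from tvm import relay
from tvm.contrib import graph_executor
from transformers import AutoModel

# Placeholder backbone and shapes; the actual benchmark script wires in
# its own model configurations.
model = AutoModel.from_pretrained("bert-base-uncased", torchscript=True).eval()
input_ids = torch.randint(0, 30000, (1, 128))
traced = torch.jit.trace(model, input_ids)

# Convert the traced model to Relay and compile it for the chosen target.
mod, params = relay.frontend.from_pytorch(
    traced, [("input_ids", (tuple(input_ids.shape), "int64"))])
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

# Run one inference through the compiled module.
runtime = graph_executor.GraphModule(lib["default"](tvm.cpu()))
runtime.set_input("input_ids", tvm.nd.array(input_ids.numpy()))
runtime.run()
out = runtime.get_output(0)
```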

Here are the action items that I feel are worth doing:

Short-term Bug-fix + Improvement

  • Fix the FP16 training benchmark in HuggingFace/Transformers to use AMP in PyTorch (see the sketch after this list).
  • Fix the TVM benchmark. This is also tracked in #1425.
  • Add FP16 inference to the TVM benchmark.
  • Turn on einsum acceleration in the MXNet-based benchmark. This was added in apache/mxnet#18921.
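For the first item, the standard torch.cuda.amp pattern (available since PyTorch 1.6) looks like this; the model and optimizer below are toy placeholders, only the AMP pattern matters:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Toy model/optimizer standing in for a real backbone and training setup.
model = torch.nn.Linear(768, 2).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = GradScaler()

def train_step(features, labels):
    optimizer.zero_grad()
    with autocast():                  # forward pass in mixed precision
        loss = torch.nn.functional.cross_entropy(model(features), labels)
    scaler.scale(loss).backward()     # scale the loss to avoid FP16 underflow
    scaler.step(optimizer)            # unscales gradients, then steps
    scaler.update()                   # adjust the loss scale for the next step
    return loss.detach()
```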

Automation + Visualization

  • Support launching benchmark jobs with AWS Batch. Currently tracked in #1471 (a submission sketch follows this list).
  • Automate the benchmarking process via GitHub Actions.
  • Support visualization of benchmark results.
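For the AWS Batch item, submission from Python could look like the following sketch; the queue, job definition, and command are hypothetical and would come from the setup in #1471:

```python
import boto3

batch = boto3.client("batch", region_name="us-east-1")  # region is an example
response = batch.submit_job(
    jobName="gluonnlp-backbone-benchmark",
    jobQueue="gluonnlp-benchmark-queue",       # assumed queue name
    jobDefinition="gluonnlp-benchmark-gpu",    # assumed job definition
    containerOverrides={
        # Hypothetical entry point and flag for the benchmark container.
        "command": ["python", "run_benchmark.py", "--use-fp16"],
    },
)
print("submitted job:", response["jobId"])
```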

Longer-term Backbone Benchmarking Effort

  • Add a JAX/Flax-based solution, which internally uses XLA (a minimal jitted-latency sketch follows this list).
  • Support AutoScheduler in the TVM benchmark.
  • Enable ONNX + TensorRT. This is considered the fastest solution for NLP inference.
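To illustrate the JAX/Flax direction, here is a minimal jitted-latency sketch. The block below is a toy stand-in for a transformer backbone (a real benchmark would load a Flax model such as HuggingFace's FlaxBertModel); the key points are jax.jit for XLA compilation and block_until_ready for correct timing of asynchronous dispatch:

```python
import time
import jax
import jax.numpy as jnp
from flax import linen as nn

class ToyBlock(nn.Module):
    """Toy feed-forward block standing in for a transformer backbone."""
    @nn.compact
    def __call__(self, x):
        x = nn.Dense(3072)(x)
        x = nn.gelu(x)
        return nn.Dense(768)(x)

model = ToyBlock()
x = jnp.ones((8, 128, 768))
params = model.init(jax.random.PRNGKey(0), x)

forward = jax.jit(model.apply)            # XLA-compile the forward pass
forward(params, x).block_until_ready()    # first call triggers compilation

start = time.perf_counter()
for _ in range(50):
    forward(params, x).block_until_ready()  # wait for async execution
print(f"mean latency: {(time.perf_counter() - start) / 50 * 1e3:.2f} ms")
```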

Other longer-term efforts

  • Support benchmarks for data loaders (a throughput sketch follows this list).
  • Support common end-to-end training benchmarks like SQuAD 2.0 finetuning. We may focus on single-instance benchmarks.
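For the data-loader item, a throughput benchmark can be as simple as iterating over a loader and counting samples per second; the synthetic dataset, batch size, and worker count below are arbitrary choices for the sketch:

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic token IDs standing in for a real tokenized corpus.
dataset = TensorDataset(torch.randint(0, 30000, (100_000, 128)))
loader = DataLoader(dataset, batch_size=64, num_workers=4)

start = time.perf_counter()
n_samples = 0
for (batch,) in loader:
    n_samples += batch.shape[0]
elapsed = time.perf_counter() - start
print(f"throughput: {n_samples / elapsed:,.0f} samples/s")
```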

@dmlc/gluon-nlp-committers