[Benchmark] Improve NLP Backbone Benchmark
Xingjian Shi commented
Description
In GluonNLP, we introduced the benchmarking script in https://github.com/dmlc/gluon-nlp/tree/master/scripts/benchmarks.
The goal is to track the training + inference latency of common NLP backbones so that we can choose the appropriate one for a given task. This will help users train + deploy models on AWS.
Currently, we cover:
- HuggingFace/Transformers-based backbones with FP32 + FP16 training / inference. For FP16 training, we are not profiling the AMP-based solution, which gives PyTorch an edge; we need to fix this.
- MXNet 2.0 nightly (for community use only) + GluonNLP 1.0 with FP32 + FP16 (AMP) training / inference.
- TVM FP32 inference. This is currently broken due to recent upgrades of the codebase.
Here are the action items that I feel are worth doing:
Short-term Bug-fix + Improvement
- Fix the FP16 training benchmark in HuggingFace/Transformers to use AMP in PyTorch (see the AMP sketch after this list)
- Fix the TVM benchmark. This is also tracked in #1425
- Add FP16 inference to the TVM benchmark (see the mixed-precision sketch below)
- Turn on einsum acceleration in the MXNet-based benchmark. This was added in apache/mxnet#18921 (see the einsum sketch below)
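
For the first item, here is a minimal AMP training-step sketch, assuming a generic HuggingFace `model`, an `optimizer`, and `input_ids`/`labels` tensors, and a transformers version whose model output exposes `.loss` (all names illustrative, not from the current script):

```python
import torch

scaler = torch.cuda.amp.GradScaler()

def train_step(model, optimizer, input_ids, labels):
    optimizer.zero_grad()
    # Run the forward pass in mixed precision instead of casting the whole
    # model to half precision.
    with torch.cuda.amp.autocast():
        outputs = model(input_ids=input_ids, labels=labels)
        loss = outputs.loss
    # Scale the loss to avoid FP16 gradient underflow, then step.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.detach()
```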
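For FP16 inference in TVM, newer TVM releases ship a `ToMixedPrecision` relay pass; a sketch, assuming `mod` and `params` come from the benchmark's existing relay import step:

```python
import tvm
from tvm import relay

def build_fp16(mod, params, target="cuda"):
    mod = relay.transform.InferType()(mod)
    # Rewrite eligible ops to float16; ops the pass registers as
    # numerically sensitive stay in float32.
    mod = relay.transform.ToMixedPrecision("float16")(mod)
    with tvm.transform.PassContext(opt_level=3):
        return relay.build(mod, target=target, params=params)
```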
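For the einsum item, the mechanism is presumably the `optimize` flag on MXNet's NumPy-style einsum (an assumption on my part; see apache/mxnet#18921 for the actual change):

```python
import mxnet as mx

a = mx.np.random.uniform(size=(128, 768))
b = mx.np.random.uniform(size=(768, 768))
c = mx.np.random.uniform(size=(768, 64))
# optimize=True lets MXNet pick a cheaper contraction order for chained
# products instead of evaluating the operands strictly left to right.
out = mx.np.einsum("ij,jk,kl->il", a, b, c, optimize=True)
```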
Automation + Visualization
- Support launching benchmark jobs with AWS Batch. Currently tracked in #1471 (see the boto3 sketch after this list)
- Automate the benchmarking process via GitHub Actions
- Support visualization of benchmark results (see the plotting sketch below)
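
For the AWS Batch item, a minimal boto3 sketch; the queue name, job definition, and command are hypothetical placeholders, not values from this repo or #1471:

```python
import boto3

batch = boto3.client("batch", region_name="us-east-1")
response = batch.submit_job(
    jobName="gluonnlp-backbone-benchmark",
    jobQueue="gluonnlp-benchmark-queue",     # hypothetical queue
    jobDefinition="gluonnlp-benchmark-gpu",  # hypothetical job definition
    containerOverrides={
        "command": ["python", "benchmark_gluonnlp.py",
                    "--mode", "inference", "--dtype", "float16"],
    },
)
print("Submitted job:", response["jobId"])
```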
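For visualization, a sketch assuming the runs are collected into a CSV with `backbone`, `workload`, and `latency_ms` columns (the schema and file name are illustrative):

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("benchmark_results.csv")
# One bar group per backbone, one bar per workload (e.g. fp32/fp16 inference).
pivot = df.pivot_table(index="backbone", columns="workload", values="latency_ms")
pivot.plot.bar(figsize=(10, 4))
plt.ylabel("Latency (ms)")
plt.tight_layout()
plt.savefig("benchmark_latency.png")
```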
Longer-term Backbone Benchmarking Effort
- Add a JAX/Flax-based solution, which internally uses XLA (see the sketch after this list)
- Support the auto-scheduler in the TVM benchmark (see the tuning sketch below)
- Enable ONNX + TensorRT, widely considered among the fastest options for NLP inference (see the export sketch below)
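
To illustrate the JAX/Flax direction, a toy jitted forward pass; the two-layer MLP below stands in for a real backbone and is purely illustrative:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class ToyBackbone(nn.Module):
    hidden: int = 768

    @nn.compact
    def __call__(self, x):
        x = nn.relu(nn.Dense(self.hidden)(x))
        return nn.Dense(self.hidden)(x)

model = ToyBackbone()
x = jnp.ones((8, 128, 768))
variables = model.init(jax.random.PRNGKey(0), x)
# jax.jit compiles the forward pass with XLA: the first call traces and
# compiles, later calls reuse the compiled executable.
forward = jax.jit(lambda v, x: model.apply(v, x))
out = forward(variables, x)
```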
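For the auto-scheduler item, a sketch of the TVM tuning flow, assuming `mod`, `params`, and `target` come from the existing benchmark setup and using an arbitrary trial budget:

```python
import tvm
from tvm import relay, auto_scheduler

tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
tuner.tune(auto_scheduler.TuningOptions(
    num_measure_trials=2000,  # arbitrary example budget
    measure_callbacks=[auto_scheduler.RecordToFile("autoschedule.json")],
))
# Rebuild with the tuned schedules applied from the log file.
with auto_scheduler.ApplyHistoryBest("autoschedule.json"):
    with tvm.transform.PassContext(
            opt_level=3, config={"relay.backend.use_auto_scheduler": True}):
        lib = relay.build(mod, target=target, params=params)
```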
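For ONNX + TensorRT, one route is ONNX export plus ONNX Runtime's TensorRT execution provider (assuming an onnxruntime build that ships it and a PyTorch `model`; shapes and names are illustrative):

```python
import torch
import onnxruntime as ort

dummy = torch.ones(1, 128, dtype=torch.long)
torch.onnx.export(
    model, (dummy,), "backbone.onnx",
    input_names=["input_ids"], output_names=["output"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"}},
    opset_version=12,
)
# Fall back to plain CUDA if TensorRT is not available in this build.
sess = ort.InferenceSession(
    "backbone.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"],
)
out = sess.run(None, {"input_ids": dummy.numpy()})
```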
Other longer-term efforts
- Support benchmarks for data loaders (see the throughput sketch after this list)
- Support common end-to-end training benchmarks like SQuAD 2.0 finetuning. We may focus on single-instance benchmarks.
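
For the data-loader benchmarks, a minimal throughput-timing sketch; `make_dataloader` is a hypothetical factory standing in for the real GluonNLP loaders:

```python
import time

def benchmark_dataloader(make_dataloader, num_epochs=3):
    loader = make_dataloader()
    # Warm-up epoch so file caches and worker startup do not skew timing.
    for _ in loader:
        pass
    samples, start = 0, time.perf_counter()
    for _ in range(num_epochs):
        for batch in loader:
            samples += len(batch)
    elapsed = time.perf_counter() - start
    return samples / elapsed  # samples per second
```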
@dmlc/gluon-nlp-committers