baidu-research / DeepBench

Benchmarking Deep Learning operations on different hardware

Why does rnn_bench use training routines for the inference pass?

mattsinc opened this issue · comments

Hi everyone,

I'm attempting to understand the code in the RNN benchmark. Looking at both NVIDIA and AMD's implementations, I see that both are using what appear to be documented as training passes for what I believe to be the inference pass.

For example, in the AMD implementation I see that miopenRNNForwardTraining and miopenRNNBackwardData are used. My understanding is that miopenRNNForwardTraining is being used for the inference half of the benchmark, and miopenRNNBackwardData for the training half – purely based on context clues in the benchmark (e.g., the !inference check at https://github.com/ROCmSoftwarePlatform/DeepBench/blob/master/code/amd/rnn_bench_rocm.cpp#L239 means we're also doing training, and miopenRNNBackwardData gets called via that code).

However, according to the AMD documentation, both miopenRNNForwardTraining (https://rocmsoftwareplatform.github.io/MIOpen/doc/html/rnn.html#miopenrnnforwardtraining) and miopenRNNBackwardData (https://rocmsoftwareplatform.github.io/MIOpen/doc/html/rnn.html#miopenrnnbackwarddata) are passes to use when doing training. I also noticed that the NVIDIA implementation appears to do exactly the same thing: https://github.com/baidu-research/DeepBench/blob/master/code/nvidia/rnn_bench.cu#L196, https://github.com/baidu-research/DeepBench/blob/master/code/nvidia/rnn_bench.cu#L221.

So, I was wondering why an inference-only pass wouldn't use a routine intended specifically for inference, e.g., miopenRNNForwardInference (https://rocmsoftwareplatform.github.io/MIOpen/doc/html/rnn.html#miopenrnnforwardinference) or the equivalent cuDNN call. Does DeepBench have a requirement for the backward path that necessitates this approach? @dagamayank , not sure if you know who the right person to ask here is (or if you know the answer)?

Looking through the open issues, I believe this is distinct from #87 .

Thanks,
Matt

@sharannarang : wanted to ping you on this too, in case you know why.

@mattsinc , I think you are correct. We should be using cudnnRNNForwardInference instead of cudnnRNNForwardTraining for the inference benchmark. For the NVIDIA benchmarks, I think I just used the training function without realizing that there may be a performance difference relative to the inference function.

From the cuDNN docs for cudnnRNNForwardInference:

This routine executes the recurrent neural network described by rnnDesc with inputs x, hx, and cx, weights w and outputs y, hy, and cy. workspace is required for intermediate storage. This function does not store intermediate data required for training; cudnnRNNForwardTraining() should be used for that purpose.

So, there will be some overhead in using the training function instead of the inference function.

Thanks @sharannarang ! This is exactly what I was seeing/thinking as well. I have tested this change and opened a pull request for both the NVIDIA and AMD implementations (#117).

Matt