RTL simulation not finishing
mmrahorovic opened this issue · comments
Description
In the 1D SWU implementation (ConvolutionInputGenerator_1D_lowbuffer), RTL simulation does not finish, depending on how the code is written. More specifically, the bug can be narrowed down to the following block. If
if(re) {
  out.write(buffer[rp]);
  if(++offset == WINDOW_SIZE) {
    offset = 0;
    rp += SIMD_COUNT * (Stride_x - 1);
  }
  if(++rp >= BUFFER_SIZE)  rp -= BUFFER_SIZE;
  if(++ocnt == WINDOW_SIZE)  ocnt = 0;
}
is rewritten as:
if(re) {
  out.write(buffer[rp]);
  if(++offset == WINDOW_SIZE) {
    offset = 0;
    rp += 1 + SIMD_COUNT * (Stride_x - 1);
    if(rp >= BUFFER_SIZE)  rp -= BUFFER_SIZE;
  }
  else {  // Explicit else-block required to work around bug in RTL simulation
    if(++rp >= BUFFER_SIZE)  rp -= BUFFER_SIZE;
  }
  if(++ocnt == WINDOW_SIZE)  ocnt = 0;
}
the bug is resolved. Both blocks are functionally equivalent, but the latter introduces additional code overhead.
The main questions are: what causes the HLS bug, what would be a cleaner way of resolving it, and how can additional debug information be obtained?
Reproducing
In order to reproduce the bug:
- Clone the FINN repository.
- Modify the unit test case in tests/fpgadataflow/test_fpgadataflow_convinputgenerator1d.py by setting 'ifm_ch' to 1, 'stride' to [2, 1], 'exec_mode' to 'rtlsim', 'simd' to 1, 'dw' to 0, 'flip' to False, and 'parallel_window' to False.
- Start a Docker container.
- Modify the corresponding block of code in ConvolutionInputGenerator_1D_lowbuffer in /slidingwindow.h.
- Run:
python setup.py test --addopts "-k test_fpgadataflow_slidingwindow_1d"
The observed issue is caused by the migration from Vivado HLS to Vitis HLS (2021.2). By default, Vitis HLS implements stalling pipelines: a pipelined function or loop only runs when input data is available and otherwise stalls. This caused the stall in execution in the experiment described above. Why a minor reformulation of the code changes whether a stall is observed remains unclear.
The preferred option is a flushable pipeline, which has been implemented and merged in PR #91.
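For reference, Vitis HLS (2021.1 and later) lets you select the pipeline style per loop or function via the pipeline pragma; the loop body below is purely illustrative, only the pragma line matters:

```cpp
for(unsigned i = 0; i < TOTAL_ITERATIONS; i++) {
// style=stp (stalled, the default), flp (flushable), frp (free-running)
#pragma HLS pipeline II=1 style=flp
    // ... loop body ...
}
```

A flushable pipeline drains in-flight data when the input stalls instead of freezing, which avoids the deadlock-like behaviour seen in this RTL simulation.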