Xilinx / finn-hlslib

Vitis HLS Library for FINN

Home Page: https://xilinx.github.io/finn/

RTL simulation not finishing

mmrahorovic opened this issue

Description

In the 1D sliding window unit (SWU) implementation (ConvolutionInputGenerator_1D_lowbuffer), whether RTL simulation finishes depends on how the code is written. More specifically, the bug can be narrowed down to the following block. If

if(re) {
    out.write(buffer[rp]);
    if(++offset == WINDOW_SIZE){
        offset = 0;
        rp += SIMD_COUNT * (Stride_x - 1);
    }
    if(++rp >= BUFFER_SIZE)  rp -= BUFFER_SIZE;
    if(++ocnt == WINDOW_SIZE)  ocnt = 0;
}

is rewritten as:

if(re) {
    out.write(buffer[rp]);
    if(++offset == WINDOW_SIZE){
        offset = 0;
        rp += 1 + SIMD_COUNT * (Stride_x - 1);
        if(rp >= BUFFER_SIZE)  rp -= BUFFER_SIZE;
    }
    else{ // Explicit else-block required to work around bug in RTL simulation
        if(++rp >= BUFFER_SIZE)  rp -= BUFFER_SIZE;
    }
    if(++ocnt == WINDOW_SIZE)  ocnt = 0;
}

the bug is resolved. The two blocks are functionally equivalent, but the second one adds code overhead.
The main questions are: what causes this HLS bug, what would be a cleaner way of resolving it, and how can additional debug information be obtained?
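To back up the claim that the two blocks are functionally equivalent, here is a minimal software-only sketch that steps both read-pointer updates side by side. It uses plain unsigned integers rather than the ap_uint types of the library, and the Params struct, the function names, and the parameter values are hypothetical and only serve as illustration.

```cpp
#include <cassert>

// Hypothetical parameter bundle; the field names mirror the identifiers in the
// issue, but any values passed in are illustrative only.
struct Params {
    unsigned window_size;
    unsigned simd_count;
    unsigned stride_x;
    unsigned buffer_size;
};

// Read-pointer update as written in the first block of the issue.
inline void step_original(const Params& p, unsigned& rp, unsigned& offset) {
    if (++offset == p.window_size) {
        offset = 0;
        rp += p.simd_count * (p.stride_x - 1);
    }
    if (++rp >= p.buffer_size) rp -= p.buffer_size;
}

// Read-pointer update as written in the second block (explicit else branch).
inline void step_workaround(const Params& p, unsigned& rp, unsigned& offset) {
    if (++offset == p.window_size) {
        offset = 0;
        rp += 1 + p.simd_count * (p.stride_x - 1);
        if (rp >= p.buffer_size) rp -= p.buffer_size;
    } else {
        if (++rp >= p.buffer_size) rp -= p.buffer_size;
    }
}

// Returns true if both formulations yield the same (rp, offset) pair on every
// one of `steps` consecutive reads, starting from rp = 0 and offset = 0.
inline bool same_sequence(const Params& p, unsigned steps) {
    unsigned rp_a = 0, off_a = 0;
    unsigned rp_b = 0, off_b = 0;
    for (unsigned i = 0; i < steps; ++i) {
        step_original(p, rp_a, off_a);
        step_workaround(p, rp_b, off_b);
        if (rp_a != rp_b || off_a != off_b) return false;
    }
    return true;
}
```

Calling same_sequence(Params{4, 1, 2, 16}, 1000) compares the two variants over 1000 reads. This only checks the C++ semantics; it says nothing about how the HLS tool schedules either variant.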

Reproducing

In order to reproduce the bug:

  • Clone the FINN repository.
  • Modify the unit test case in <FINN./tests/fpgadataflow/test_fpgadataflow_convinputgenerator1d.py by setting 'ifm_ch' to 1, 'stride' to [2, 1], 'exec_mode' to rtlsim, 'simd' to 1, 'dw' to 0, 'flip' to False, and 'parallel_window' to False.
  • Start a Docker container
  • Modify the corresponding block of code in ConvolutionInputGenerator_1D_lowbuffer in /slidingwindow.h
  • Run: python setup.py test --addopts "-k test_fpgadataflow_slidingwindow_1d"

The observed issue is caused by the migration from Vivado HLS to Vitis HLS (2021.2). By default, Vitis HLS implements 'stalling pipelines': a pipelined function or loop only advances when input data is available and otherwise stalls. This is what caused execution to hang in the experiment described above. Why a minor reformulation of the code changes whether a stall is observed is unclear.
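To illustrate why a stalling pipeline can keep a simulation from finishing, here is a toy cycle-level model. It is a deliberately simplified sketch, not a description of the hardware Vitis HLS generates: the function name and the model itself are hypothetical, and it assumes a pipeline of depth >= 1 that accepts one word per cycle.

```cpp
#include <cstddef>
#include <vector>

// Toy model: counts how many of `n_inputs` words leave a `depth`-stage
// pipeline within `cycles` clock cycles. Assumes depth >= 1.
// flushable == false: the whole pipeline freezes when no input is available.
// flushable == true:  the pipeline keeps advancing and inserts bubbles.
inline std::size_t drained_outputs(std::size_t n_inputs, std::size_t depth,
                                   std::size_t cycles, bool flushable) {
    std::vector<bool> valid(depth, false);  // valid[0] is the input-side stage
    std::size_t remaining = n_inputs;
    std::size_t outputs = 0;
    for (std::size_t c = 0; c < cycles; ++c) {
        const bool have_input = remaining > 0;
        if (!have_input && !flushable) continue;  // stall: nothing moves
        if (valid[depth - 1]) ++outputs;          // last stage emits its word
        for (std::size_t s = depth - 1; s > 0; --s) valid[s] = valid[s - 1];
        valid[0] = have_input;                    // new word or a bubble
        if (have_input) --remaining;
    }
    return outputs;
}
```

With 5 input words and a depth of 3, the stalling variant emits only 2 words no matter how many cycles are simulated, because the last 3 words stay stuck in the pipeline once the input runs dry. The flushable variant emits all 5. A testbench waiting for all outputs would therefore never terminate in the stalling case.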

The preferred option is to use a flushable pipeline, which has been implemented and merged in PR #91.
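For reference, a rough sketch of how a flushable pipeline can be requested per loop. Vitis HLS documents a style option on the PIPELINE pragma (stp for stalling, flp for flushable, frp for free-running). Whether PR #91 uses exactly this form is an assumption on my side. ToyStream and forward_words are hypothetical stand-ins so the snippet compiles without the Vitis HLS headers; a regular C++ compiler simply ignores the pragma.

```cpp
#include <cstddef>
#include <queue>

// Hypothetical stand-in for hls::stream, only so the sketch builds with a
// plain C++ compiler. Real code would use hls::stream from hls_stream.h.
template <typename T>
class ToyStream {
public:
    void write(const T& v) { q_.push(v); }
    T read() {
        T v = q_.front();
        q_.pop();
        return v;
    }
    bool empty() const { return q_.empty(); }

private:
    std::queue<T> q_;
};

// Hypothetical pass-through kernel: forwards n words from in to out.
// The pragma asks Vitis HLS for a flushable pipeline (style=flp) instead of
// the default stalling pipeline; outside of HLS it has no effect.
template <typename T>
void forward_words(ToyStream<T>& in, ToyStream<T>& out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
#pragma HLS pipeline II=1 style=flp
        out.write(in.read());
    }
}
```

The same pragma would go at the top of the pipelined loop body in the actual kernel. The software behaviour is unchanged; only the generated pipeline control logic differs.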