RTL simulation not finishing

Question

mmrahorovic opened this issue 2 years ago · comments

Description

In the 1D sliding window unit (SWU) implementation (ConvolutionInputGenerator_1D_lowbuffer), RTL simulation either finishes or hangs depending on how one block of code is written. More specifically, the bug can be narrowed down to the following block. If

if(re) {
    out.write(buffer[rp]);
    if(++offset == WINDOW_SIZE){
        offset = 0;
        rp += SIMD_COUNT * (Stride_x - 1);
    }
    if(++rp >= BUFFER_SIZE)  rp -= BUFFER_SIZE;
    if(++ocnt == WINDOW_SIZE)  ocnt = 0;
}

is rewritten as:

if(re) {
    out.write(buffer[rp]);
    if(++offset == WINDOW_SIZE){
        offset = 0;
        rp += 1 + SIMD_COUNT * (Stride_x - 1);
        if(rp >= BUFFER_SIZE)  rp -= BUFFER_SIZE;
    }
    else{ // Explicit else-block required to work around bug in RTL simulation
        if(++rp >= BUFFER_SIZE)  rp -= BUFFER_SIZE;
    }
    if(++ocnt == WINDOW_SIZE)  ocnt = 0;
}

the bug is resolved. The two blocks are functionally equivalent, but the second one adds code overhead.
The main questions are: what causes this HLS bug, what would be a cleaner way to resolve it, and how can additional debug information be obtained?
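
The equivalence claim can be checked in plain C++, outside of HLS, by stepping both formulations side by side and comparing the resulting pointers. The sketch below is only an illustration: the Params and State structs, the function names, and the parameter values are assumptions made for this example and are not part of the FINN code. The out.write and ocnt lines are left out because they are identical in both versions.

```cpp
#include <cassert>

// Hypothetical parameter bundle; in the real code these are template
// parameters or constants (WINDOW_SIZE, SIMD_COUNT, Stride_x, BUFFER_SIZE).
struct Params {
    unsigned window_size;
    unsigned simd_count;
    unsigned stride_x;
    unsigned buffer_size;
};

// Read pointer and window offset, both starting at zero.
struct State {
    unsigned rp = 0;
    unsigned offset = 0;
};

// Original formulation: optional skip, then an unconditional increment-and-wrap.
void step_original(State &s, const Params &p) {
    if (++s.offset == p.window_size) {
        s.offset = 0;
        s.rp += p.simd_count * (p.stride_x - 1);
    }
    if (++s.rp >= p.buffer_size) s.rp -= p.buffer_size;
}

// Rewritten formulation: the increment is folded into each branch.
void step_rewritten(State &s, const Params &p) {
    if (++s.offset == p.window_size) {
        s.offset = 0;
        s.rp += 1 + p.simd_count * (p.stride_x - 1);
        if (s.rp >= p.buffer_size) s.rp -= p.buffer_size;
    } else {
        if (++s.rp >= p.buffer_size) s.rp -= p.buffer_size;
    }
}

// Returns true if both formulations produce the same rp and offset
// on every one of `steps` enabled (re == true) cycles.
bool pointers_agree(const Params &p, unsigned steps) {
    assert(p.window_size > 0 && p.buffer_size > 0);
    State a, b;
    for (unsigned i = 0; i < steps; ++i) {
        step_original(a, p);
        step_rewritten(b, p);
        if (a.rp != b.rp || a.offset != b.offset) return false;
    }
    return true;
}
```

Calling pointers_agree(Params{4, 1, 2, 16}, 1000) compares the two versions over 1000 enabled cycles. Agreement in software only supports the claim that the C++ semantics match; it says nothing about how the HLS tool schedules either version.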

Reproducing

In order to reproduce the bug:

1. Clone the FINN repository.
2. Modify the unit test case in <FINN./tests/fpgadataflow/test_fpgadataflow_convinputgenerator1d.py by setting 'ifm_ch' to 1, 'stride' to [2, 1], 'exec_mode' to rtlsim, 'simd' to 1, 'dw' to 0, 'flip' to False, and 'parallel_window' to False.
3. Start a docker container.
4. Modify the corresponding block of code in ConvolutionInputGenerator_1D_lowbuffer in /slidingwindow.h
5. Run: python setup.py test --addopts "-k test_fpgadataflow_slidingwindow_1d"

Mirza Mrahorovic · Answer 1 · Mon Jun 20 2022 21:40:54 GMT+0800 (China Standard Time)

The observed issue was introduced by the migration from Vivado HLS to Vitis HLS (2021.2). By default, Vitis HLS implements pipelines as 'stalling pipelines': a pipelined function or loop advances only while input data is available and stalls otherwise. This is what caused execution to hang in the experiment described above. Why a minor reformulation of the code changes whether the stall occurs remains unclear.

The preferred fix is to use a flushable pipeline, which has been implemented and merged: PR #91.
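
As a rough illustration of how a flushable pipeline can be requested in Vitis HLS source, the sketch below marks a loop with the PIPELINE pragma and style=flp. The Stream type is a stand-in for hls::stream so that the snippet compiles with an ordinary C++ compiler (which ignores the HLS pragma), and passthrough is a hypothetical function. Neither is taken from the FINN code base, and the actual change in PR #91 may look different.

```cpp
#include <queue>

// Stand-in for hls::stream, used only so this sketch builds without
// the Vitis HLS headers. Assumption: not the real hls::stream API.
template <typename T>
struct Stream {
    std::queue<T> q;
    bool empty() const { return q.empty(); }
    T read() {
        T v = q.front();
        q.pop();
        return v;
    }
    void write(const T &v) { q.push(v); }
};

// Hypothetical pass-through stage that copies n elements from in to out.
void passthrough(Stream<int> &in, Stream<int> &out, unsigned n) {
    for (unsigned i = 0; i < n; ++i) {
// style=flp asks Vitis HLS for a flushable pipeline instead of the
// default stalling pipeline (style=stp).
#pragma HLS pipeline II=1 style=flp
        out.write(in.read());
    }
}
```

With a flushable pipeline, data already inside the pipeline can drain when no new input arrives, so the final outputs are not held back waiting for further input.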