Simulation fails when using custom floating point operators

Question

barClaudio opened this issue a year ago

With the latest AppImage file, the generated testbench reports an incorrect result for the following top function:

float user_fp(float a, float b, float c) { return a * b + c; }

I am launching bambu with the following command:

bambu module.c -O3 -lm --simulate --top-fname=user_fp --fp-format=user_fp*e5m10b-16nih --fp-format-interface --generate-tb="a=3.0,b=4.0,c=5.0" --print-dot

The same example works using the 0.9.8 AppImage file.

Below is the full output of bambu:


==  Bambu executed with: /tmp/.mount_bamburvnfxO/usr/bin/bambu -O3 -lm --simulate --top-fname=user_fp --fp-format=user_fp*e5m10b-16nih --fp-format-interface --generate-tb=a=3.0,b=4.0,c=5.0 --print-dot module.c 


********************************************************************************
                    ____                  _
                   | __ )  __ _ _ __ ___ | |_   _   _
                   |  _ \ / _` | '_ ` _ \| '_ \| | | |
                   | |_) | (_| | | | | | | |_) | |_| |
                   |____/ \__,_|_| |_| |_|_.__/ \__,_|

********************************************************************************
                         High-Level Synthesis Tool

                         Politecnico di Milano - DEIB
                          System Architectures Group
********************************************************************************
                Copyright (C) 2004-2023 Politecnico di Milano
Version: PandA 0.9.8 - Revision 49f79fbbb85dfe05df3a00f3d0c30d753a7fed52-dev/panda

Target technology = FPGA
Function call to __float_mule5m10b_16nih inlined in user_fp
Function call to __float_adde5m10b_16nih inlined in user_fp

  Functions to be synthesized:
    user_fp


  Memory allocation information:
    BRAM bitsize: 8
    Spec may not exploit DATA bus width
    All the data have a known address
    Internal data is not externally accessible
    DATA bus bitsize: 8
    ADDRESS bus bitsize: 5
    SIZE bus bitsize: 4
    ALL pointers have been resolved
    Internally allocated memory (no private memories): 0
    Internally allocated memory: 0
  Time to perform memory allocation: 0.00 seconds


  Module allocation information for function user_fp:
    Number of complex operations: 1
    Number of complex operations: 1
  Time to perform module allocation: 0.16 seconds


  Scheduling Information of function user_fp:
    Number of control steps: 6
    Minimum slack: 0.11874264766666909
    Estimated max frequency (MHz): 101.20169572993289
  Time to perform scheduling: 0.16 seconds


  State Transition Graph Information of function user_fp:
    Number of states: 4
    Minimum number of cycles: 4
    Maximum number of cycles 4
  Time to perform creation of STG: 0.09 seconds


  Easy binding information for function user_fp:
    Bound operations:353/456
  Time to perform easy binding: 0.00 seconds


  Storage Value Information of function user_fp:
    Number of storage values inserted: 72
  Time to compute storage value information: 0.00 seconds

  Slack computed in 0.01 seconds
  Weight computation completed in 0.01 seconds
  False-loop computation completed in 0.00 seconds

  Register binding information for function user_fp:
    Register allocation algorithm obtains a sub-optimal result: 72 registers(LB:35)
  Time to perform register binding: 0.00 seconds

  Clique covering computation completed in 0.00 seconds

  Module binding information for function user_fp:
    Number of modules instantiated: 456
    Number of performance conflicts: 62
    Estimated resources area (no Muxes and address logic): 4305
    Estimated area of MUX21: 0
    Total estimated area: 4305
    Estimated number of DSPs: 1
  Time to perform module binding: 0.02 seconds


  Register binding information for function user_fp:
    Register allocation algorithm obtains a sub-optimal result: 72 registers(LB:35)
  Time to perform register binding: 0.01 seconds

  Total number of flip-flops in function user_fp: 231
Start reading vector           1's values from input file.

Reading of vector values from input file completed. Simulation started.
 return_port =     0   expected =    18 

Simulation ended after                    4 cycles.

Simulation FAILED

- /content/bambu-tutorial/03-optimizations/Exercise6/HLS_output//simulation/testbench_user_fp_tb.v:482: Verilog $finish
error -> Simulation not correct!

Please report bugs to <panda-info@polimi.it>

Michele Fiorito · Answer 1 · Tue Sep 12 2023 15:27:45 GMT+0800 (China Standard Time)

Hi,
When --fp-format-interface is used, the top-level interface is modified so that its inputs and outputs use the user-defined custom floating-point encoding. The generated testbench does not convert I/O values automatically, so the values it applies and expects do not match that encoding and the simulation fails. To test such an implementation, I suggest writing a testbench that converts standard floating-point values into the kernel encoding and back. You can find an example under examples/truefloat in this repo.