Simulation fails when using custom floating point operators
barClaudio opened this issue
With the latest AppImage file, the testbench produces an incorrect result for the following top function:
float user_fp(float a, float b, float c) { return a * b + c; }
I am launching bambu with the following command:
bambu module.c -O3 -lm --simulate --top-fname=user_fp --fp-format=user_fp*e5m10b-16nih --fp-format-interface --generate-tb="a=3.0,b=4.0,c=5.0" --print-dot
The same example works with the 0.9.8 AppImage file.
Below is the full output of bambu:
== Bambu executed with: /tmp/.mount_bamburvnfxO/usr/bin/bambu -O3 -lm --simulate --top-fname=user_fp --fp-format=user_fp*e5m10b-16nih --fp-format-interface --generate-tb=a=3.0,b=4.0,c=5.0 --print-dot module.c
********************************************************************************
High-Level Synthesis Tool
Politecnico di Milano - DEIB
System Architectures Group
********************************************************************************
Copyright (C) 2004-2023 Politecnico di Milano
Version: PandA 0.9.8 - Revision 49f79fbbb85dfe05df3a00f3d0c30d753a7fed52-dev/panda
Target technology = FPGA
Function call to __float_mule5m10b_16nih inlined in user_fp
Function call to __float_adde5m10b_16nih inlined in user_fp
Functions to be synthesized:
user_fp
Memory allocation information:
BRAM bitsize: 8
Spec may not exploit DATA bus width
All the data have a known address
Internal data is not externally accessible
DATA bus bitsize: 8
ADDRESS bus bitsize: 5
SIZE bus bitsize: 4
ALL pointers have been resolved
Internally allocated memory (no private memories): 0
Internally allocated memory: 0
Time to perform memory allocation: 0.00 seconds
Module allocation information for function user_fp:
Number of complex operations: 1
Number of complex operations: 1
Time to perform module allocation: 0.16 seconds
Scheduling Information of function user_fp:
Number of control steps: 6
Minimum slack: 0.11874264766666909
Estimated max frequency (MHz): 101.20169572993289
Time to perform scheduling: 0.16 seconds
State Transition Graph Information of function user_fp:
Number of states: 4
Minimum number of cycles: 4
Maximum number of cycles 4
Time to perform creation of STG: 0.09 seconds
Easy binding information for function user_fp:
Bound operations:353/456
Time to perform easy binding: 0.00 seconds
Storage Value Information of function user_fp:
Number of storage values inserted: 72
Time to compute storage value information: 0.00 seconds
Slack computed in 0.01 seconds
Weight computation completed in 0.01 seconds
False-loop computation completed in 0.00 seconds
Register binding information for function user_fp:
Register allocation algorithm obtains a sub-optimal result: 72 registers(LB:35)
Time to perform register binding: 0.00 seconds
Clique covering computation completed in 0.00 seconds
Module binding information for function user_fp:
Number of modules instantiated: 456
Number of performance conflicts: 62
Estimated resources area (no Muxes and address logic): 4305
Estimated area of MUX21: 0
Total estimated area: 4305
Estimated number of DSPs: 1
Time to perform module binding: 0.02 seconds
Register binding information for function user_fp:
Register allocation algorithm obtains a sub-optimal result: 72 registers(LB:35)
Time to perform register binding: 0.01 seconds
Total number of flip-flops in function user_fp: 231
Start reading vector 1's values from input file.
Reading of vector values from input file completed. Simulation started.
return_port = 0 expected = 18
Simulation ended after 4 cycles.
Simulation FAILED
- /content/bambu-tutorial/03-optimizations/Exercise6/HLS_output//simulation/testbench_user_fp_tb.v:482: Verilog $finish
error -> Simulation not correct!
Please report bugs to <panda-info@polimi.it>
Hi,
When --fp-format-interface is used, the top-level interface is modified to exchange values in the user-defined custom floating-point encoding. The generated testbench does not automatically convert the I/O values to and from that encoding, so the simulation fails. If you need to test such an implementation, I suggest writing a testbench that converts standard floating-point values into the kernel encoding and back. You can find an example under examples/truefloat in this repo.