AXI4-Lite support
sei-jgwohlbier opened this issue
Hi,
Is AXI4-Lite support in a state where it can be used? I tried to enable it with
panda_USE_EXPERIMENTAL=yes ../configure --enable-flopoco --enable-opt --prefix=/opt/panda-exp
but the build fails with
bash: /working_dir/PandA-bambu/etc/macros/../../ext/trng-4.17/configure: No such file or directory
configure: error: "Error in trng configuration"
Maybe I need to be on the feature/AXI branch? Or maybe I shouldn't be trying to test it.
Thanks.
Hello,
Currently, only the standard AXI4 master interface is available; AXI4-Lite is not supported yet.
If you would like to contribute and implement that kind of interface, you are welcome to do so by branching from dev/panda.
Just as a note, the feature/AXI branch is there to fix some issues with the AXI caches, but it is not related to the AXI protocol support itself.
Also, the experimental flag that you enabled for the compilation is an old option that is being removed right now: it adds nothing to the tool compared with the standard build, and it is impossible to compile with it (as you discovered yourself).
Thanks. So the AXI4 master interface is available in the standard release, and I don't need to do a special build?
Exactly.
Also, if you would like to use the AppImage version, I suggest you download it from Bambu Releases. The latest stable version is bambu-2023.1.AppImage, but I suggest you go directly with bambu-dev-panda.AppImage to avoid some silly issues that have been solved in the latest builds.
Furthermore, the dev/panda build offers a new testbench environment supporting C/C++ testbench implementation.
Ok, thanks. I'm thinking about importing synthesized accelerators into an SoC that has an AXI4 master for DMA but needs either AXI4-Lite or APB to configure the accelerator. I can't yet tell whether bambu provides this.
With the latest dev/panda version, it is possible to generate a memory-mapped top-level interface by passing the --memory-mapped-top option. In this case, the top module will expose a slave memory bus which you may use to initialize the accelerator and start the computation. The available protocols for this interface are the Wishbone B4 protocol and the internal memory bus protocol used by Bambu. The latter is a straightforward protocol that you may adapt to AXI4-Lite if this suits your needs.
Ok, thanks very much. I see the build issue with fileIO.hpp. I have a patch for it, which I'm sure you also have.
Hi, I tried to add --memory-mapped-top to the soda-opt pytorch tutorial and I get the following error:
error -> unexpected case (unsigned char) __exp_bits_23853_[2] 8unsigned char old-bw=4 new-bw=4059 from ../../src/frontend_analysis/IR_analysis/Bit_Value_opt.cpp:315 (tree_helper::Size(old_val) >= tree_helper::Size(new_val))
Any idea where to start looking?
Thanks!
Hi, can you please provide the input files and the full command line you used to call Bambu?
The source code is from the pytorch tutorial in soda-opt. The only change to the Makefile that lowers to llvm is the addition of --memory-mapped-top to the bambu invocation, which results in the following commands.
ToyCNN(
(conv1): Conv2d(1, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu): ReLU()
(pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(fc): Linear(in_features=16, out_features=4, bias=True)
)
/opt/soda/scripts/tosa_to_linalg.sh output/01_tosa.mlir output/02_linalg.mlir
soda-opt output/02_linalg.mlir -o output/03-01_linalg_searched.mlir -convert-operation-to-soda="anchor-op=linalg.batch_matmul"
soda-opt output/03-01_linalg_searched.mlir -o output/03-02_linalg_outlined.mlir -soda-outline-bambu-code -soda-extract-arguments-to-xml=using-bare-ptr
mv forward_kernel_interface.xml ./output/forward_kernel_interface.xml
mv forward_kernel_test.xml ./output/forward_kernel_test.xml
soda-opt output/03-02_linalg_outlined.mlir -o output/03-03_linalg_isolated.mlir -soda-generate-bambu-accelcode=no-aa
soda-opt output/03-03_linalg_isolated.mlir -o output/04_llvm_baseline.mlir -lower-all-to-llvm=use-bare-ptr-memref-call-conv
mlir-translate output/04_llvm_baseline.mlir -o output/05_llvm_baseline.ll --mlir-to-llvmir -opaque-pointers=0
test -d ./output/bambu/baseline || mkdir -p ./output/bambu/baseline; \
cd ./output/bambu/baseline; \
bambu \
-v3 --print-dot \
-lm --soft-float \
--compiler=I386_CLANG12 \
--device=xc7z020-1clg484-VVD \
--clock-period=5 \
--experimental-setup=BAMBU-BALANCED-MP \
--channels-number=2 \
--memory-allocation-policy=ALL_BRAM \
--disable-function-proxy \
--generate-tb=../../forward_kernel_test.xml \
--simulate --simulator=VERILATOR \
--top-fname=forward_kernel \
--memory-mapped-top \
../../../output/05_llvm_baseline.ll 2>&1 | tee ../../bambu-baseline-synth-log
completed
== Bambu executed with: bambu -v3 --print-dot -lm --soft-float --compiler=I386_CLANG12 --device=xc7z020-1clg484-VVD --clock-period=5 --experimental-setup=BAMBU-BALANCED-MP --channels-number=2 --memory-allocation-policy=ALL_BRAM --disable-function-proxy --generate-tb=../../forward_kernel_test.xml --simulate --simulator=VERILATOR --top-fname=forward_kernel --memory-mapped-top ../../../output/05_llvm_baseline.ll
I'm attaching the final output of soda 05_llvm_baseline.ll.
Thanks!
Is there any documentation on the memory-mapped interface that --memory-mapped-top generates (the Wishbone B4 and internal Bambu memory bus protocols mentioned above)?
I tried to perform the synthesis with the provided command line and input description, but I had no issues with that. Which version of bambu are you using? I tried that with this AppImage.
diff --git a/src/utility/fileIO.hpp b/src/utility/fileIO.hpp
index 43cb3d010..5058e9572 100644
--- a/src/utility/fileIO.hpp
+++ b/src/utility/fileIO.hpp
@@ -321,7 +321,7 @@ inline void CopyFile(boost::filesystem::path file_source, boost::filesystem::pat
}
else
{
- boost::filesystem::copy_file(file_source, file_target, boost::filesystem::copy_options::overwrite_existing);
+ boost::filesystem::copy_file(file_source, file_target, boost::filesystem::copy_option::overwrite_if_exists);
}
}
The minimal interface still needs to be documented; I will add something to the wiki as soon as possible. Until then, here is a short description that may be helpful to you.
The default Bambu memory interface, the minimal interface, comprises seven signals: two inputs and five outputs. The default configuration yields a pipelined memory interface, so each request is asserted for a single cycle. A non-pipelined version is also available and may be enabled through a memory controller module parameter.
Read and write requests are asserted using the Mout_oe_ram and Mout_we_ram signals, respectively. The memory address is passed on the Mout_addr_ram bus, along with the data bitsize (Mout_data_ram_size) and, for write requests, the write data (Mout_Wdata_ram). The M_DataRdy input signal is expected to be asserted when a read/write request has been completed by the slave. The M_Rdata_ram bus must contain the read data in the same cycle in which M_DataRdy is asserted.
Here is an example of the expected waveform for write and read transactions. The write transaction is completed in the same cycle it is issued in, and the read transaction is completed in the next cycle after the Mout_oe_ram signal has been asserted.
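As a rough illustration of the handshake described above, here is a small cycle-based Python model of a slave on the minimal interface. It is only a sketch under stated assumptions: the class name `MinimalBusSlave`, byte addressing, little-endian data, and `Mout_data_ram_size` being a bit count that is a multiple of 8 are all assumptions made for the example, not something confirmed by the Bambu sources. It follows the timing of the example waveform: a write is acknowledged in the cycle it is issued, and a read is acknowledged in the following cycle.

```python
class MinimalBusSlave:
    """Toy slave for the minimal memory interface (assumed semantics, not Bambu code)."""

    def __init__(self, size_bytes=64):
        self.mem = bytearray(size_bytes)
        self._pending_read = None  # (address, byte count) captured in the previous cycle

    def cycle(self, Mout_oe_ram=0, Mout_we_ram=0, Mout_addr_ram=0,
              Mout_data_ram_size=0, Mout_Wdata_ram=0):
        """Advance one clock cycle and return (M_DataRdy, M_Rdata_ram)."""
        M_DataRdy, M_Rdata_ram = 0, 0

        # A read request captured in the previous cycle completes now:
        # the data and M_DataRdy are presented in the same cycle.
        if self._pending_read is not None:
            rd_addr, rd_bytes = self._pending_read
            M_Rdata_ram = int.from_bytes(self.mem[rd_addr:rd_addr + rd_bytes], "little")
            M_DataRdy = 1
            self._pending_read = None

        nbytes = Mout_data_ram_size // 8  # assumption: size is given in bits
        if Mout_we_ram:
            # Writes are acknowledged in the cycle they are issued.
            masked = Mout_Wdata_ram & ((1 << Mout_data_ram_size) - 1)
            self.mem[Mout_addr_ram:Mout_addr_ram + nbytes] = masked.to_bytes(nbytes, "little")
            M_DataRdy = 1
        elif Mout_oe_ram:
            # Reads are latched and answered in the next cycle.
            self._pending_read = (Mout_addr_ram, nbytes)

        return M_DataRdy, M_Rdata_ram


slave = MinimalBusSlave()
write_ack = slave.cycle(Mout_we_ram=1, Mout_addr_ram=8,
                        Mout_data_ram_size=32, Mout_Wdata_ram=0xDEADBEEF)
read_issue = slave.cycle(Mout_oe_ram=1, Mout_addr_ram=8, Mout_data_ram_size=32)
read_done = slave.cycle()
print(write_ack, read_issue, read_done)
```

A bridge to AXI4-Lite would need to translate these single-cycle requests into the address, data, and response channel handshakes; the model above only captures the Bambu side of that picture.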
I built the dev/panda branch. I had to apply the following patch to get the code to compile. Hopefully this isn't the cause of the error.
I do not think that the patch is causing any issues.
Can you please add the --no-clean option to your command line and share the panda-temp/<input_filename>.gimplePSSA file that is generated? That is the IR dump generated starting from the frontend compiler IR (the clang-12 compiler in your case).
Also, can you share the clang-12 version string (the output of the clang-12 --version command)?
Thanks for the reply!
$ clang-12 --version
Ubuntu clang version 12.0.0-3ubuntu1~20.04.5
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
I didn't mention previously that I have to edit the IR that comes out of soda-opt. Without editing it, bambu fails to ingest it, with this error.
/home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/../../../output/05_llvm_baseline.ll:1327:57: error: unterminated attribute group
attributes #0 = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
^
1 error generated.
Error in compilation
/home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/../../../output/05_llvm_baseline.ll:1327:57: error: unterminated attribute group
attributes #0 = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
^
1 error generated.
error -> Front-end compiler returns an error during compilation 2
I remove the memory(argmem: readwrite) part of the attribute to get to the IR file that I sent previously. Using the resulting ll file I get the following output and the gimplePSSA.zip file attached.
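The manual edit described above can be scripted. The following is a minimal sketch, not part of soda-opt or Bambu: the helper name `strip_memory_attr` is hypothetical, and it only touches `attributes #N = { ... }` group lines such as the one in the error message. Note that dropping the attribute discards memory-effect information, so this is a workaround for the Clang 12 parser rather than a semantics-preserving transformation.

```python
import re

# Matches a `memory(...)` attribute together with the whitespace before it.
_MEMORY_ATTR = re.compile(r"\s*\bmemory\([^)]*\)")


def strip_memory_attr(ir_text: str) -> str:
    """Remove `memory(...)` from LLVM IR attribute group lines only."""
    out = []
    for line in ir_text.splitlines(keepends=True):
        if line.lstrip().startswith("attributes #"):
            line = _MEMORY_ATTR.sub("", line)
        out.append(line)
    return "".join(out)


example = (
    "attributes #0 = { nocallback nofree nounwind willreturn "
    "memory(argmem: readwrite) }\n"
)
print(strip_memory_attr(example), end="")
```

To apply it to a file, read `05_llvm_baseline.ll` as text, pass the contents through the function, and write the result to a new file so the original IR stays available for comparison.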
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
WARNING: this target does not support the llvm.stacksave intrinsic.
!! Unknown ext. calls:
memrefCopy
1 warning generated.
(in-process) /usr/local/include /usr/lib/llvm-12/lib/clang/12.0.0/include /usr/include/x86_64-linux-gnu /usr/include warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
WARNING: this target does not support the llvm.stacksave intrinsic.
!! Unknown ext. calls:
memrefCopy
1 warning generated.
error -> unexpected case (unsigned char) __exp_bits_27944_[2] 8unsigned char old-bw=4 new-bw=4059 from ../../src/frontend_analysis/IR_analysis/Bit_Value_opt.cpp:315 (tree_helper::Size(old_val) >= tree_helper::Size(new_val))
Please report bugs to <panda-info@polimi.it>
I am going to check the .gimplePSSA as soon as I can. In the meantime, a note on the soda-generated IR: it may have been produced by a newer version of the LLVM toolchain than Clang 12, which can cause issues with the parser. Bambu also supports Clang 16 as a frontend, so you may be able to avoid editing the .ll file if you use that one as the frontend compiler.
Ok, thanks. I am working from the latest soda provided docker image, but I can go ahead and build soda and bambu with 16.
The dev-panda AppImage is shipped with Clang 16 too, if you want to avoid the build.
Ok, thanks. Since I have to rebuild soda it's not much more trouble to build everything.
I managed to use the AppImage bambu with Clang 16 specified, so I no longer need to edit the IR. It now fails differently than it did before. Below is the whole output.
make synth-baseline
python torchscript.py output/01_tosa.mlir --dialect=tosa
ToyCNN(
(conv1): Conv2d(1, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu): ReLU()
(pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(fc): Linear(in_features=16, out_features=4, bias=True)
)
/opt/soda/scripts/tosa_to_linalg.sh output/01_tosa.mlir output/02_linalg.mlir
soda-opt output/02_linalg.mlir -o output/03-01_linalg_searched.mlir -convert-operation-to-soda="anchor-op=linalg.batch_matmul"
soda-opt output/03-01_linalg_searched.mlir -o output/03-02_linalg_outlined.mlir -soda-outline-bambu-code -soda-extract-arguments-to-xml=using-bare-ptr
mv forward_kernel_interface.xml ./output/forward_kernel_interface.xml
mv forward_kernel_test.xml ./output/forward_kernel_test.xml
soda-opt output/03-02_linalg_outlined.mlir -o output/03-03_linalg_isolated.mlir -soda-generate-bambu-accelcode=no-aa
soda-opt output/03-03_linalg_isolated.mlir -o output/04_llvm_baseline.mlir -lower-all-to-llvm=use-bare-ptr-memref-call-conv
mlir-translate output/04_llvm_baseline.mlir -o output/05_llvm_baseline.ll --mlir-to-llvmir -opaque-pointers=0
test -d ./output/bambu/baseline || mkdir -p ./output/bambu/baseline; \
cd ./output/bambu/baseline; \
bambu \
-v3 --print-dot \
-lm --soft-float \
--compiler=I386_CLANG16 \
--device=xc7z020-1clg484-VVD \
--clock-period=5 \
--experimental-setup=BAMBU-BALANCED-MP \
--channels-number=2 \
--memory-allocation-policy=ALL_BRAM \
--disable-function-proxy \
--generate-tb=../../forward_kernel_test.xml \
--simulate --simulator=VERILATOR \
--top-fname=forward_kernel \
--memory-mapped-top \
--no-clean \
../../../output/05_llvm_baseline.ll 2>&1 | tee ../../bambu-baseline-synth-log
== Bambu executed with: bambu -v3 --print-dot -lm --soft-float --compiler=I386_CLANG16 --device=xc7z020-1clg484-VVD --clock-period=5 --experimental-setup=BAMBU-BALANCED-MP --channels-number=2 --memory-allocation-policy=ALL_BRAM --disable-function-proxy --generate-tb=../../forward_kernel_test.xml --simulate --simulator=VERILATOR --top-fname=forward_kernel --memory-mapped-top --no-clean ../../../output/05_llvm_baseline.ll
********************************************************************************
____ _
| __ ) __ _ _ __ ___ | |_ _ _
| _ \ / _` | '_ ` _ \| '_ \| | | |
| |_) | (_| | | | | | | |_) | |_| |
|____/ \__,_|_| |_| |_|_.__/ \__,_|
********************************************************************************
High-Level Synthesis Tool
Politecnico di Milano - DEIB
System Architectures Group
********************************************************************************
Copyright (C) 2004-2023 Politecnico di Milano
Version: PandA 2023.06 - Revision 8dad23e15331c7737e7969dfa4a4f652d043934f-dev/panda
Parameters parsed in 0.08 seconds
Target technology = FPGA
Library Name : STD_FU
Total cells : 3
- combinational: 0
- others: 3
Library Name : STD_FU
Total cells : 10
- combinational: 0
- others: 10
Library Name : STD_FU
Total cells : 33
- combinational: 0
- others: 33
Library Name : STD_FU
Total cells : 8
- combinational: 0
- others: 8
Library Name : STD_FU
Total cells : 56
- combinational: 0
- others: 56
Library Name : STD_FU
Total cells : 1
- combinational: 0
- others: 1
Library Name : CS_COMPONENT
Total cells : 16
- combinational: 0
- others: 16
Library Name : STD_FU
Total cells : 2
- combinational: 0
- others: 2
Library Name : STD_FU
Total cells : 0
- combinational: 0
- others: 0
Library Name : STD_FU
Total cells : 3
- combinational: 0
- others: 3
Library Name : STD_FU
Total cells : 21
- combinational: 0
- others: 21
Library Name : STD
Total cells : 14
- combinational: 0
- others: 14
Library Name : STD_COMMON
Total cells : 57
- combinational: 0
- others: 57
Library Name : STD_FU
Total cells : 33
- combinational: 0
- others: 33
Library Name : STD_PC
Total cells : 16
- combinational: 0
- others: 16
Library Name : STD_SOFT_FLOAT
Total cells : 2
- combinational: 0
- others: 2
Library Name : STD
Total cells : 95
- combinational: 0
- others: 95
Library Name : STD_FU
Total cells : 2
- combinational: 0
- others: 2
Library Name : STD_FU
Total cells : 9
- combinational: 0
- others: 9
Library Name : WBWrapper
Total cells : 12
- combinational: 0
- others: 12
Available devices:
- 5CSEMA5F31C6
- 5SGXEA7N2F45C1
- EP2C70F896C6
- EP2C70F896C6-R
- EP4SGX530KH40C2
- LFE335EA8FN484C
- LFE5U85F8BG756C
- LFE5UM85F8BG756C
- asap7-BC
- asap7-TC
- asap7-WC
- nangate45
- nx1h140tsp
- nx1h35S
- nx2h540tsc
- xc4vlx100-10ff1513
- xc5vlx110t-1ff1136
- xc5vlx330t-2ff1738
- xc5vlx50-3ff1153
- xc6vlx240t-1ff1156
- xc7a100t-1csg324-VVD
- xc7vx330t-1ffg1157
- xc7vx485t-2ffg1761-VVD
- xc7vx690t-3ffg1930-VVD
- xc7z020-1clg484
- xc7z020-1clg484-VVD
- xc7z020-1clg484-YOSYS-VVD
- xc7z045-2ffg900-VVD
- xcku060-3ffva1156-VVD
- xcu280-2Lfsvh2892-VVD
Library Name : STD_FU
Total cells : 3931
- combinational: 0
- others: 3931
warning: overriding the module target triple with i386-unknown-linux-gnu [-Woverride-module]
1 warning generated.
warning: overriding the module target triple with i386-unknown-linux-gnu [-Woverride-module]
1 warning generated.
(in-process) /usr/local/include /usr/include/x86_64-linux-gnu /usr/include warning: overriding the module target triple with i386-unknown-linux-gnu [-Woverride-module]
1 warning generated.
warning: overriding the module target triple with i386-unknown-linux-gnu [-Woverride-module]
1 warning generated.
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 2
Bit Value Opt: plus_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 2
Bit Value Opt: cond_expr optimized, nbits = 3
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 2
Bit Value Opt: cond_expr optimized, nbits = 3
Bit Value Opt: cond_expr optimized, nbits = 4
Bit Value Opt: cond_expr optimized, nbits = 5
Bit Value Opt: bit_and_expr optimized, nbits = 1
Bit Value Opt: ne_expr optimized, nbits = 1
Bit Value Opt: bit_xor_expr optimized, nbits = 2
Bit Value Opt: bit_and_expr optimized, nbits = 2
Bit Value Opt: ne_expr optimized, nbits = 2
Bit Value Opt: plus_expr optimized, nbits = 2
Bit Value Opt: bit_and_expr optimized, nbits = 11
Bit Value Opt: eq_expr optimized, nbits = 11
Bit Value Opt: bit_and_expr optimized, nbits = 19
Bit Value Opt: eq_expr optimized, nbits = 19
Bit Value Opt: bit_and_expr optimized, nbits = 23
Bit Value Opt: eq_expr optimized, nbits = 23
Bit Value Opt: bit_and_expr optimized, nbits = 25
Bit Value Opt: eq_expr optimized, nbits = 25
Bit Value Opt: bit_and_expr optimized, nbits = 26
Bit Value Opt: eq_expr optimized, nbits = 26
Bit Value Opt: bit_and_expr optimized, nbits = 26
Bit Value Opt: ne_expr optimized, nbits = 26
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: bit_and_expr optimized, nbits = 1
Bit Value Opt: ne_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: bit_and_expr optimized, nbits = 22
Bit Value Opt: ne_expr optimized, nbits = 22
Bit Value Opt: bit_and_expr optimized, nbits = 47
Bit Value Opt: ne_expr optimized, nbits = 47
Bit Value Opt: bit_and_expr optimized, nbits = 9
Bit Value Opt: ne_expr optimized, nbits = 9
Bit Value Opt: bit_and_expr optimized, nbits = 32
Bit Value Opt: ne_expr optimized, nbits = 32
Bit Value Opt: plus_expr optimized, nbits = 4
Bit Value Opt: plus_expr optimized, nbits = 4
Bit Value Opt: plus_expr optimized, nbits = 4
Memory allocation information:
Sparse memory alignemnt set to 1024 bytes
Function: forward_kernel
Id: 495627
Base Address: 1024
Size: 1
Parameter P0 of Function forward_kernel
Id: 495628
Base Address: 1040
Size: 4
Parameter P1 of Function forward_kernel
Id: 495629
Base Address: 1056
Size: 4
Parameter P2 of Function forward_kernel
Id: 495630
Base Address: 1072
Size: 4
Warning: This function uses unknown addresses: forward_kernel
BRAM bitsize: 16
Spec may not exploit DATA bus width
Spec accesses data having an address unknown at compile time
Internal data is not externally accessible
DATA bus bitsize: 32
ADDRESS bus bitsize: 32
SIZE bus bitsize: 6
Total amount of memory allocated for memory mapped parameters: 1024
Internally allocated memory (no private memories): 1024
Internally allocated memory: 1024
Time to perform memory allocation: 0.00 seconds
Module allocation information for function __float_adde8m23b_127nih:
Number of complex operations: 0
Number of complex operations: 0
Time to perform module allocation: 0.05 seconds
Module allocation information for function __float_mule8m23b_127nih:
Number of complex operations: 1
Number of complex operations: 1
Time to perform module allocation: 0.02 seconds
Scheduling Information of function __float_adde8m23b_127nih:
Number of control steps: 9
Minimum slack: 0.010964990999999147
Estimated max frequency (MHz): 200.43956360218834
Time to perform scheduling: 0.03 seconds
Number of function call sites = 19
State Transition Graph Information of function __float_adde8m23b_127nih:
Number of operations: 257
Number of basic blocks: 3
Number of states: 8
Minimum number of cycles: 8
Maximum number of cycles 8
Parameters are registered
Done port is registered
Time to perform creation of STG: 0.01 seconds
Scheduling Information of function __float_mule8m23b_127nih:
Number of control steps: 8
Minimum slack: 0.056999993999999221
Estimated max frequency (MHz): 202.30629148010561
Time to perform scheduling: 0.01 seconds
Number of function call sites = 19
State Transition Graph Information of function __float_mule8m23b_127nih:
Number of operations: 104
Number of basic blocks: 3
Number of states: 7
Minimum number of cycles: 7
Maximum number of cycles 7
Parameters are registered
Done port is registered
Time to perform creation of STG: 0.01 seconds
Easy binding information for function __float_adde8m23b_127nih:
Bound operations:192/257
Time to perform easy binding: 0.00 seconds
Easy binding information for function __float_mule8m23b_127nih:
Bound operations:85/104
Time to perform easy binding: 0.01 seconds
Storage Value Information of function __float_adde8m23b_127nih:
Number of storage values inserted: 89
Time to compute storage value information: 0.00 seconds
Storage Value Information of function __float_mule8m23b_127nih:
Number of storage values inserted: 16
Time to compute storage value information: 0.00 seconds
Slack computed in 0.00 seconds
Weight computation completed in 0.00 seconds
False-loop computation completed in 0.00 seconds
Iteration 0 completed in 0.00 seconds
Register binding information for function __float_adde8m23b_127nih:
Register allocation algorithm obtains a sub-optimal result: 89 registers(LB:51)
Time to perform register binding: 0.01 seconds
Iteration 1 completed in 0.00 seconds
Clique covering computation completed in 0.01 seconds
Module binding information for function __float_adde8m23b_127nih:
Number of modules instantiated: 257
Number of performance conflicts: 13
Estimated resources area (no Muxes and address logic): 2746
Estimated area of MUX21: 0
Total estimated area: 2746
Estimated number of DSPs: 0
Time to perform module binding: 0.01 seconds
Register binding information for function __float_adde8m23b_127nih:
Register allocation algorithm obtains a sub-optimal result: 89 registers(LB:51)
Time to perform register binding: 0.00 seconds
Total number of flip-flops in function __float_adde8m23b_127nih: 488
Slack computed in 0.00 seconds
Weight computation completed in 0.00 seconds
False-loop computation completed in 0.00 seconds
Iteration 0 completed in 0.00 seconds
Register binding information for function __float_mule8m23b_127nih:
Register allocation algorithm obtains a sub-optimal result: 16 registers(LB:9)
Time to perform register binding: 0.00 seconds
Iteration 1 completed in 0.00 seconds
Clique covering computation completed in 0.00 seconds
Module binding information for function __float_mule8m23b_127nih:
Number of modules instantiated: 104
Number of performance conflicts: 0
Estimated resources area (no Muxes and address logic): 1100
Estimated area of MUX21: 0
Total estimated area: 1100
Estimated number of DSPs: 3
Time to perform module binding: 0.00 seconds
Register binding information for function __float_mule8m23b_127nih:
Register allocation algorithm obtains a sub-optimal result: 16 registers(LB:9)
Time to perform register binding: 0.00 seconds
Total number of flip-flops in function __float_mule8m23b_127nih: 197
Module allocation information for function forward_kernel:
Number of complex operations: 99
Number of complex operations: 99
Time to perform module allocation: 0.04 seconds
Scheduling Information of function forward_kernel:
Number of control steps: 353
Minimum slack: 0.010964988999952796
Estimated max frequency (MHz): 200.43956352183443
Time to perform scheduling: 0.05 seconds
State Transition Graph Information of function forward_kernel:
Number of operations: 428
Number of basic blocks: 10
Number of states: 353
Done port is registered
Time to perform creation of STG: 0.21 seconds
Easy binding information for function forward_kernel:
Bound operations:243/428
Time to perform easy binding: 0.00 seconds
Storage Value Information of function forward_kernel:
Number of storage values inserted: 157
Time to compute storage value information: 0.00 seconds
Slack computed in 0.00 seconds
Weight computation completed in 0.02 seconds
False-loop computation completed in 0.00 seconds
cdfc mux estimation 61 -- Number of cliques covering the graph: 2 forward_kernel_BMEMORY_CTRLN_212 with 61 vertices
cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_adde8m23b_127nih_257 with 19 vertices
cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_mule8m23b_127nih_258 with 19 vertices
Iteration 0 completed in 0.01 seconds
Register binding information for function forward_kernel:
Register allocation algorithm obtains a sub-optimal result: 150 registers(LB:41)
Time to perform register binding: 0.02 seconds
cdfc mux estimation 61 -- Number of cliques covering the graph: 2 forward_kernel_BMEMORY_CTRLN_212 with 61 vertices
cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_adde8m23b_127nih_257 with 19 vertices
cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_mule8m23b_127nih_258 with 19 vertices
Iteration 1 completed in 0.01 seconds
Clique covering computation completed in 0.04 seconds
Module binding information for function forward_kernel:
Number of modules instantiated: 333
Number of performance conflicts: 147
Estimated resources area (no Muxes and address logic): 5699
Estimated area of MUX21: 1332.3333333333333
Total estimated area: 7031.333333333333
Estimated number of DSPs: 0
Time to perform module binding: 0.06 seconds
Register binding information for function forward_kernel:
Register allocation algorithm obtains a sub-optimal result: 150 registers(LB:41)
Time to perform register binding: 0.02 seconds
Connection Binding Information for function forward_kernel:
Number of allocated multiplexers (2-to-1 equivalent): 141
Total number of bit-level multiplexers: 4640
Time to perform interconnection binding: 0.01 seconds
Total number of flip-flops in function forward_kernel: 4767
C-based testbench generation for function forward_kernel: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/HLS_output//simulation/cosim.c
Prepared testbench
error -> BOOL only supports single bit values: 2 - bambu_testbench_impl/master_P0/S_oe_ram (new_bit_size == 1)
Please report bugs to <panda-info@polimi.it>
Can you point me to some verilog that shows an interface coming from use of the --memory-mapped-top option?
Hi, I had the chance to debug the issue, and I can tell that it is only related to the testbench generation.
I manually tested the interface, which works fine with the proper testbench configuration. Thus I will fix the issue and push the changes to the dev/panda branch.
Until then, if you want to run the simulation to verify the generated design, I suggest you remove the --memory-mapped-top parameter; the testbench generation should then work correctly. Conversely, to generate the memory-mapped top design, you should remove the --generate-tb and --simulate options so that no testbench is generated.
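The two-run workaround above can be derived mechanically from a single argument list. This is a small sketch with a hypothetical helper name, `split_flows`; it uses only options that appear in this thread. Dropping `--simulator` together with `--simulate` in the memory-mapped run is an assumption made for tidiness, not something stated above.

```python
def split_flows(args):
    """Return (simulation_args, memory_mapped_args) from one bambu argument list."""
    # Options to drop when generating the memory-mapped design without a testbench.
    # Assumption: --simulator is dropped too, since it is only useful with --simulate.
    sim_only = {"--generate-tb", "--simulate", "--simulator"}
    simulation = [a for a in args if a != "--memory-mapped-top"]
    memory_mapped = [a for a in args if a.split("=", 1)[0] not in sim_only]
    return simulation, memory_mapped


args = [
    "--compiler=I386_CLANG16",
    "--generate-tb=../../forward_kernel_test.xml",
    "--simulate",
    "--simulator=VERILATOR",
    "--top-fname=forward_kernel",
    "--memory-mapped-top",
    "../../../output/05_llvm_baseline.ll",
]
simulation_args, memory_mapped_args = split_flows(args)
print("bambu", " ".join(simulation_args))
print("bambu", " ".join(memory_mapped_args))
```

The first printed command verifies the design through simulation without the memory-mapped top; the second generates the memory-mapped top without a testbench.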
Ok, thanks, I'll try this. I notice that when I try to compile the testbenches I get errors of the following form. I expect it is because I am including the AppImage in the soda-opt container, which is Ubuntu 20.04, whereas I see some paths in the AppImage that indicate 18.04. So I think there are some include path inconsistencies.
However, I was able to use the --memory-mapped-top option to get a verilog file. I see the data paths are 64 bits wide. I tried using --data-bus-bitsize and --addr-bus-bitsize to set them to 32 but that did not work.
Thanks again!
clang-16: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
clang-16: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
clang-16: warning: argument unused during compilation: '-I /usr/bin/../share/verilator/include/vltstd' [-Wunused-command-line-argument]
clang-16: warning: argument unused during compilation: '-I /usr/include' [-Wunused-command-line-argument]
warning: overriding the module target triple with i386-unknown-linux-gnu [-Woverride-module]
1 warning generated.
clang-16: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
clang-16: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
In file included from /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/HLS_output//simulation/cosim.c:20:
/usr/include/stdio.h:33:10: fatal error: 'stddef.h' file not found
#include <stddef.h>
^~~~~~~~~~
1 error generated.
clang-16: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
clang-16: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
clang-16: warning: argument unused during compilation: '-I /usr/bin/../share/verilator/include/vltstd' [-Wunused-command-line-argument]
clang-16: warning: argument unused during compilation: '-I /usr/include' [-Wunused-command-line-argument]
warning: overriding the module target triple with i386-unknown-linux-gnu [-Woverride-module]
1 warning generated.
clang-16: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
clang-16: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
In file included from /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/HLS_output//simulation/cosim.c:20:
/usr/include/stdio.h:33:10: fatal error: 'stddef.h' file not found
#include <stddef.h>
^~~~~~~~~~
1 error generated.
error -> Returned error code!
Please report bugs to <panda-info@polimi.it>
Using the dev/panda branch results in a different error for the torchscript.py reproducer. IR attached.
bambu \
-v3 --print-dot \
-lm --soft-float \
--compiler=I386_CLANG16 \
--device=xc7z020-1clg484-VVD \
--clock-period=5 \
--experimental-setup=BAMBU-BALANCED-MP \
--channels-number=2 \
--memory-allocation-policy=ALL_BRAM \
--disable-function-proxy \
--top-fname=forward_kernel \
\
../../../output/05_llvm_baseline.ll 2>&1 | tee ../../bambu-baseline-synth-log
== Bambu executed with: bambu -v3 --print-dot -lm --soft-float --compiler=I386_CLANG16 --device=xc7z020-1clg484-VVD --clock-period=5 --experimental-setup=BAMBU-BALANCED-MP --channels-number=2 --memory-allocation-policy=ALL_BRAM --disable-function-proxy --top-fname=forward_kernel ../../../output/05_llvm_baseline.ll
********************************************************************************
____ _
| __ ) __ _ _ __ ___ | |_ _ _
| _ \ / _` | '_ ` _ \| '_ \| | | |
| |_) | (_| | | | | | | |_) | |_| |
|____/ \__,_|_| |_| |_|_.__/ \__,_|
********************************************************************************
High-Level Synthesis Tool
Politecnico di Milano - DEIB
System Architectures Group
********************************************************************************
Copyright (C) 2004-2023 Politecnico di Milano
Version: PandA 2023.08 - Revision 7fb2f6c62adef935ec7eed79f3fd2365f5a0bdbe-dev/panda
Parameters parsed in 0.07 seconds
Target technology = FPGA
Library Name : STD_FU
Total cells : 3
- combinational: 0
- others: 3
Library Name : STD_FU
Total cells : 10
- combinational: 0
- others: 10
Library Name : STD_FU
Total cells : 33
- combinational: 0
- others: 33
Library Name : STD_FU
Total cells : 8
- combinational: 0
- others: 8
Library Name : STD_FU
Total cells : 56
- combinational: 0
- others: 56
Library Name : STD_FU
Total cells : 1
- combinational: 0
- others: 1
Library Name : CS_COMPONENT
Total cells : 16
- combinational: 0
- others: 16
Library Name : STD_FU
Total cells : 2
- combinational: 0
- others: 2
Library Name : STD_FU
Total cells : 0
- combinational: 0
- others: 0
Library Name : STD_FU
Total cells : 3
- combinational: 0
- others: 3
Library Name : STD_FU
Total cells : 21
- combinational: 0
- others: 21
Library Name : STD
Total cells : 14
- combinational: 0
- others: 14
Library Name : STD_COMMON
Total cells : 57
- combinational: 0
- others: 57
Library Name : STD_FU
Total cells : 33
- combinational: 0
- others: 33
Library Name : STD_PC
Total cells : 16
- combinational: 0
- others: 16
Library Name : STD_SOFT_FLOAT
Total cells : 2
- combinational: 0
- others: 2
Library Name : STD
Total cells : 95
- combinational: 0
- others: 95
Library Name : STD_FU
Total cells : 2
- combinational: 0
- others: 2
Library Name : STD_FU
Total cells : 9
- combinational: 0
- others: 9
Library Name : WBWrapper
Total cells : 12
- combinational: 0
- others: 12
Available devices:
- 5CSEMA5F31C6
- 5SGXEA7N2F45C1
- EP2C70F896C6
- EP2C70F896C6-R
- EP4SGX530KH40C2
- LFE335EA8FN484C
- LFE5U85F8BG756C
- LFE5UM85F8BG756C
- asap7-BC
- asap7-TC
- asap7-WC
- nangate45
- nx1h140tsp
- nx1h35S
- nx2h540tsc
- xc4vlx100-10ff1513
- xc5vlx110t-1ff1136
- xc5vlx330t-2ff1738
- xc5vlx50-3ff1153
- xc6vlx240t-1ff1156
- xc7a100t-1csg324-VVD
- xc7vx330t-1ffg1157
- xc7vx485t-2ffg1761-VVD
- xc7vx690t-3ffg1930-VVD
- xc7z020-1clg484
- xc7z020-1clg484-VVD
- xc7z020-1clg484-YOSYS-VVD
- xc7z045-2ffg900-VVD
- xcku060-3ffva1156-VVD
- xcu280-2Lfsvh2892-VVD
Library Name : STD_FU
Total cells : 3931
- combinational: 0
- others: 3931
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
(in-process) /usr/lib/llvm-16/lib/clang/16/include /usr/local/include /usr/include/x86_64-linux-gnu /usr/include warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
error -> unexpected case (unsigned char) __exp_bits_24767_[2] 8unsigned char old-bw=4 new-bw=4059 from ../../src/frontend_analysis/IR_analysis/Bit_Value_opt.cpp:314 (tree_helper::Size(old_val) >= tree_helper::Size(new_val))
Please report bugs to <panda-info@polimi.it>
I tried on my local version with the latest dev/panda and I'm able to synthesize the code. Since the hash of the branch is the same, I would like to understand which version of clang you are using. I'm using the binaries downloaded from the GitHub release page of Clang: https://github.com/llvm/llvm-project/releases/download/llvmorg-16.0.0/clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04.tar.xz
Ok, thanks, I'll try this. I notice that when I try to compile the testbenches I get errors of the following form. I expect it is because I am including the AppImage in the soda-opt container, which is Ubuntu 20.04, whereas I see some paths in the AppImage that indicate 18.04. So I think there are some include path inconsistencies.
The references to 18.04 are actually to the clang binaries we include in the binary distribution.
However, I was able to use the --memory-mapped-top option to get a Verilog file. I see the data paths are 64 bits wide. I tried using --data-bus-bitsize and --addr-bus-bitsize to set them to 32, but that did not work.
Bambu can manage IR generated by clang for different Intel target architectures. The default one is the 32-bit architecture, where -m32 is passed to clang. We also added support for -mx32 and -m64, but in the latter case the address space is 64 bits. So, in order to control the size of the address bus, we added a Bambu option, which controls the number of bits used by the minimal interface bus.
In your case, -m32 is used, so both the address bus and the data bus are 32 bits wide.
The following signals are declared as 64 bits:
input [63:0] M_Rdata_ram;
input [63:0] S_addr_ram;
input [63:0] S_Wdata_ram;
output [63:0] Mout_addr_ram;
output [63:0] Mout_Wdata_ram;
output [63:0] Sout_Rdata_ram;
since you asked for a multi-channel minimal interface. The minimal interface may have one or two channels. Two channels means that you may have two in-flight memory transactions.
Calling Bambu with:
bambu -v3 --print-dot -lm \
--soft-float \
--compiler=I386_CLANG16 \
--device=xc7z020-1clg484-VVD \
--clock-period=5 \
--experimental-setup=BAMBU-BALANCED \
--memory-allocation-policy=ALL_BRAM \
--disable-function-proxy \
--top-fname=forward_kernel 05_llvm_baseline.ll \
--memory-mapped-top
you will ask for a single channel minimal interface bus and so you will have this top-level interface:
input clock;
input reset;
input [31:0] M_Rdata_ram;
input M_DataRdy;
input S_oe_ram;
input S_we_ram;
input [31:0] S_addr_ram;
input [31:0] S_Wdata_ram;
input [5:0] S_data_ram_size;
// OUT
output done_port;
output Mout_oe_ram;
output Mout_we_ram;
output [31:0] Mout_addr_ram;
output [31:0] Mout_Wdata_ram;
output [5:0] Mout_data_ram_size;
output [31:0] Sout_Rdata_ram;
output Sout_DataRdy;
That should be what you expected, shouldn't it?
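To confirm which bus widths a generated top level actually exposes, one can list the vector ports of the Verilog file. The sketch below is only an illustration: the `port_widths` helper is hypothetical, and the sample ports are copied from the single-channel interface above into a temporary file rather than read from a real Bambu output.

```shell
#!/bin/sh
# Hypothetical helper: print "<port> <width>" for each vector port declared
# as "input [msb:0] name;" or "output [msb:0] name;" in a Verilog file.
# Scalar ports such as "input clock;" are skipped.
port_widths() {
    sed -En 's/^ *(input|output) *\[([0-9]+):0\] *([A-Za-z_0-9]+);.*/\3 \2/p' "$1" |
    while read -r name msb; do
        # A [msb:0] range is msb + 1 bits wide.
        echo "$name $((msb + 1))"
    done
}

# Sample input: a few ports from the single-channel interface shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
input clock;
input [31:0] M_Rdata_ram;
input [31:0] S_addr_ram;
input [5:0] S_data_ram_size;
output [31:0] Mout_Wdata_ram;
EOF

port_widths "$sample"
rm -f "$sample"
```

Pointing `port_widths` at the generated top-level file instead of the sample makes it easy to compare a two-channel run against a single-channel one.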
I tried on my local version with the latest dev/panda and I'm able to synthesize the code. Since the hash of the branch is the same, I would like to understand which version of clang you are using. I'm using the binaries downloaded from the GitHub release page of Clang: https://github.com/llvm/llvm-project/releases/download/llvmorg-16.0.0/clang+llvm-16.0.0-x86_64-linux-gnu-ubuntu-18.04.tar.xz
I am using the clang 16 release on Ubuntu 20.04. Here are some steps I use to install it in my docker image.
RUN echo "deb http://apt.llvm.org/focal/ llvm-toolchain-focal-16 main" >> /etc/apt/sources.list && \
echo "deb-src http://apt.llvm.org/focal/ llvm-toolchain-focal-16 main" >> /etc/apt/sources.list && \
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 15CF4D18AF4F7421 && \
apt update && \
apt-get install -y \
clang-16 \
libclang-16-dev
RUN update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-16 100 && \
update-alternatives --install /usr/bin/clang clang /usr/bin/clang-16 100 && \
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-10 100 && \
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 100
ENV CC=/usr/bin/clang
ENV CXX=/usr/bin/clang++
RUN rm -rf /opt/panda && \
git clone https://github.com/ferrandi/PandA-bambu.git && \
cd PandA-bambu && \
git checkout dev/panda && \
make -f Makefile.init && \
mkdir obj && \
cd obj && \
../configure --enable-flopoco --enable-opt --prefix=/opt/panda --enable-release && \
make -j4 && \
make install
Configuring with --enable-release implies passing -DNDEBUG to the compiler. This hides some errors, leaving error catching only to the THROW_ASSERTS. One way to better track down the bug could be to configure with --disable-release instead of --enable-release and see where the issues pop up.
A simpler way to track the error is to share the file panda-temp/05_llvm_baseline.ll.gimplePSSA and see if this file allows me to understand where the issue is. You have to call Bambu passing the --no-clean option.
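A rough sketch of both suggestions, assuming the same build layout as the Dockerfile steps quoted earlier in this thread. The function names are hypothetical and the configure line is printed, not executed. The check assumes that panda-temp is created in the directory where Bambu is run, which is how the path above reads but is not confirmed here.

```shell
#!/bin/sh
# Print the configure line for a build with assertions enabled: the same
# options as the release build in this thread, except that
# --disable-release replaces --enable-release.
debug_configure_cmd() {
    echo "../configure --enable-flopoco --enable-opt --prefix=/opt/panda --disable-release"
}

# Report whether the intermediate file requested above exists under run_dir.
# Assumption: panda-temp sits in the directory where bambu was invoked.
find_gimple() {
    run_dir="$1"
    f="$run_dir/panda-temp/05_llvm_baseline.ll.gimplePSSA"
    if [ -f "$f" ]; then
        echo "found: $f"
    else
        echo "missing: $f (was bambu run with --no-clean?)"
    fi
}

debug_configure_cmd
find_gimple .
```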
I rebuilt with --disable-release. Output is below, including soda-opt processing. LLVM IR and gimple file attached.
python torchscript.py output/01_tosa.mlir --dialect=tosa
ToyCNN(
(conv1): Conv2d(1, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu): ReLU()
(pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(fc): Linear(in_features=16, out_features=4, bias=True)
)
/opt/soda/scripts/tosa_to_linalg.sh output/01_tosa.mlir output/02_linalg.mlir
soda-opt output/02_linalg.mlir -o output/03-01_linalg_searched.mlir -convert-operation-to-soda="anchor-op=linalg.batch_matmul"
soda-opt output/03-01_linalg_searched.mlir -o output/03-02_linalg_outlined.mlir -soda-outline-bambu-code -soda-extract-arguments-to-xml=using-bare-ptr
mv forward_kernel_interface.xml ./output/forward_kernel_interface.xml
mv forward_kernel_test.xml ./output/forward_kernel_test.xml
soda-opt output/03-02_linalg_outlined.mlir -o output/03-03_linalg_isolated.mlir -soda-generate-bambu-accelcode=no-aa
soda-opt output/03-03_linalg_isolated.mlir -o output/04_llvm_baseline.mlir -lower-all-to-llvm=use-bare-ptr-memref-call-conv
mlir-translate output/04_llvm_baseline.mlir -o output/05_llvm_baseline.ll --mlir-to-llvmir -opaque-pointers=0
test -d ./output/bambu/baseline || mkdir -p ./output/bambu/baseline; \
cd ./output/bambu/baseline; \
bambu \
-v3 --print-dot \
-lm --soft-float \
--compiler=I386_CLANG16 \
--device=xc7z020-1clg484-VVD \
--clock-period=5 \
--experimental-setup=BAMBU-BALANCED-MP \
--channels-number=2 \
--memory-allocation-policy=ALL_BRAM \
--disable-function-proxy \
--top-fname=forward_kernel \
--no-clean \
../../../output/05_llvm_baseline.ll 2>&1 | tee ../../bambu-baseline-synth-log
== Bambu executed with: bambu -v3 --print-dot -lm --soft-float --compiler=I386_CLANG16 --device=xc7z020-1clg484-VVD --clock-period=5 --experimental-setup=BAMBU-BALANCED-MP --channels-number=2 --memory-allocation-policy=ALL_BRAM --disable-function-proxy --top-fname=forward_kernel --no-clean ../../../output/05_llvm_baseline.ll
********************************************************************************
____ _
| __ ) __ _ _ __ ___ | |_ _ _
| _ \ / _` | '_ ` _ \| '_ \| | | |
| |_) | (_| | | | | | | |_) | |_| |
|____/ \__,_|_| |_| |_|_.__/ \__,_|
********************************************************************************
High-Level Synthesis Tool
Politecnico di Milano - DEIB
System Architectures Group
********************************************************************************
Copyright (C) 2004-2023 Politecnico di Milano
Version: PandA 2023.08 - Revision 7fb2f6c62adef935ec7eed79f3fd2365f5a0bdbe-dev/panda
Parameters parsed in 0.11 seconds
Target technology = FPGA
Library Name : STD_FU
Total cells : 3
- combinational: 0
- others: 3
Library Name : STD_FU
Total cells : 10
- combinational: 0
- others: 10
Library Name : STD_FU
Total cells : 33
- combinational: 0
- others: 33
Library Name : STD_FU
Total cells : 8
- combinational: 0
- others: 8
Library Name : STD_FU
Total cells : 56
- combinational: 0
- others: 56
Library Name : STD_FU
Total cells : 1
- combinational: 0
- others: 1
Library Name : CS_COMPONENT
Total cells : 16
- combinational: 0
- others: 16
Library Name : STD_FU
Total cells : 2
- combinational: 0
- others: 2
Library Name : STD_FU
Total cells : 0
- combinational: 0
- others: 0
Library Name : STD_FU
Total cells : 3
- combinational: 0
- others: 3
Library Name : STD_FU
Total cells : 21
- combinational: 0
- others: 21
Library Name : STD
Total cells : 14
- combinational: 0
- others: 14
Library Name : STD_COMMON
Total cells : 57
- combinational: 0
- others: 57
Library Name : STD_FU
Total cells : 33
- combinational: 0
- others: 33
Library Name : STD_PC
Total cells : 16
- combinational: 0
- others: 16
Library Name : STD_SOFT_FLOAT
Total cells : 2
- combinational: 0
- others: 2
Library Name : STD
Total cells : 95
- combinational: 0
- others: 95
Library Name : STD_FU
Total cells : 2
- combinational: 0
- others: 2
Library Name : STD_FU
Total cells : 9
- combinational: 0
- others: 9
Library Name : WBWrapper
Total cells : 12
- combinational: 0
- others: 12
Available devices:
- 5CSEMA5F31C6
- 5SGXEA7N2F45C1
- EP2C70F896C6
- EP2C70F896C6-R
- EP4SGX530KH40C2
- LFE335EA8FN484C
- LFE5U85F8BG756C
- LFE5UM85F8BG756C
- asap7-BC
- asap7-TC
- asap7-WC
- nangate45
- nx1h140tsp
- nx1h35S
- nx2h540tsc
- xc4vlx100-10ff1513
- xc5vlx110t-1ff1136
- xc5vlx330t-2ff1738
- xc5vlx50-3ff1153
- xc6vlx240t-1ff1156
- xc7a100t-1csg324-VVD
- xc7vx330t-1ffg1157
- xc7vx485t-2ffg1761-VVD
- xc7vx690t-3ffg1930-VVD
- xc7z020-1clg484
- xc7z020-1clg484-VVD
- xc7z020-1clg484-YOSYS-VVD
- xc7z045-2ffg900-VVD
- xcku060-3ffva1156-VVD
- xcu280-2Lfsvh2892-VVD
Library Name : STD_FU
Total cells : 3931
- combinational: 0
- others: 3931
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
Compilation time: 0.02 seconds;
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
Compilation time: 0.03 seconds;
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
Tree merging time: 0.12 seconds;
(in-process) /usr/lib/llvm-16/lib/clang/16/include /usr/local/include /usr/include/x86_64-linux-gnu /usr/include
error -> unexpected case (unsigned char) __exp_bits_24767_[2] 8unsigned char old-bw=4 new-bw=4015 from ../../src/frontend_analysis/IR_analysis/Bit_Value_opt.cpp:314 (tree_helper::Size(old_val) >= tree_helper::Size(new_val))
void Bit_Value_opt::propagateValue(const ssa_name *, tree_managerRef, tree_nodeRef, tree_nodeRef, const std::string)
../../src/frontend_analysis/IR_analysis/Bit_Value_opt.cpp:270
Please report bugs to <panda-info@polimi.it>
The error you see seems to be due to a non-detected buffer overflow during the minimum bit computation. I cannot reproduce the issue, so please let me know if #206 fixed the problem.
That works using the --memory-mapped-top option! Thanks!
bambu \
-v3 --print-dot \
-lm --soft-float \
--compiler=I386_CLANG16 \
--device=xc7z020-1clg484-VVD \
--clock-period=5 \
--experimental-setup=BAMBU-BALANCED-MP \
--channels-number=2 \
--memory-allocation-policy=ALL_BRAM \
--disable-function-proxy \
--top-fname=forward_kernel \
--memory-mapped-top --no-clean \
../../../output/05_llvm_baseline.ll 2>&1 | tee ../../bambu-baseline-synth-log
It fails when using the --generate-tb --simulate --simulator=VERILATOR options, as below. I will work on creating a reproducer, but it will take a few days for me to get permission from my organization to get the code and the Dockerfile published.
python torchscript.py output/01_tosa.mlir --dialect=tosa
ToyCNN(
(conv1): Conv2d(1, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(relu): ReLU()
(pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(fc): Linear(in_features=16, out_features=4, bias=True)
)
/opt/soda/scripts/tosa_to_linalg.sh output/01_tosa.mlir output/02_linalg.mlir
soda-opt output/02_linalg.mlir -o output/03-01_linalg_searched.mlir -convert-operation-to-soda="anchor-op=linalg.batch_matmul"
soda-opt output/03-01_linalg_searched.mlir -o output/03-02_linalg_outlined.mlir -soda-outline-bambu-code -soda-extract-arguments-to-xml=using-bare-ptr
mv forward_kernel_interface.xml ./output/forward_kernel_interface.xml
mv forward_kernel_test.xml ./output/forward_kernel_test.xml
soda-opt output/03-02_linalg_outlined.mlir -o output/03-03_linalg_isolated.mlir -soda-generate-bambu-accelcode=no-aa
soda-opt output/03-03_linalg_isolated.mlir -o output/04_llvm_baseline.mlir -lower-all-to-llvm=use-bare-ptr-memref-call-conv
mlir-translate output/04_llvm_baseline.mlir -o output/05_llvm_baseline.ll --mlir-to-llvmir -opaque-pointers=0
test -d ./output/bambu/baseline || mkdir -p ./output/bambu/baseline; \
cd ./output/bambu/baseline; \
bambu \
-v3 --print-dot \
-lm --soft-float \
--compiler=I386_CLANG16 \
--device=xc7z020-1clg484-VVD \
--clock-period=5 \
--experimental-setup=BAMBU-BALANCED-MP \
--channels-number=2 \
--memory-allocation-policy=ALL_BRAM \
--disable-function-proxy \
--top-fname=forward_kernel \
--generate-tb=../../forward_kernel_test.xml --simulate --simulator=VERILATOR --no-clean \
../../../output/05_llvm_baseline.ll 2>&1 | tee ../../bambu-baseline-synth-log
== Bambu executed with: bambu -v3 --print-dot -lm --soft-float --compiler=I386_CLANG16 --device=xc7z020-1clg484-VVD --clock-period=5 --experimental-setup=BAMBU-BALANCED-MP --channels-number=2 --memory-allocation-policy=ALL_BRAM --disable-function-proxy --top-fname=forward_kernel --generate-tb=../../forward_kernel_test.xml --simulate --simulator=VERILATOR --no-clean ../../../output/05_llvm_baseline.ll
********************************************************************************
____ _
| __ ) __ _ _ __ ___ | |_ _ _
| _ \ / _` | '_ ` _ \| '_ \| | | |
| |_) | (_| | | | | | | |_) | |_| |
|____/ \__,_|_| |_| |_|_.__/ \__,_|
********************************************************************************
High-Level Synthesis Tool
Politecnico di Milano - DEIB
System Architectures Group
********************************************************************************
Copyright (C) 2004-2023 Politecnico di Milano
Version: PandA 2023.08 - Revision f6bcd3bdaf988ef69272e21724bd338199baefc8-fix/minorIssues
Parameters parsed in 0.07 seconds
Target technology = FPGA
Library Name : STD_FU
Total cells : 3
- combinational: 0
- others: 3
Library Name : STD_FU
Total cells : 10
- combinational: 0
- others: 10
Library Name : STD_FU
Total cells : 33
- combinational: 0
- others: 33
Library Name : STD_FU
Total cells : 8
- combinational: 0
- others: 8
Library Name : STD_FU
Total cells : 56
- combinational: 0
- others: 56
Library Name : STD_FU
Total cells : 1
- combinational: 0
- others: 1
Library Name : CS_COMPONENT
Total cells : 16
- combinational: 0
- others: 16
Library Name : STD_FU
Total cells : 2
- combinational: 0
- others: 2
Library Name : STD_FU
Total cells : 0
- combinational: 0
- others: 0
Library Name : STD_FU
Total cells : 3
- combinational: 0
- others: 3
Library Name : STD_FU
Total cells : 21
- combinational: 0
- others: 21
Library Name : STD
Total cells : 14
- combinational: 0
- others: 14
Library Name : STD_COMMON
Total cells : 57
- combinational: 0
- others: 57
Library Name : STD_FU
Total cells : 33
- combinational: 0
- others: 33
Library Name : STD_PC
Total cells : 16
- combinational: 0
- others: 16
Library Name : STD_SOFT_FLOAT
Total cells : 2
- combinational: 0
- others: 2
Library Name : STD
Total cells : 95
- combinational: 0
- others: 95
Library Name : STD_FU
Total cells : 2
- combinational: 0
- others: 2
Library Name : STD_FU
Total cells : 9
- combinational: 0
- others: 9
Library Name : WBWrapper
Total cells : 12
- combinational: 0
- others: 12
Available devices:
- 5CSEMA5F31C6
- 5SGXEA7N2F45C1
- EP2C70F896C6
- EP2C70F896C6-R
- EP4SGX530KH40C2
- LFE335EA8FN484C
- LFE5U85F8BG756C
- LFE5UM85F8BG756C
- asap7-BC
- asap7-TC
- asap7-WC
- nangate45
- nx1h140tsp
- nx1h35S
- nx2h540tsc
- xc4vlx100-10ff1513
- xc5vlx110t-1ff1136
- xc5vlx330t-2ff1738
- xc5vlx50-3ff1153
- xc6vlx240t-1ff1156
- xc7a100t-1csg324-VVD
- xc7vx330t-1ffg1157
- xc7vx485t-2ffg1761-VVD
- xc7vx690t-3ffg1930-VVD
- xc7z020-1clg484
- xc7z020-1clg484-VVD
- xc7z020-1clg484-YOSYS-VVD
- xc7z045-2ffg900-VVD
- xcku060-3ffva1156-VVD
- xcu280-2Lfsvh2892-VVD
Library Name : STD_FU
Total cells : 3931
- combinational: 0
- others: 3931
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
Compilation time: 0.02 seconds;
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
Compilation time: 0.03 seconds;
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
Tree merging time: 0.11 seconds;
(in-process) /usr/lib/llvm-16/lib/clang/16/include /usr/local/include /usr/include/x86_64-linux-gnu /usr/include
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 2
Bit Value Opt: plus_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 2
Bit Value Opt: cond_expr optimized, nbits = 3
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 2
Bit Value Opt: cond_expr optimized, nbits = 3
Bit Value Opt: cond_expr optimized, nbits = 4
Bit Value Opt: cond_expr optimized, nbits = 5
Bit Value Opt: bit_and_expr optimized, nbits = 1
Bit Value Opt: ne_expr optimized, nbits = 1
Bit Value Opt: bit_xor_expr optimized, nbits = 2
Bit Value Opt: bit_and_expr optimized, nbits = 2
Bit Value Opt: ne_expr optimized, nbits = 2
Bit Value Opt: plus_expr optimized, nbits = 2
Bit Value Opt: bit_and_expr optimized, nbits = 11
Bit Value Opt: eq_expr optimized, nbits = 11
Bit Value Opt: bit_and_expr optimized, nbits = 19
Bit Value Opt: eq_expr optimized, nbits = 19
Bit Value Opt: bit_and_expr optimized, nbits = 23
Bit Value Opt: eq_expr optimized, nbits = 23
Bit Value Opt: bit_and_expr optimized, nbits = 25
Bit Value Opt: eq_expr optimized, nbits = 25
Bit Value Opt: bit_and_expr optimized, nbits = 26
Bit Value Opt: eq_expr optimized, nbits = 26
Bit Value Opt: bit_and_expr optimized, nbits = 26
Bit Value Opt: ne_expr optimized, nbits = 26
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: bit_and_expr optimized, nbits = 1
Bit Value Opt: ne_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: bit_and_expr optimized, nbits = 22
Bit Value Opt: ne_expr optimized, nbits = 22
Bit Value Opt: bit_and_expr optimized, nbits = 47
Bit Value Opt: ne_expr optimized, nbits = 47
Bit Value Opt: bit_and_expr optimized, nbits = 9
Bit Value Opt: ne_expr optimized, nbits = 9
Bit Value Opt: bit_and_expr optimized, nbits = 32
Bit Value Opt: ne_expr optimized, nbits = 32
Bit Value Opt: plus_expr optimized, nbits = 4
Bit Value Opt: plus_expr optimized, nbits = 4
Bit Value Opt: plus_expr optimized, nbits = 4
Functions to be synthesized:
forward_kernel
__float_mule8m23b_127nih
__float_adde8m23b_127nih
Memory allocation information:
Sparse memory alignemnt set to 1024 bytes
Warning: This function uses unknown addresses: forward_kernel
BRAM bitsize: 16
Spec may not exploit DATA bus width
Spec accesses data having an address unknown at compile time
Internal data is not externally accessible
DATA bus bitsize: 32
ADDRESS bus bitsize: 32
SIZE bus bitsize: 6
Internally allocated memory (no private memories): 0
Internally allocated memory: 0
Time to perform memory allocation: 0.00 seconds
Module allocation information for function __float_adde8m23b_127nih:
Number of complex operations: 0
Number of complex operations: 0
Time to perform module allocation: 0.05 seconds
Module allocation information for function __float_mule8m23b_127nih:
Number of complex operations: 1
Number of complex operations: 1
Time to perform module allocation: 0.02 seconds
Scheduling Information of function __float_adde8m23b_127nih:
Number of control steps: 9
Minimum slack: 0.010964990999998037
Estimated max frequency (MHz): 200.43956360218831
Time to perform scheduling: 0.03 seconds
Number of function call sites = 19
State Transition Graph Information of function __float_adde8m23b_127nih:
Number of operations: 257
Number of basic blocks: 3
Number of states: 8
Minimum number of cycles: 8
Maximum number of cycles 8
Parameters are registered
Done port is registered
Time to perform creation of STG: 0.02 seconds
Scheduling Information of function __float_mule8m23b_127nih:
Number of control steps: 8
Minimum slack: 0.056999993999998111
Estimated max frequency (MHz): 202.30629148010559
Time to perform scheduling: 0.01 seconds
Number of function call sites = 19
State Transition Graph Information of function __float_mule8m23b_127nih:
Number of operations: 104
Number of basic blocks: 3
Number of states: 7
Minimum number of cycles: 7
Maximum number of cycles 7
Parameters are registered
Done port is registered
Time to perform creation of STG: 0.01 seconds
Easy binding information for function __float_adde8m23b_127nih:
Bound operations:192/257
Time to perform easy binding: 0.00 seconds
Easy binding information for function __float_mule8m23b_127nih:
Bound operations:85/104
Time to perform easy binding: 0.00 seconds
Storage Value Information of function __float_adde8m23b_127nih:
Number of storage values inserted: 89
Time to compute storage value information: 0.00 seconds
Storage Value Information of function __float_mule8m23b_127nih:
Number of storage values inserted: 16
Time to compute storage value information: 0.00 seconds
Slack computed in 0.00 seconds
Weight computation completed in 0.00 seconds
False-loop computation completed in 0.00 seconds
Iteration 0 completed in 0.00 seconds
Register binding information for function __float_adde8m23b_127nih:
Register allocation algorithm obtains a sub-optimal result: 89 registers(LB:51)
Time to perform register binding: 0.00 seconds
Iteration 1 completed in 0.00 seconds
Clique covering computation completed in 0.00 seconds
Module binding information for function __float_adde8m23b_127nih:
Number of modules instantiated: 257
Number of performance conflicts: 13
Estimated resources area (no Muxes and address logic): 2746
Estimated area of MUX21: 0
Total estimated area: 2746
Estimated number of DSPs: 0
Time to perform module binding: 0.01 seconds
Register binding information for function __float_adde8m23b_127nih:
Register allocation algorithm obtains a sub-optimal result: 89 registers(LB:51)
Time to perform register binding: 0.01 seconds
Total number of flip-flops in function __float_adde8m23b_127nih: 488
Slack computed in 0.00 seconds
Weight computation completed in 0.00 seconds
False-loop computation completed in 0.00 seconds
Iteration 0 completed in 0.00 seconds
Register binding information for function __float_mule8m23b_127nih:
Register allocation algorithm obtains a sub-optimal result: 16 registers(LB:9)
Time to perform register binding: 0.00 seconds
Iteration 1 completed in 0.00 seconds
Clique covering computation completed in 0.00 seconds
Module binding information for function __float_mule8m23b_127nih:
Number of modules instantiated: 104
Number of performance conflicts: 0
Estimated resources area (no Muxes and address logic): 1100
Estimated area of MUX21: 0
Total estimated area: 1100
Estimated number of DSPs: 3
Time to perform module binding: 0.00 seconds
Register binding information for function __float_mule8m23b_127nih:
Register allocation algorithm obtains a sub-optimal result: 16 registers(LB:9)
Time to perform register binding: 0.00 seconds
Total number of flip-flops in function __float_mule8m23b_127nih: 197
Module allocation information for function forward_kernel:
Number of complex operations: 99
Number of complex operations: 99
Time to perform module allocation: 0.03 seconds
Scheduling Information of function forward_kernel:
Number of control steps: 353
Minimum slack: 0.010964988999987213
Estimated max frequency (MHz): 200.43956352183582
Time to perform scheduling: 0.06 seconds
Number of function call sites = 0
State Transition Graph Information of function forward_kernel:
Number of operations: 428
Number of basic blocks: 10
Number of states: 353
Done port is registered
Time to perform creation of STG: 0.20 seconds
Easy binding information for function forward_kernel:
Bound operations:243/428
Time to perform easy binding: 0.00 seconds
Storage Value Information of function forward_kernel:
Number of storage values inserted: 157
Time to compute storage value information: 0.00 seconds
Slack computed in 0.00 seconds
Weight computation completed in 0.02 seconds
False-loop computation completed in 0.00 seconds
cdfc mux estimation 61 -- Number of cliques covering the graph: 2 forward_kernel_BMEMORY_CTRLN_212 with 61 vertices
cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_adde8m23b_127nih_257 with 19 vertices
cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_mule8m23b_127nih_258 with 19 vertices
Iteration 0 completed in 0.01 seconds
Register binding information for function forward_kernel:
Register allocation algorithm obtains a sub-optimal result: 150 registers(LB:41)
Time to perform register binding: 0.02 seconds
cdfc mux estimation 61 -- Number of cliques covering the graph: 2 forward_kernel_BMEMORY_CTRLN_212 with 61 vertices
cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_adde8m23b_127nih_257 with 19 vertices
cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_mule8m23b_127nih_258 with 19 vertices
Iteration 1 completed in 0.02 seconds
Clique covering computation completed in 0.05 seconds
Module binding information for function forward_kernel:
Number of modules instantiated: 333
Number of performance conflicts: 147
Estimated resources area (no Muxes and address logic): 5699
Estimated area of MUX21: 1332.3333333333333
Total estimated area: 7031.333333333333
Estimated number of DSPs: 0
Time to perform module binding: 0.07 seconds
Register binding information for function forward_kernel:
Register allocation algorithm obtains a sub-optimal result: 150 registers(LB:41)
Time to perform register binding: 0.02 seconds
Connection Binding Information for function forward_kernel:
Number of allocated multiplexers (2-to-1 equivalent): 141
Total number of bit-level multiplexers: 4640
Time to perform interconnection binding: 0.01 seconds
Total number of flip-flops in function forward_kernel: 4767
C-based testbench generation for function forward_kernel: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/./HLS_output//simulation/cosim.c
Prepared testbench
Summary of resources:
- ASSIGN_UNSIGNED_FU: 1
- BMEMORY_CTRLN: 1
- IUdata_converter_FU: 3
- MUX_GATE: 141
- OR_GATE: 2
- UIdata_converter_FU: 3
- UUdata_converter_FU: 263
- constant_value: 139
- flipflop_AR: 2
- lshift_expr_FU: 3
- lut_expr_FU: 71
- multi_read_cond_FU: 1
- read_cond_FU: 2
- register_SE: 161
- register_STD: 98
- rshift_expr_FU: 3
- ui_bit_and_expr_FU: 34
- ui_bit_ior_concat_expr_FU: 4
- ui_bit_ior_expr_FU: 39
- ui_bit_xor_expr_FU: 2
- ui_cond_expr_FU: 12
- ui_eq_expr_FU: 3
- ui_extract_bit_expr_FU: 101
- ui_lshift_expr_FU: 65
- ui_lt_expr_FU: 5
- ui_minus_expr_FU: 1
- ui_mult_expr_FU: 1
- ui_ne_expr_FU: 6
- ui_plus_expr_FU: 12
- ui_pointer_plus_expr_FU: 41
- ui_rshift_expr_FU: 26
- ui_ternary_plus_expr_FU: 1
- ui_ternary_pm_expr_FU: 1
clang: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
clang: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
clang: warning: argument unused during compilation: '-I /usr/bin/../share/verilator/include/vltstd' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-I /opt/panda/include' [-Wunused-command-line-argument]
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
clang: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
clang: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
make[1]: Entering directory '/home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/HLS_output/verilator_beh/verilator_obj'
g++ -I. -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -fstrict-aliasing -m32 -c -o bambu_testbench.o /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/./HLS_output//simulation/bambu_testbench.cpp
g++ -I. -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -fstrict-aliasing -m32 -c -o verilated.o /usr/share/verilator/include/verilated.cpp
g++ -I. -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -fstrict-aliasing -m32 -c -o verilated_dpi.o /usr/share/verilator/include/verilated_dpi.cpp
g++ -I. -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -fstrict-aliasing -m32 -c -o Vbambu_testbench.o Vbambu_testbench.cpp
g++ -I. -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -fstrict-aliasing -m32 -c -o Vbambu_testbench___024unit.o Vbambu_testbench___024unit.cpp
g++ -I. -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -fstrict-aliasing -m32 -c -o Vbambu_testbench__Dpi.o Vbambu_testbench__Dpi.cpp
g++ -I. -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -fstrict-aliasing -m32 -c -o Vbambu_testbench__Syms.o Vbambu_testbench__Syms.cpp
ar -cr Vbambu_testbench__ALL.a Vbambu_testbench.o Vbambu_testbench___024unit.o Vbambu_testbench__Dpi.o Vbambu_testbench__Syms.o
ranlib Vbambu_testbench__ALL.a
g++ bambu_testbench.o verilated.o verilated_dpi.o Vbambu_testbench__ALL.a -m32 -lpthread /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/./HLS_output//verilator_beh/libtb.so -o Vbambu_testbench -lm -lstdc++
make[1]: Leaving directory '/home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/HLS_output/verilator_beh/verilator_obj'
Results file: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/results.txt
Reset active: LOW
Co-sim: Co-simulation started
Co-sim: Memory size for parameter 0 set to 64 bytes.
Co-sim: Memory size for parameter 1 set to 64 bytes.
Co-sim: Memory size for parameter 2 set to 16 bytes.
Co-sim: Address 0xF72152E0 mapped at 0x40000000 (64 bytes)
Co-sim: Address 0xF72152A0 mapped at 0x40000040 (64 bytes)
Co-sim: Address 0xF7215290 mapped at 0x40000080 (16 bytes)
Co-sim: Pointer parameter 0xF72152E0 mapped at 0x40000000
Co-sim: Parameter 0 is 32 bits at 0xF7215258
Co-sim: Pointer parameter 0xF72152A0 mapped at 0x40000040
Co-sim: Parameter 1 is 32 bits at 0xF7215254
Co-sim: Pointer parameter 0xF7215290 mapped at 0x40000080
Co-sim: Parameter 2 is 32 bits at 0xF7215250
ERROR: Sim: Nearest memory space is 0x40000080->0xF7215290 to 0x40000090->0xF72152A0 (16 bytes).
ERROR: Sim: Read to non-mapped address 0x40000090.
File "/home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/results.txt" opened
error -> Unable to parse simulation time report: check simulator output for errors.
void SimulationTool::DetermineCycles(unsigned long long &, unsigned long long &)
../../src/wrapper/simulation/SimulationTool.cpp:223
Please report bugs to <panda-info@polimi.it>
I was able to run the simulation, but to be sure I need forward_kernel_test.xml. From what I understand, the top function signature differs from the one used in the original tutorial.
Here is a reproducer. I get different errors when running on linux vs Intel Mac. Please let me know if anything is broken in the reproducer. Thanks!
git clone --recursive git@github.com:cmu-sei/soda-opt-docker.git
cd soda-opt-docker
docker build --rm --pull -f ./Dockerfile -t soda-opt:dev-panda .
docker run --rm -it --network=host --privileged -e DISPLAY=$DISPLAY -e UID=$(id -u) -e GID=$(id -g) -v `pwd`/env:/home/soda-opt-user/env:rw -v `pwd`/work:/home/soda-opt-user/work soda-opt:dev-panda
# in the container
cd work/pytorch-iris/
./getmakefile.sh
make synth-baseline
Hi, did you get a chance to try to reproduce my errors? Thanks!
I’m working on it. The setup is just taking longer than expected.
Hi,
I've recently changed how some files are generated to manage opaque pointers.
One big change concerns the file describing the top function signature. This file is needed when the starting point is a .ll file. Opaque pointers make all pointers equivalent to void *, so the size of the objects the pointers point to has to be specified manually. This can be done by passing the option '--interface-xml-filename=<filename>' to Bambu.
This file is automatically generated by soda-opt, and so the following line
bambu -v3 --print-dot -lm --soft-float --compiler=I386_CLANG16 --device=xc7z020-1clg484-VVD --clock-period=5 --experimental-setup=BAMBU-BALANCED-MP --channels-number=2 --memory-allocation-policy=ALL_BRAM --disable-function-proxy --top-fname=forward_kernel --generate-tb=../../forward_kernel_test.xml --simulate --simulator=VERILATOR --no-clean ../../../output/05_llvm_baseline.ll
needs to be changed to
bambu -v3 --print-dot -lm --soft-float --compiler=I386_CLANG16 --device=xc7z020-1clg484-VVD --clock-period=5 --experimental-setup=BAMBU-BALANCED-MP --channels-number=2 --memory-allocation-policy=ALL_BRAM --disable-function-proxy --top-fname=forward_kernel --generate-tb=../../forward_kernel_test.xml --simulate --simulator=VERILATOR --no-clean ../../../output/05_llvm_baseline.ll --interface-xml-filename=../../forward_kernel_interface.xml
This last option fixes the issue, but you can get more out of this example.
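To see why the original run fails, the co-simulation address map from the log above can be replayed by hand. The following Python sketch is illustrative only: it lays the three regions out back to back from 0x40000000 using the sizes the log reports (64, 64, and 16 bytes) and looks up the address of the failing read. The helper names `build_map` and `find_region` are hypothetical and are not part of Bambu.

```python
BASE = 0x40000000  # first mapped address reported in the co-simulation log


def build_map(sizes, base=BASE):
    """Place regions back to back and return (start, end) pairs, end exclusive."""
    regions = []
    start = base
    for size in sizes:
        regions.append((start, start + size))
        start += size
    return regions


def find_region(regions, address):
    """Return the index of the region containing address, or None if unmapped."""
    for index, (start, end) in enumerate(regions):
        if start <= address < end:
            return index
    return None


# Sizes as reported by "Memory size for parameter N set to ..." in the log.
log_regions = build_map([64, 64, 16])
for index, (start, end) in enumerate(log_regions):
    print(f"parameter {index}: 0x{start:08X}..0x{end:08X}")

failing_read = 0x40000090  # address from "Read to non-mapped address"
print("region for failing read:", find_region(log_regions, failing_read))
```

The failing address is the first byte past the last mapped region, which is consistent with the kernel reading more data than the testbench mapped when the object sizes are not specified.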
Instead of the minimal interface, you may use the option --generate-interface=INFER, which follows the same assumptions adopted by Vitis HLS (see these pragmas). In this case, Bambu can infer an interface that connects the three parameters to three different BRAMs. Since the bus is no longer a constraint, you will halve the number of cycles. The array protocol needs to know exactly how large the attached BRAM is, and from .ll files it is impossible to specify the size of the array and the size of the base elements (at least with opaque pointers), so I've recently extended the forward_kernel_interface.xml file by adding a new attribute to the parameters.
The newer version for your example is:
<?xml version="1.0"?>
<module>
<function id="forward_kernel">
<arg id="P0" SizeInBytes="256" interface_type="array" interface_typename="float*" interface_typename_orig="float (*)" size="64" interface_typename_include=""/>
<arg id="P1" SizeInBytes="256" interface_type="array" interface_typename="float*" interface_typename_orig="float (*)" size="64" interface_typename_include=""/>
<arg id="P2" SizeInBytes="64" interface_type="array" interface_typename="float*" interface_typename_orig="float (*)" size="16" interface_typename_include=""/>
</function>
</module>
SizeInBytes allows Bambu to understand the memory layout of the function parameters. This new parameter will soon be added by @agostini01 to the soda-opt infrastructure.
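The relationship between size and SizeInBytes in the file above appears to be element count times element width: 64 floats of 4 bytes give 256, and 16 floats give 64. The Python sketch below regenerates the same three arg entries under that assumption. The function name `interface_xml` and the 4-byte float width are assumptions made for illustration; the authoritative file remains the one produced by soda-opt.

```python
import xml.etree.ElementTree as ET

FLOAT_BYTES = 4  # assumed width of a single-precision float


def interface_xml(function_name, element_counts):
    """Build a <module> tree with one array <arg> per pointer parameter."""
    module = ET.Element("module")
    function = ET.SubElement(module, "function", id=function_name)
    for index, count in enumerate(element_counts):
        ET.SubElement(
            function,
            "arg",
            {
                "id": f"P{index}",
                # Assumed rule: total bytes = number of elements * element width.
                "SizeInBytes": str(count * FLOAT_BYTES),
                "interface_type": "array",
                "interface_typename": "float*",
                "interface_typename_orig": "float (*)",
                "size": str(count),
                "interface_typename_include": "",
            },
        )
    return module


module = interface_xml("forward_kernel", [64, 64, 16])
if hasattr(ET, "indent"):  # pretty-printing helper, Python 3.9 and later
    ET.indent(module)
print(ET.tostring(module, encoding="unicode"))
```

Generating the attribute from the element count keeps the two values from drifting apart if the tensor shapes change.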
So, if you now run
bambu -v3 --print-dot -lm --soft-float --compiler=I386_CLANG16 --device=xc7z020-1clg484-VVD --clock-period=5 --experimental-setup=BAMBU-BALANCED-MP --channels-number=2 --memory-allocation-policy=ALL_BRAM --disable-function-proxy --top-fname=forward_kernel --generate-tb=../../forward_kernel_test.xml --simulate --simulator=VERILATOR --no-clean ../../../output/05_llvm_baseline.ll --interface-xml-filename=../../forward_kernel_interface.xml --generate-interface=INFER
you should obtain a core that takes 2266 cycles to complete.
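As a rough sanity check, the cycle count can be converted into a wall-clock latency using the --clock-period=5 value from the command line. The period is taken here to be in nanoseconds, which matches the estimated maximum frequency of about 200 MHz in the logs. The function name `latency_ns` is hypothetical, and the result assumes the design actually runs at the target clock.

```python
def latency_ns(cycles, clock_period_ns):
    """Return the execution time in nanoseconds for a given cycle count."""
    return cycles * clock_period_ns


cycles = 2266          # cycle count quoted above
clock_period_ns = 5.0  # from --clock-period=5, assumed to be nanoseconds

total_ns = latency_ns(cycles, clock_period_ns)
print(f"{total_ns:.0f} ns = {total_ns / 1000:.2f} us")
```

Under these assumptions one invocation of the kernel takes roughly 11.33 microseconds.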
Concerning the --memory-mapped-top option, it needs to be fixed. We are working on it, but fixing it may take some time.
Hi,
I just fixed the issue preventing testbench generation for memory-mapped kernels with the latest dev/panda branch commits. Now, you should be able to use the --memory-mapped-top, --generate-tb, and --simulate options to let Bambu generate a proper testbench environment and run the simulation.
Great, I'll give it a try!
I still get an error with the reproducer, which is not using --memory-mapped-top.
bambu \
-v3 --print-dot \
-lm --soft-float \
--compiler=I386_CLANG16 \
--device=xc7z020-1clg484-VVD \
--clock-period=5 \
--experimental-setup=BAMBU-BALANCED-MP \
--channels-number=2 \
--memory-allocation-policy=ALL_BRAM \
--disable-function-proxy \
--top-fname=forward_kernel \
--generate-tb=../../forward_kernel_test.xml --simulate --simulator=VERILATOR --no-clean \
../../../output/05_llvm_baseline.ll 2>&1 | tee ../../bambu-baseline-synth-log
== Bambu executed with: bambu -v3 --print-dot -lm --soft-float --compiler=I386_CLANG16 --device=xc7z020-1clg484-VVD --clock-period=5 --experimental-setup=BAMBU-BALANCED-MP --channels-number=2 --memory-allocation-policy=ALL_BRAM --disable-function-proxy --top-fname=forward_kernel --generate-tb=../../forward_kernel_test.xml --simulate --simulator=VERILATOR --no-clean ../../../output/05_llvm_baseline.ll
********************************************************************************
____ _
| __ ) __ _ _ __ ___ | |_ _ _
| _ \ / _` | '_ ` _ \| '_ \| | | |
| |_) | (_| | | | | | | |_) | |_| |
|____/ \__,_|_| |_| |_|_.__/ \__,_|
********************************************************************************
High-Level Synthesis Tool
Politecnico di Milano - DEIB
System Architectures Group
********************************************************************************
Copyright (C) 2004-2023 Politecnico di Milano
Version: PandA 2023.08 - Revision d0cb0caebaf1cc24e6fc6eb235156bc55fe21318-dev/panda
Parameters parsed in 0.10 seconds
Library Name : STD_FU
Total cells : 3
- combinational: 0
- others: 3
Library Name : STD_FU
Total cells : 10
- combinational: 0
- others: 10
Library Name : STD_FU
Total cells : 33
- combinational: 0
- others: 33
Library Name : STD_FU
Total cells : 8
- combinational: 0
- others: 8
Library Name : STD_FU
Total cells : 56
- combinational: 0
- others: 56
Library Name : STD_FU
Total cells : 1
- combinational: 0
- others: 1
Library Name : CS_COMPONENT
Total cells : 16
- combinational: 0
- others: 16
Library Name : STD_FU
Total cells : 2
- combinational: 0
- others: 2
Library Name : STD_FU
Total cells : 0
- combinational: 0
- others: 0
Library Name : STD_FU
Total cells : 3
- combinational: 0
- others: 3
Library Name : STD_FU
Total cells : 21
- combinational: 0
- others: 21
Library Name : STD
Total cells : 14
- combinational: 0
- others: 14
Library Name : STD_COMMON
Total cells : 57
- combinational: 0
- others: 57
Library Name : STD_FU
Total cells : 33
- combinational: 0
- others: 33
Library Name : STD_PC
Total cells : 16
- combinational: 0
- others: 16
Library Name : STD_SOFT_FLOAT
Total cells : 2
- combinational: 0
- others: 2
Library Name : STD
Total cells : 95
- combinational: 0
- others: 95
Library Name : STD_FU
Total cells : 2
- combinational: 0
- others: 2
Library Name : STD_FU
Total cells : 9
- combinational: 0
- others: 9
Library Name : WBWrapper
Total cells : 12
- combinational: 0
- others: 12
Available devices:
- 5CSEMA5F31C6
- 5SGXEA7N2F45C1
- EP2C70F896C6
- EP2C70F896C6-R
- EP4SGX530KH40C2
- LFE335EA8FN484C
- LFE5U85F8BG756C
- LFE5UM85F8BG756C
- asap7-BC
- asap7-TC
- asap7-WC
- nangate45
- nx1h140tsp
- nx1h35S
- nx2h540tsc
- xc4vlx100-10ff1513
- xc5vlx110t-1ff1136
- xc5vlx330t-2ff1738
- xc5vlx50-3ff1153
- xc6vlx240t-1ff1156
- xc7a100t-1csg324-VVD
- xc7vx330t-1ffg1157
- xc7vx485t-2ffg1761-VVD
- xc7vx690t-3ffg1930-VVD
- xc7z020-1clg484
- xc7z020-1clg484-VVD
- xc7z020-1clg484-YOSYS-VVD
- xc7z045-2ffg900-VVD
- xcku060-3ffva1156-VVD
- xcu280-2Lfsvh2892-VVD
Library Name : STD_FU
Total cells : 3931
- combinational: 0
- others: 3931
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
Compilation time: 0.02 seconds;
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
Compilation time: 0.03 seconds;
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
Tree merging time: 0.18 seconds;
(in-process) /usr/lib/llvm-16/lib/clang/16/include /usr/local/include /usr/include/x86_64-linux-gnu /usr/include
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 2
Bit Value Opt: plus_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 2
Bit Value Opt: cond_expr optimized, nbits = 3
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 2
Bit Value Opt: cond_expr optimized, nbits = 3
Bit Value Opt: cond_expr optimized, nbits = 4
Bit Value Opt: cond_expr optimized, nbits = 5
Bit Value Opt: bit_and_expr optimized, nbits = 1
Bit Value Opt: ne_expr optimized, nbits = 1
Bit Value Opt: bit_xor_expr optimized, nbits = 2
Bit Value Opt: bit_and_expr optimized, nbits = 2
Bit Value Opt: ne_expr optimized, nbits = 2
Bit Value Opt: plus_expr optimized, nbits = 2
Bit Value Opt: bit_and_expr optimized, nbits = 11
Bit Value Opt: eq_expr optimized, nbits = 11
Bit Value Opt: bit_and_expr optimized, nbits = 19
Bit Value Opt: eq_expr optimized, nbits = 19
Bit Value Opt: bit_and_expr optimized, nbits = 23
Bit Value Opt: eq_expr optimized, nbits = 23
Bit Value Opt: bit_and_expr optimized, nbits = 25
Bit Value Opt: eq_expr optimized, nbits = 25
Bit Value Opt: bit_and_expr optimized, nbits = 26
Bit Value Opt: eq_expr optimized, nbits = 26
Bit Value Opt: bit_and_expr optimized, nbits = 26
Bit Value Opt: ne_expr optimized, nbits = 26
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: bit_and_expr optimized, nbits = 1
Bit Value Opt: ne_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: cond_expr optimized, nbits = 1
Bit Value Opt: bit_and_expr optimized, nbits = 22
Bit Value Opt: ne_expr optimized, nbits = 22
Bit Value Opt: bit_and_expr optimized, nbits = 47
Bit Value Opt: ne_expr optimized, nbits = 47
Bit Value Opt: bit_and_expr optimized, nbits = 9
Bit Value Opt: ne_expr optimized, nbits = 9
Bit Value Opt: bit_and_expr optimized, nbits = 32
Bit Value Opt: ne_expr optimized, nbits = 32
Bit Value Opt: plus_expr optimized, nbits = 4
Bit Value Opt: plus_expr optimized, nbits = 4
Bit Value Opt: plus_expr optimized, nbits = 4
Functions to be synthesized:
forward_kernel
__float_mule8m23b_127nih
__float_adde8m23b_127nih
Memory allocation information:
Sparse memory alignemnt set to 1024 bytes
Warning: This function uses unknown addresses: forward_kernel
BRAM bitsize: 16
Spec may not exploit DATA bus width
Spec accesses data having an address unknown at compile time
Internal data is not externally accessible
DATA bus bitsize: 32
ADDRESS bus bitsize: 32
SIZE bus bitsize: 6
Internally allocated memory (no private memories): 0
Internally allocated memory: 0
Time to perform memory allocation: 0.00 seconds
Module allocation information for function __float_adde8m23b_127nih:
Number of complex operations: 0
Number of complex operations: 0
Time to perform module allocation: 0.09 seconds
Module allocation information for function __float_mule8m23b_127nih:
Number of complex operations: 1
Number of complex operations: 1
Time to perform module allocation: 0.03 seconds
Scheduling Information of function __float_adde8m23b_127nih:
Number of control steps: 9
Minimum slack: 0.010964990999998037
Estimated max frequency (MHz): 200.43956360218831
Time to perform scheduling: 0.05 seconds
Number of function call sites = 19
State Transition Graph Information of function __float_adde8m23b_127nih:
Number of operations: 257
Number of basic blocks: 3
Number of states: 8
Minimum number of cycles: 8
Maximum number of cycles 8
Parameters are registered
Done port is registered
Time to perform creation of STG: 0.03 seconds
Scheduling Information of function __float_mule8m23b_127nih:
Number of control steps: 8
Minimum slack: 0.056999993999998111
Estimated max frequency (MHz): 202.30629148010559
Time to perform scheduling: 0.02 seconds
Number of function call sites = 19
State Transition Graph Information of function __float_mule8m23b_127nih:
Number of operations: 104
Number of basic blocks: 3
Number of states: 7
Minimum number of cycles: 7
Maximum number of cycles 7
Parameters are registered
Done port is registered
Time to perform creation of STG: 0.02 seconds
Easy binding information for function __float_adde8m23b_127nih:
Bound operations:192/257
Time to perform easy binding: 0.00 seconds
Easy binding information for function __float_mule8m23b_127nih:
Bound operations:85/104
Time to perform easy binding: 0.00 seconds
Storage Value Information of function __float_adde8m23b_127nih:
Number of storage values inserted: 89
Time to compute storage value information: 0.00 seconds
Storage Value Information of function __float_mule8m23b_127nih:
Number of storage values inserted: 16
Time to compute storage value information: 0.00 seconds
Slack computed in 0.00 seconds
Weight computation completed in 0.00 seconds
False-loop computation completed in 0.00 seconds
Iteration 0 completed in 0.00 seconds
Register binding information for function __float_adde8m23b_127nih:
Register allocation algorithm obtains a sub-optimal result: 89 registers(LB:51)
Time to perform register binding: 0.00 seconds
Iteration 1 completed in 0.00 seconds
Clique covering computation completed in 0.00 seconds
Module binding information for function __float_adde8m23b_127nih:
Number of modules instantiated: 257
Number of performance conflicts: 13
Estimated resources area (no Muxes and address logic): 2745
Estimated area of MUX21: 0
Total estimated area: 2745
Estimated number of DSPs: 0
Time to perform module binding: 0.01 seconds
Register binding information for function __float_adde8m23b_127nih:
Register allocation algorithm obtains a sub-optimal result: 89 registers(LB:51)
Time to perform register binding: 0.01 seconds
Total number of flip-flops in function __float_adde8m23b_127nih: 488
Slack computed in 0.00 seconds
Weight computation completed in 0.00 seconds
False-loop computation completed in 0.00 seconds
Iteration 0 completed in 0.00 seconds
Register binding information for function __float_mule8m23b_127nih:
Register allocation algorithm obtains a sub-optimal result: 16 registers(LB:9)
Time to perform register binding: 0.00 seconds
Iteration 1 completed in 0.00 seconds
Clique covering computation completed in 0.00 seconds
Module binding information for function __float_mule8m23b_127nih:
Number of modules instantiated: 104
Number of performance conflicts: 0
Estimated resources area (no Muxes and address logic): 1100
Estimated area of MUX21: 0
Total estimated area: 1100
Estimated number of DSPs: 3
Time to perform module binding: 0.00 seconds
Register binding information for function __float_mule8m23b_127nih:
Register allocation algorithm obtains a sub-optimal result: 16 registers(LB:9)
Time to perform register binding: 0.00 seconds
Total number of flip-flops in function __float_mule8m23b_127nih: 197
Module allocation information for function forward_kernel:
Number of complex operations: 99
Number of complex operations: 99
Time to perform module allocation: 0.06 seconds
Scheduling Information of function forward_kernel:
Number of control steps: 353
Minimum slack: 0.14839999500031809
Estimated max frequency (MHz): 206.11756924921215
Time to perform scheduling: 0.11 seconds
Number of function call sites = 0
State Transition Graph Information of function forward_kernel:
Number of operations: 428
Number of basic blocks: 10
Number of states: 353
Done port is registered
Time to perform creation of STG: 0.45 seconds
Easy binding information for function forward_kernel:
Bound operations:243/428
Time to perform easy binding: 0.00 seconds
Storage Value Information of function forward_kernel:
Number of storage values inserted: 156
Time to compute storage value information: 0.00 seconds
Slack computed in 0.00 seconds
Weight computation completed in 0.03 seconds
False-loop computation completed in 0.00 seconds
cdfc mux estimation 61 -- Number of cliques covering the graph: 2 forward_kernel_BMEMORY_CTRLN_212 with 61 vertices
cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_adde8m23b_127nih_257 with 19 vertices
cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_mule8m23b_127nih_258 with 19 vertices
Iteration 0 completed in 0.03 seconds
Register binding information for function forward_kernel:
Register allocation algorithm obtains a sub-optimal result: 149 registers(LB:41)
Time to perform register binding: 0.04 seconds
cdfc mux estimation 61 -- Number of cliques covering the graph: 2 forward_kernel_BMEMORY_CTRLN_212 with 61 vertices
cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_adde8m23b_127nih_257 with 19 vertices
cdfc mux estimation 36 -- Number of cliques covering the graph: 1 forward_kernel___float_mule8m23b_127nih_258 with 19 vertices
Iteration 1 completed in 0.03 seconds
Clique covering computation completed in 0.10 seconds
Module binding information for function forward_kernel:
Number of modules instantiated: 333
Number of performance conflicts: 147
Estimated resources area (no Muxes and address logic): 5694
Estimated area of MUX21: 1332.3333333333333
Total estimated area: 7026.333333333333
Estimated number of DSPs: 0
Time to perform module binding: 0.14 seconds
Register binding information for function forward_kernel:
Register allocation algorithm obtains a sub-optimal result: 149 registers(LB:41)
Time to perform register binding: 0.03 seconds
Connection Binding Information for function forward_kernel:
Number of allocated multiplexers (2-to-1 equivalent): 140
Total number of bit-level multiplexers: 4608
Time to perform interconnection binding: 0.01 seconds
Total number of flip-flops in function forward_kernel: 4735
C-based testbench generation for function forward_kernel: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/./HLS_output//simulation/cosim.c
Prepared testbench
Summary of resources:
- ASSIGN_UNSIGNED_FU: 1
- BMEMORY_CTRLN: 1
- IUdata_converter_FU: 3
- MUX_GATE: 140
- OR_GATE: 2
- UIdata_converter_FU: 3
- UUdata_converter_FU: 263
- constant_value: 139
- flipflop_AR: 2
- lshift_expr_FU: 3
- lut_expr_FU: 71
- multi_read_cond_FU: 1
- read_cond_FU: 2
- register_SE: 160
- register_STD: 98
- rshift_expr_FU: 3
- ui_bit_and_expr_FU: 34
- ui_bit_ior_concat_expr_FU: 4
- ui_bit_ior_expr_FU: 39
- ui_bit_xor_expr_FU: 2
- ui_cond_expr_FU: 12
- ui_eq_expr_FU: 3
- ui_extract_bit_expr_FU: 101
- ui_lshift_expr_FU: 65
- ui_lt_expr_FU: 5
- ui_minus_expr_FU: 1
- ui_mult_expr_FU: 1
- ui_ne_expr_FU: 6
- ui_plus_expr_FU: 12
- ui_pointer_plus_expr_FU: 41
- ui_rshift_expr_FU: 26
- ui_ternary_plus_expr_FU: 1
- ui_ternary_pm_expr_FU: 1
clang: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
clang: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
clang: warning: argument unused during compilation: '-I /usr/bin/../share/verilator/include/vltstd' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-I /opt/panda/include' [-Wunused-command-line-argument]
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
clang: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
clang: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
make[1]: Entering directory '/home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/HLS_output/verilator_beh/verilator_obj'
g++ -I. -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -fstrict-aliasing -m32 -c -o bambu_testbench.o /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/./HLS_output//simulation/bambu_testbench.cpp
g++ -I. -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -fstrict-aliasing -m32 -c -o verilated.o /usr/share/verilator/include/verilated.cpp
g++ -I. -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -fstrict-aliasing -m32 -c -o verilated_dpi.o /usr/share/verilator/include/verilated_dpi.cpp
g++ -I. -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -fstrict-aliasing -m32 -c -o Vbambu_testbench.o Vbambu_testbench.cpp
g++ -I. -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -fstrict-aliasing -m32 -c -o Vbambu_testbench___024unit.o Vbambu_testbench___024unit.cpp
g++ -I. -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -fstrict-aliasing -m32 -c -o Vbambu_testbench__Dpi.o Vbambu_testbench__Dpi.cpp
g++ -I. -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -fstrict-aliasing -m32 -c -o Vbambu_testbench__Syms.o Vbambu_testbench__Syms.cpp
ar -cr Vbambu_testbench__ALL.a Vbambu_testbench.o Vbambu_testbench___024unit.o Vbambu_testbench__Dpi.o Vbambu_testbench__Syms.o
ranlib Vbambu_testbench__ALL.a
g++ bambu_testbench.o verilated.o verilated_dpi.o Vbambu_testbench__ALL.a -m32 -lpthread /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/./HLS_output//verilator_beh/libtb.so -o Vbambu_testbench -lm -lstdc++
make[1]: Leaving directory '/home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/HLS_output/verilator_beh/verilator_obj'
Results file: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/results.txt
Reset active: LOW
Co-sim: Co-simulation started
Co-sim: Memory size for parameter 0 set to 64 bytes.
Co-sim: Memory size for parameter 1 set to 64 bytes.
Co-sim: Memory size for parameter 2 set to 16 bytes.
Co-sim: Address 0xF71882E0 mapped at 0x40000000 (64 bytes)
Co-sim: Address 0xF71882A0 mapped at 0x40000040 (64 bytes)
Co-sim: Address 0xF7188290 mapped at 0x40000080 (16 bytes)
Co-sim: Pointer parameter 0xF71882E0 mapped at 0x40000000
Co-sim: Parameter 0 is 32 bits at 0xF7188258
Co-sim: Pointer parameter 0xF71882A0 mapped at 0x40000040
Co-sim: Parameter 1 is 32 bits at 0xF7188254
Co-sim: Pointer parameter 0xF7188290 mapped at 0x40000080
Co-sim: Parameter 2 is 32 bits at 0xF7188250
ERROR: Sim: Nearest memory space is 0x40000080->0xF7188290 to 0x40000090->0xF71882A0 (16 bytes).
ERROR: Sim: Read to non-mapped address 0x40000090.
File "/home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/results.txt" opened
error -> Unable to parse simulation time report: check simulator output for errors.
void SimulationTool::DetermineCycles(unsigned long long &, unsigned long long &)
../../src/wrapper/simulation/SimulationTool.cpp:222
Please report bugs to <panda-info@polimi.it>
That error means the kernel is trying to access a memory area that is not allocated on the accelerator. The error lines are modeled on Valgrind output, if you are familiar with that. The first error line,
ERROR: Sim: Nearest memory space is 0x40000080->0xF7188290 to 0x40000090->0xF71882A0 (16 bytes).
reports the nearest mapped memory space, which in this case belongs to the third parameter. The second error line,
ERROR: Sim: Read to non-mapped address 0x40000090.
describes the illegal memory operation itself: a read at the first address past the end of the third parameter's memory space.
This may be caused by an error in the computation or, more likely, by incorrect testbench memory initialization. As a quick check, look at the beginning of the simulation log and verify that each parameter size is as expected:
Co-sim: Memory size for parameter 0 set to 64 bytes.
Co-sim: Memory size for parameter 1 set to 64 bytes.
Co-sim: Memory size for parameter 2 set to 16 bytes.
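The two error lines can be cross-checked against the mappings printed earlier in the log. The following is a minimal sketch, not part of Bambu: the helper names `parse_regions` and `locate` are hypothetical, the regular expressions only cover the message formats shown in this log, and it assumes that mapped regions appear in parameter order.

```python
import re

# Lines copied from the simulation log above.
LOG = """\
Co-sim: Address 0xF71882E0 mapped at 0x40000000 (64 bytes)
Co-sim: Address 0xF71882A0 mapped at 0x40000040 (64 bytes)
Co-sim: Address 0xF7188290 mapped at 0x40000080 (16 bytes)
ERROR: Sim: Read to non-mapped address 0x40000090.
"""

MAP_RE = re.compile(r"Address 0x[0-9A-Fa-f]+ mapped at 0x([0-9A-Fa-f]+) \((\d+) bytes\)")
BAD_RE = re.compile(r"non-mapped address 0x([0-9A-Fa-f]+)")


def parse_regions(log):
    """Return (base, size) pairs in the accelerator address space."""
    return [(int(m.group(1), 16), int(m.group(2))) for m in MAP_RE.finditer(log)]


def locate(addr, regions):
    """Describe where addr falls relative to the mapped regions."""
    for index, (base, size) in enumerate(regions):
        if base <= addr < base + size:
            return ("inside", index, addr - base)
    # Not mapped: find the region whose end is closest at or below addr.
    below = [
        (addr - (base + size), index)
        for index, (base, size) in enumerate(regions)
        if base + size <= addr
    ]
    if below:
        distance, index = min(below)
        return ("past_end", index, distance)
    return ("unmapped", None, None)


if __name__ == "__main__":
    regions = parse_regions(LOG)
    for match in BAD_RE.finditer(LOG):
        addr = int(match.group(1), 16)
        print(hex(addr), locate(addr, regions))
```

For the log above, the faulty address is reported as lying zero bytes past the end of the third region (index 2), which is consistent with a buffer that is too small rather than a wild pointer.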
Also, it may be useful to write a C/C++ testbench to check the kernel functionality before synthesis. As a starting point, you may use the generated testbench, which you can find in HLS_output/simulation/cosim.c: copy just the main function implementation into a separate C file, compile it along with 05_llvm_baseline.ll, and check that the executable runs fine (maybe under Valgrind too).
If you can share both 05_llvm_baseline.ll and forward_kernel_test.xml, I can help you with that.
You need to pass --interface-xml-filename=../../forward_kernel_interface.xml with forward_kernel_interface.xml having this content:
<?xml version="1.0"?>
<module>
<function id="forward_kernel">
<arg id="P0" SizeInBytes="256" interface_type="array" interface_typename="float*" interface_typename_orig="float (*)" size="64" interface_typename_include=""/>
<arg id="P1" SizeInBytes="256" interface_type="array" interface_typename="float*" interface_typename_orig="float (*)" size="64" interface_typename_include=""/>
<arg id="P2" SizeInBytes="64" interface_type="array" interface_typename="float*" interface_typename_orig="float (*)" size="16" interface_typename_include=""/>
</function>
</module>
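As a sanity check on the numbers in this file, the sketch below recomputes each SizeInBytes from the element count and compares it with the sizes logged by the failing run. It assumes that `size` is the number of elements, that a float is 4 bytes, and that parameters 0, 1 and 2 in the log correspond to P0, P1 and P2; the function name `check_sizes` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Interface file content from the post above.
INTERFACE_XML = """<?xml version="1.0"?>
<module>
<function id="forward_kernel">
<arg id="P0" SizeInBytes="256" interface_type="array" interface_typename="float*" interface_typename_orig="float (*)" size="64" interface_typename_include=""/>
<arg id="P1" SizeInBytes="256" interface_type="array" interface_typename="float*" interface_typename_orig="float (*)" size="64" interface_typename_include=""/>
<arg id="P2" SizeInBytes="64" interface_type="array" interface_typename="float*" interface_typename_orig="float (*)" size="16" interface_typename_include=""/>
</function>
</module>
"""

FLOAT_BYTES = 4  # assumption: 32-bit float

# Sizes reported by "Co-sim: Memory size for parameter N" in the failing run.
LOGGED_BYTES = {"P0": 64, "P1": 64, "P2": 16}


def check_sizes(xml_text, logged):
    """Return (id, declared bytes, elements * FLOAT_BYTES, logged bytes) per argument."""
    rows = []
    for arg in ET.fromstring(xml_text).iter("arg"):
        name = arg.get("id")
        declared = int(arg.get("SizeInBytes"))
        expected = int(arg.get("size")) * FLOAT_BYTES
        rows.append((name, declared, expected, logged[name]))
    return rows


if __name__ == "__main__":
    for name, declared, expected, logged in check_sizes(INTERFACE_XML, LOGGED_BYTES):
        print(f"{name}: declared={declared} expected={expected} logged={logged}")
```

Under these assumptions, the declared sizes agree with element count times 4, while the sizes logged by the failing run equal the element counts themselves, so each buffer was a quarter of the size the kernel needs.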
It works! Thanks!
I changed to another neural network and ran into new issues with the --memory-mapped-top option. If you would like to reproduce them, you can just update the submodule from the reproducer steps above.
This is the error:
clang: warning: argument unused during compilation: '-I /usr/bin/../share/verilator/include/vltstd' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-I /opt/panda/include' [-Wunused-command-line-argument]
warning: overriding the module target triple with i386-pc-linux-gnu [-Woverride-module]
1 warning generated.
clang: warning: -lm: 'linker' input unused [-Wunused-command-line-argument]
clang: warning: optimization flag '-ffloat-store' is not supported [-Wignored-optimization-argument]
%Warning-MODDUP: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/./HLS_output//simulation/bambu_testbench.v:70: Duplicate declaration of module: 'join_signal'
module join_signal(in1,
^~~~~~~~~~~
/home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:18397: ... Location of original declaration
module join_signal(in1,
^~~~~~~~~~~
... Use "/* verilator lint_off MODDUP */" and lint_on around source to disable this message.
%Warning-MODDUP: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/./HLS_output//simulation/bambu_testbench.v:93: Duplicate declaration of module: 'split_signal'
module split_signal(in1,
^~~~~~~~~~~~
/home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:18420: ... Location of original declaration
module split_signal(in1,
^~~~~~~~~~~~
%Warning-MODDUP: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/./HLS_output//simulation/bambu_testbench.v:1080: Duplicate declaration of module: 'bus_merger'
module bus_merger(in1,
^~~~~~~~~~
/home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:18366: ... Location of original declaration
module bus_merger(in1,
^~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43182: Expecting expression to be constant, but variable isn't const: 'MEM_var_394383_495177'
: ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
datapath_forward_kernel #(.MEM_var_394383_495177(MEM_var_394383_495177),
^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43183: Expecting expression to be constant, but variable isn't const: 'MEM_var_394386_393256'
: ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
.MEM_var_394386_393256(MEM_var_394386_393256),
^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43184: Expecting expression to be constant, but variable isn't const: 'MEM_var_394391_393256'
: ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
.MEM_var_394391_393256(MEM_var_394391_393256),
^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43185: Expecting expression to be constant, but variable isn't const: 'MEM_var_395284_393256'
: ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
.MEM_var_395284_393256(MEM_var_395284_393256),
^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43186: Expecting expression to be constant, but variable isn't const: 'MEM_var_439985_403892'
: ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
.MEM_var_439985_403892(MEM_var_439985_403892),
^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43187: Expecting expression to be constant, but variable isn't const: 'MEM_var_440165_403892'
: ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
.MEM_var_440165_403892(MEM_var_440165_403892),
^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43188: Expecting expression to be constant, but variable isn't const: 'MEM_var_440251_403892'
: ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
.MEM_var_440251_403892(MEM_var_440251_403892),
^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43189: Expecting expression to be constant, but variable isn't const: 'MEM_var_495234_495177'
: ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
.MEM_var_495234_495177(MEM_var_495234_495177),
^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43190: Expecting expression to be constant, but variable isn't const: 'MEM_var_496077_495177'
: ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
.MEM_var_496077_495177(MEM_var_496077_495177),
^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43191: Expecting expression to be constant, but variable isn't const: 'MEM_var_496299_495177'
: ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
.MEM_var_496299_495177(MEM_var_496299_495177)) Datapath_i (.Mout_oe_ram(Mout_oe_ram),
^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43182: Can't convert defparam value to constant: Param 'MEM_var_394383_495177' of 'Datapath_i'
: ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
datapath_forward_kernel #(.MEM_var_394383_495177(MEM_var_394383_495177),
^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43183: Can't convert defparam value to constant: Param 'MEM_var_394386_393256' of 'Datapath_i'
: ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
.MEM_var_394386_393256(MEM_var_394386_393256),
^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43184: Can't convert defparam value to constant: Param 'MEM_var_394391_393256' of 'Datapath_i'
: ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
.MEM_var_394391_393256(MEM_var_394391_393256),
^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43185: Can't convert defparam value to constant: Param 'MEM_var_395284_393256' of 'Datapath_i'
: ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
.MEM_var_395284_393256(MEM_var_395284_393256),
^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43186: Can't convert defparam value to constant: Param 'MEM_var_439985_403892' of 'Datapath_i'
: ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
.MEM_var_439985_403892(MEM_var_439985_403892),
^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43187: Can't convert defparam value to constant: Param 'MEM_var_440165_403892' of 'Datapath_i'
: ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
.MEM_var_440165_403892(MEM_var_440165_403892),
^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43188: Can't convert defparam value to constant: Param 'MEM_var_440251_403892' of 'Datapath_i'
: ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
.MEM_var_440251_403892(MEM_var_440251_403892),
^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43189: Can't convert defparam value to constant: Param 'MEM_var_495234_495177' of 'Datapath_i'
: ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
.MEM_var_495234_495177(MEM_var_495234_495177),
^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43190: Can't convert defparam value to constant: Param 'MEM_var_496077_495177' of 'Datapath_i'
: ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
.MEM_var_496077_495177(MEM_var_496077_495177),
^~~~~~~~~~~~~~~~~~~~~
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43191: Can't convert defparam value to constant: Param 'MEM_var_496299_495177' of 'Datapath_i'
: ... In instance bambu_testbench.system.DUT.top._forward_kernel_i0._forward_kernel_int_i0
.MEM_var_496299_495177(MEM_var_496299_495177)) Datapath_i (.Mout_oe_ram(Mout_oe_ram),
^~~~~~~~~~~~~~~~~~~~~
%Error: Exiting due to 20 error(s)
error -> Returned error code!
int ToolManager::execute_command(const std::string &, const std::string &, const std::string &, bool, bool)
../../src/wrapper/ToolManager.cpp:94
Please report bugs to <panda-info@polimi.it>
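The 20 Verilator errors above come down to 10 parameter overrides on the Datapath_i instance (forward_kernel.v lines 43182 to 43191), each reported with two different messages. A log this long is easier to read once grouped by parameter name. The following is a minimal sketch of such a summary; the regular expression only targets the two message formats seen in this log, and the function name `summarize` is hypothetical.

```python
import re
from collections import defaultdict

# Four lines copied from the Verilator output above.
LOG_EXCERPT = """\
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43182: Expecting expression to be constant, but variable isn't const: 'MEM_var_394383_495177'
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43183: Expecting expression to be constant, but variable isn't const: 'MEM_var_394386_393256'
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43182: Can't convert defparam value to constant: Param 'MEM_var_394383_495177' of 'Datapath_i'
%Error: /home/soda-opt-user/work/pytorch-iris/output/bambu/baseline/forward_kernel.v:43183: Can't convert defparam value to constant: Param 'MEM_var_394386_393256' of 'Datapath_i'
"""

ERROR_RE = re.compile(
    r"^%Error: (\S+):(\d+): (.+?): (?:Param )?'(\w+)'",
    re.MULTILINE,
)


def summarize(log):
    """Group error messages by the parameter name they mention."""
    summary = defaultdict(list)
    for _path, line, message, name in ERROR_RE.findall(log):
        summary[name].append((int(line), message))
    return {name: sorted(entries) for name, entries in summary.items()}


if __name__ == "__main__":
    for name, entries in summarize(LOG_EXCERPT).items():
        print(name)
        for line, message in entries:
            print(f"  line {line}: {message}")
```

Run over the full output, this should list each MEM_var_* parameter once with both of its messages, which makes it clear that all errors point at the same parameter list of a single instantiation.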