ferrandi / PandA-bambu

PandA-bambu public repository

Paired memory requests with the same addresses

askamkin opened this issue

Does it make any sense that Bambu sends paired requests to memory with the same addresses?

[Screenshot: bambu-equal-addresses]

It is also unclear why Mout_data_ram_size is 14 bits wide (not 12 = 2*6), while M_Rdata_ram is 128 bits wide (2*64).

Is the start_port not connected, or is it just yellow-colored?

Anyway, it looks strange that Mout_adr_ram is holding a pair of zero addresses. On which object is the load working? A pointer-based object passed to the design, or a variable allocated inside the design?
Concerning Mout_data_ram_size, the data bus is 64 bits wide for each of the two parallel channels, so the largest size a load can put on the bus is 64. 64 in binary requires 7 bits to be encoded, and two channels of 7 bits each give the 14 bits you see.

Is the start_port not connected, or is it just yellow-colored?

Just yellow-colored.

Anyway, it looks strange that Mout_adr_ram is holding a pair of zero addresses. On which object is the load working?

The start address is zero (the first green line), so it's OK.

A pointer-based object passed to the design, or a variable allocated inside the design?

Pointer-based object from outside.

64 in binary requires 7 bits to be encoded.

OK, but it looks redundant.

A zero pointer may cause problems in the C code, so pay attention to the compiled code.

Anyway, you can see the end result of the optimizations by passing --print-dot and checking the HLS_output/dot/ directory. The fsm.dot and HLS_STGraph.dot files show how the operations are scheduled; the syntax used to describe the operations is based on the C language.
About the redundancy, it depends on the code. It can happen that two loads performed in parallel read the same location; much depends on the code you are synthesizing. Again, checking what is done in that state would help.
In these cases, I pass the options --fsm-encoding=binary --print-dot, then use GTKWave to see which state the controller is in; once I know the state, I look it up in HLS_STGraph.dot. This way it is easy to understand which instruction is in flight.
The binary encoding makes it straightforward to relate the present_state value to the state the controller is in.

If it helps, I can send the source code (this is student work, 8x8-block processing in JPEG).

About the redundancy, it depends on the code. It can happen that two loads performed in parallel read the same location.

Sure. I just meant that 6 bits are enough (size=0 is not used).

If it helps, I can send the source code (this is student work, 8x8-block processing in JPEG).

Yes, it would help. Please also share the options passed to bambu.

Sure. I just meant that 6 bits are enough (size=0 is not used).

When the simple interface was designed, we went for a one-hot encoding of the size.

bambu --top-fname compress --device-name=xc7z020-1clg484-YOSYS-VVD \
--experimental-setup=BAMBU-PERFORMANCE-MP jpegcompress.c

jpegcompress.c.zip

Dear Alexander,
I tried out the example you uploaded with the latest version of bambu (which you can find on branch panda-0.9.7-dev), and it seems to me that the generated design works correctly with respect to the C implementation. I just had some problems synthesizing at the standard clock period of 10 ns with YOSYS, because the BRAMs are too slow for that frequency, but setting a longer clock period leads to a fully functional design.
I encourage you to try the latest bambu version if you are still interested. If you want to try different memory configurations, you can experiment with the predefined experimental setups or with the --channels-number and --channels-type options. You can find all the necessary information in the bambu help, which you can print by passing --help to the bambu executable.

Dear Michele,

Thanks for your reply. I will try.