Paired memory requests with the same addresses

Question

askamkin opened this issue 4 years ago

Does it make any sense that Bambu sends paired requests to memory with the same addresses?

It is also unclear why the width of Mout_data_ram_size is 14 (rather than 12 = 2*6), while the width of M_Rdata_ram is 128 (2*64).

Fabrizio Ferrandi · Answer 1 · Fri Nov 06 2020 21:48:47 GMT+0800 (China Standard Time)

Is the start_port not connected, or is it just yellow-colored?

Fabrizio Ferrandi · Answer 2 · Fri Nov 06 2020 21:57:09 GMT+0800 (China Standard Time)

Anyway, it looks strange that Mout_adr_ram is holding a pair of zero addresses. Which object is the load working on? A pointer-based object passed to the design, or a variable allocated inside the design?
Concerning Mout_data_ram_size, the data bus is 64 bits wide for each of the two parallel channels, so the largest size a load can put on the bus is 64. Encoding 64 in binary requires 7 bits, hence 14 = 2*7.
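
The width arithmetic above can be checked with a small Python sketch. The helper name bits_to_encode and the constants are hypothetical names chosen for illustration; they are not part of Bambu.

```python
def bits_to_encode(max_value: int) -> int:
    """Number of bits needed to represent every integer in 0..max_value."""
    return max(1, max_value.bit_length())

CHANNELS = 2     # two parallel memory channels, as stated in the thread
DATA_WIDTH = 64  # data bus width per channel, in bits

# The size field must be able to hold the value 64 itself (0b1000000).
size_bits_per_channel = bits_to_encode(DATA_WIDTH)
size_port_width = CHANNELS * size_bits_per_channel  # Mout_data_ram_size
data_port_width = CHANNELS * DATA_WIDTH             # M_Rdata_ram

print(size_bits_per_channel, size_port_width, data_port_width)
# prints: 7 14 128
```

Six bits would only cover the values 0..63, which is why a field that stores the size 64 directly needs a seventh bit.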

Alexander Kamkin · Answer 3 · Fri Nov 06 2020 21:59:14 GMT+0800 (China Standard Time)

Is the start_port not connected, or is it just yellow-colored?

Just yellow-colored.

Anyway, it looks strange that Mout_adr_ram is holding a pair of zero addresses. Which object is the load working on?

The start address is zero (the first green line). So, it's OK.

A pointer-based object passed to the design, or a variable allocated inside the design?

Pointer-based object from outside.

Alexander Kamkin · Answer 4 · Fri Nov 06 2020 22:03:16 GMT+0800 (China Standard Time)

64 in binary requires 7 bits to be encoded.

OK, but it looks redundant.

Fabrizio Ferrandi · Answer 5 · Fri Nov 06 2020 22:14:43 GMT+0800 (China Standard Time)

A zero pointer may create some problems with the C code, so pay attention to the compiled code.

Anyway, you can see the end result of the optimizations by passing --print-dot and checking the HLS_output/dot/ directory. The fsm.dot/HLS_STGraph.dot files show how the operations are scheduled. The syntax used to describe the operations is based on the C language.
About the redundancy, it depends on the code. Two loads performed in parallel that read the same location may happen. Much depends on what code you are synthesizing. Again, checking what is done in that state would help.
In these cases, I pass the options --fsm-encoding=binary --print-dot, then use gtkwave to see which state the controller is in; once I know the state, I look it up in HLS_STGraph.dot. In this way, it is easy to understand which instruction is in flight.
The binary encoding makes it straightforward to relate the present_state value to the state the controller is in.
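
As a rough illustration of why a binary-encoded state register is easier to read in a waveform viewer, here is a minimal Python sketch. It assumes states are numbered from 0 and that, under a one-hot encoding, bit i is set for state i; both are assumptions made for this example and have not been checked against Bambu's generated controller. The function names are hypothetical.

```python
def state_from_binary(present_state: int) -> int:
    """Binary encoding: the register value is the state index itself."""
    return present_state

def state_from_one_hot(present_state: int) -> int:
    """One-hot encoding: the state index is the position of the single set bit."""
    if present_state <= 0 or present_state & (present_state - 1):
        raise ValueError("not a one-hot value")
    return present_state.bit_length() - 1

# The same state index 5, as it would appear under each encoding.
print(state_from_binary(0b101))      # prints: 5
print(state_from_one_hot(0b100000))  # prints: 5
```

With the binary encoding the value shown in the viewer can be matched to the state graph directly, while a one-hot value first has to be converted to a bit position.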

Alexander Kamkin · Answer 6 · Fri Nov 06 2020 22:24:21 GMT+0800 (China Standard Time)

If it helps, I can send the source code (this is student work, 8x8-block processing in JPEG).

Alexander Kamkin · Answer 7 · Fri Nov 06 2020 22:28:43 GMT+0800 (China Standard Time)

About the redundancy, it depends on the code. Two loads performed in parallel reading the same location may happen.

Sure. I just meant that 6 bits are enough (size=0 is not used).

Fabrizio Ferrandi · Answer 8 · Sat Nov 07 2020 02:12:26 GMT+0800 (China Standard Time)

If it helps, I can send the source code (this is student work, 8x8-block processing in JPEG).

Yes, it would help. Please also share the options passed to bambu.

Sure. I just meant that 6 bits are enough (size=0 is not used).

When the simple interface was designed, we went for a one-hot encoding for the size.
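
One possible reading of the two remarks, sketched in Python: if the access sizes are powers of two from 1 to 64 (an assumption for illustration, not confirmed in the thread), then each size written directly in 7 bits has exactly one bit set, whereas storing size - 1 would fit in 6 bits. The names below are hypothetical.

```python
# Assumed set of access sizes; the thread only states that the maximum is 64.
SIZES = [1, 2, 4, 8, 16, 32, 64]

def direct_encoding(size: int) -> str:
    """Size stored as-is in a 7-bit field."""
    return format(size, "07b")

def minus_one_encoding(size: int) -> str:
    """Size stored as size - 1 in a 6-bit field (relies on size 0 being unused)."""
    return format(size - 1, "06b")

for size in SIZES:
    print(f"{size:2d}  {direct_encoding(size)}  {minus_one_encoding(size)}")

# Every directly encoded power-of-two size has a single set bit.
assert all(direct_encoding(s).count("1") == 1 for s in SIZES)
```

The 6-bit variant saves one wire per channel but needs an adder or decoder on the receiving side to recover the size, which is the usual trade-off against a one-hot style field.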

Alexander Kamkin · Answer 9 · Sat Nov 07 2020 02:36:25 GMT+0800 (China Standard Time)

bambu --top-fname compress --device-name=xc7z020-1clg484-YOSYS-VVD \
--experimental-setup=BAMBU-PERFORMANCE-MP jpegcompress.c

jpegcompress.c.zip

Michele Fiorito · Answer 10 · Sat Nov 13 2021 03:24:38 GMT+0800 (China Standard Time)

Dear Alexander,
I tried the example you uploaded with the latest version of bambu (which you can find in the panda-0.9.7-dev branch), and it seems to me that the generated design works correctly with respect to the C implementation. I only had some problems synthesizing at the standard clock period of 10 ns with YOSYS, because the BRAMs are too slow for that frequency, but setting a longer period leads to a fully functional design.
I encourage you to try the latest bambu version if you are still interested. If you want to try different memory configurations, you can play with the predefined experimental setups or with the --channels-number and --channels-type options. You can find all the necessary information in the bambu help, which you can print by passing --help to the bambu executable.

Alexander Kamkin · Answer 11 · Sat Nov 13 2021 15:02:46 GMT+0800 (China Standard Time)

Dear Michele,

Thanks for your reply. I will try.