ferrandi / PandA-bambu

PandA-bambu public repository

Could you provide a device support list

thechosenone98 opened this issue

As far as I can see, you do not provide a list of compatible devices, which makes it very hard to know what the command-line tool accepts for the --device-name option.

If I'm completely wrong, please tell me and point me to the list, because I can't find it anywhere.

Thank you :)

This is totally true; we need to improve this. Here is the list of supported devices:

  • 5CSEMA5F31C6
  • 5SGXEA7N2F45C1
  • EP2C70F896C6
  • EP2C70F896C6-R
  • EP4SGX530KH40C2
  • LFE335EA8FN484C
  • LFE5U85F8BG756C
  • LFE5UM85F8BG756C
  • asap7-BC
  • asap7-TC
  • asap7-WC
  • nangate45
  • nx1h140tsp
  • nx1h35S
  • nx2h540tsc
  • xc4vlx100-10ff1513
  • xc5vlx110t-1ff1136
  • xc5vlx330t-2ff1738
  • xc5vlx50-3ff1153
  • xc6vlx240t-1ff1156
  • xc7a100t-1csg324-VVD
  • xc7vx330t-1ffg1157
  • xc7vx485t-2ffg1761-VVD
  • xc7vx690t-3ffg1930-VVD
  • xc7z020-1clg484
  • xc7z020-1clg484-VVD
  • xc7z020-1clg484-YOSYS-VVD
  • xc7z045-2ffg900-VVD

You can also find them under etc/devices, subdivided by vendor. The XML file names correspond to the device names.
A further note, which may not be obvious: the difference between xc7z020-1clg484 and xc7z020-1clg484-VVD is the target synthesis tool, which is Xilinx ISE for xc7z020-1clg484 and Xilinx Vivado for xc7z020-1clg484-VVD. Of course, this applies to Xilinx devices only; other devices target their specific vendor tool.
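The naming rule above can be expressed as a tiny helper, shown only to make the convention concrete. It is a hypothetical sketch, not part of bambu: `xilinx_synthesis_tool` is an illustrative name, and it assumes that Xilinx part names start with "xc" (true for every Xilinx entry in the list above) and that only the "-VVD" suffix distinguishes the Vivado flow from the ISE flow.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of bambu: returns the synthesis tool implied
 * by a --device-name value for Xilinx parts, or NULL for other vendors,
 * which target their own vendor tool. Assumes Xilinx names start with "xc". */
const char *xilinx_synthesis_tool(const char *device_name)
{
  static const char suffix[] = "-VVD";
  const size_t suffix_len = sizeof suffix - 1;
  size_t len;

  assert(device_name != NULL);
  if (strncmp(device_name, "xc", 2) != 0)
    return NULL; /* not a Xilinx device */

  len = strlen(device_name);
  if (len >= suffix_len && strcmp(device_name + len - suffix_len, suffix) == 0)
    return "Vivado"; /* e.g. xc7z020-1clg484-VVD */
  return "ISE";      /* e.g. xc7z020-1clg484 */
}
```

For example, a Lattice name such as LFE5U85F8BG756C yields NULL, since non-Xilinx devices use their own vendor tool. The helper only inspects the suffix.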

Alright, so I see that you support LFE5U85F8BG756C. Could you tell me whether it's possible to add its smaller sibling, the LFE5U85F6BG381C? And if so, how? I'm new to this tool and need stuff done rather quickly :)

To quickly add support for a new device, have a look at the example under examples/add_device_simple. It is just a matter of duplicating the device characterization file and its -seed.xml counterpart and recompiling the framework. You may get worse scheduling results, though, since the device data will come from the copied device. Alternatively, for a quick and dirty approach, you can simply set the already supported device as the target (--device-name=LFE5U85F8BG756C) and edit the generated synthesis script, replacing the device with the one you need.

Hmm, I don't actually see anything in the synthesis script relating to the board, but I do know that when I run it, it prints LFE5U85F8BG756C at some point in the output and finishes correctly.

#!/bin/bash
##########################################################
#     Automatically generated by the PandA framework     #
##########################################################

# Synthesis script for COMPONENT: icrc1

#configuration
export TEMP=/tmp;export LSC_INI_PATH="";export LSC_DIAMOND=true;export TCL_LIBRARY=/usr/local/diamond/3.12/tcltk/lib/tcl8.5;export FOUNDRY=/usr/local/diamond/3.12/ispfpga;export PATH=$FOUNDRY/bin/lin64:/usr/local/diamond/3.12/bin/lin64:$PATH >& /dev/null; 

# STEP: lattice_flow
cd /home/thechosenone98/fpga/PandA-bambu/documentation/bambu101/basic_usage
diamondc /home/thechosenone98/fpga/PandA-bambu/documentation/bambu101/basic_usage/HLS_output//Synthesis/lattice_flow_1/project.tcl

The board information you are looking for should be in the HLS_output/Synthesis/lattice_flow_1/project.tcl script which contains the information for the synthesis for Diamond.
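Since the device name lives in project.tcl rather than in the shell script, the quick and dirty swap can be scripted instead of edited by hand. A minimal C sketch, assuming the device string appears verbatim in project.tcl: `replace_all` is a hypothetical helper, and Diamond may spell part names differently from bambu (for example with hyphens), so check the actual file before choosing the search string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of `text` in which every occurrence of
 * `from` is replaced by `to`; the caller must free() it. Returns NULL if
 * allocation fails. */
char *replace_all(const char *text, const char *from, const char *to)
{
  const size_t from_len = strlen(from), to_len = strlen(to);
  size_t count = 0;
  const char *p;
  char *out, *dst;

  assert(from_len > 0); /* an empty pattern would match everywhere */

  /* First pass: count non-overlapping matches to size the output. */
  for (p = text; (p = strstr(p, from)) != NULL; p += from_len)
    count++;

  out = malloc(strlen(text) - count * from_len + count * to_len + 1);
  if (out == NULL)
    return NULL;

  /* Second pass: copy the text, substituting each match. */
  dst = out;
  for (p = text;;) {
    const char *hit = strstr(p, from);
    size_t n = hit ? (size_t)(hit - p) : strlen(p);
    memcpy(dst, p, n);
    dst += n;
    if (hit == NULL)
      break;
    memcpy(dst, to, to_len);
    dst += to_len;
    p = hit + from_len;
  }
  *dst = '\0';
  return out;
}
```

You would read project.tcl into a NUL-terminated buffer, call replace_all(buf, old_name, new_name), write the result back, and then run diamondc as in the generated script. Keep in mind that bambu's scheduling still used the characterization of the original device.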

OK, yes, I found it and changing it worked. Now I need to understand how to implement a serial port alongside my application. Would it be too much to ask to DM you? This is for a research project and I am in great need of help here 😅

About the serial device implementation: I think Diamond may provide an IP core that takes care of it and exposes an interface, which is what you then connect to the bambu-generated accelerator.
I suggest you have a look at the Diamond tool first, so that you gain a better understanding of the full system and of where the generated kernel will sit in it. Once you have defined the interface, which could be a UART serial interface controller IP, you can define a signature for the top-level function that reflects the HDL signals of that interface.
As an example, let's say you have a UART IP which exposes the following:

  • Inputs:
    • tx_data
    • tx_valid
    • rx_rdy
  • Outputs:
    • tx_rdy
    • rx_data
    • rx_valid

At this point you may define a top-level function signature as follows:

void top_function(unsigned char tx_data, _Bool tx_valid, _Bool rx_rdy, _Bool tx_rdy, unsigned char rx_data, _Bool rx_valid);

At this point, you can implement the logic to interact with the interface signals as you like in the C implementation.
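As a minimal sketch of such logic, here is a one-byte echo step written against the UART signals listed above. It assumes, unlike the by-value signature, that the signals the accelerator must drive into the IP (tx_data, tx_valid, rx_rdy) are passed as pointers, since that is how a C function hands values back; how bambu maps these pointers onto plain HDL ports depends on the interface options used and is not covered here. `uart_echo_step` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-step echo: if the UART has a received byte and the
 * transmitter is ready, forward the byte and acknowledge the reception.
 * Signals from the IP are passed by value; signals to the IP via pointers. */
void uart_echo_step(unsigned char rx_data, bool rx_valid, bool tx_rdy,
                    unsigned char *tx_data, bool *tx_valid, bool *rx_rdy)
{
  assert(tx_data && tx_valid && rx_rdy);

  /* Default: nothing to send, received byte not consumed. */
  *tx_valid = false;
  *rx_rdy = false;

  if (rx_valid && tx_rdy) {
    *tx_data = rx_data; /* echo the byte back */
    *tx_valid = true;   /* ask the transmitter to send it */
    *rx_rdy = true;     /* tell the receiver the byte was consumed */
  }
}
```

The outputs are cleared first so that a call with no received byte never leaves tx_valid or rx_rdy asserted from a previous call.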

OK, I'm not too sure how to do this. I have a Verilog file for serial_rx and another one for the serial_tx part (serial_rx has a signal indicating that a byte has arrived, and serial_tx has a signal telling it to start sending the byte present on its input). How would I go about putting my C implementation between these two? Again, sorry for the probably basic question, but time is not something I have plenty of at the moment.

OK, so I managed to patch something together, but now I'm getting a segfault at compilation...

thechosenone98@thechosenone98-GL753VE:~/fpga/HLS_Particle_Filter_ECP5/test$ bambu ../echo.c --top-fname=echo ../IPs.xml --file-input-data=../serial_tx.v,../serial_rx.v --clock-period=20.83 --device-name=LFE5U85F8BG756C ../constraints_STD.xml
 ==  Bambu executed with: bambu --top-fname=echo --file-input-data=../serial_tx.v,../serial_rx.v --clock-period=20.83 --device-name=LFE5U85F8BG756C ../echo.c ../IPs.xml ../constraints_STD.xml 


********************************************************************************
                    ____                  _
                   | __ )  __ _ _ __ ___ | |_   _   _
                   |  _ \ / _` | '_ ` _ \| '_ \| | | |
                   | |_) | (_| | | | | | | |_) | |_| |
                   |____/ \__,_|_| |_| |_|_.__/ \__,_|

********************************************************************************
                         High-Level Synthesis Tool

                         Politecnico di Milano - DEIB
                          System Architectures Group
********************************************************************************
                Copyright (C) 2004-2022 Politecnico di Milano
  Version: PandA 0.9.8 - Revision 5dbf0aeef495c79ae7ae90f08883ff64a8c87d71-main

Target technology = FPGA

  Functions to be synthesized:
    echo


  Memory allocation information:
    BRAM bitsize: 8
    Spec may not exploit DATA bus width
    All the data have a known address
    Internal data is not externally accessible
    DATA bus bitsize: 8
    ADDRESS bus bitsize: 5
    SIZE bus bitsize: 4
    ALL pointers have been resolved
    Internally allocated memory (no private memories): 0
    Internally allocated memory: 0
  Time to perform memory allocation: 0.00 seconds


  Memory allocation information:
    BRAM bitsize: 8
    Spec may not exploit DATA bus width
    All the data have a known address
    Internal data is not externally accessible
    DATA bus bitsize: 8
    ADDRESS bus bitsize: 5
    SIZE bus bitsize: 4
    ALL pointers have been resolved
    Internally allocated memory (no private memories): 0
    Internally allocated memory: 0
  Time to perform memory allocation: 0.00 seconds


  Module allocation information for function echo:
Segmentation fault (core dumped)

Here is a link to the repo: https://github.com/thechosenone98/Particle-Filter-HLS-for-ULX3S
To reproduce the behaviour mentioned above, run the command

bambu ../echo.c --top-fname=echo ../IPs.xml --file-input-data=../serial_tx.v,../serial_rx.v --clock-period=20.83 --device-name=LFE5U85F8BG756C ../constraints_STD.xml

from inside a subfolder with whatever name you like.

I get the same behavior on my end. Did you check the XML syntax against the one in https://github.com/ferrandi/PandA-bambu/tree/main/examples/pong for example?

Yes, that is precisely what I used as a reference.

I made some tweaks (I had forgotten the Verilog file reference, i.e. the NP_functionality line), but now I get this:

thechosenone98@thechosenone98-GL753VE:~/fpga/HLS_Particle_Filter_ECP5/test$ bambu ../echo.c --top-fname=echo ../IPs.xml --file-input-data=../serial_tx.v,../serial_rx.v --clock-period=20.83 --device-name=LFE5U85F8BG756C ../constraints_STD.xml
 ==  Bambu executed with: bambu --top-fname=echo --file-input-data=../serial_tx.v,../serial_rx.v --clock-period=20.83 --device-name=LFE5U85F8BG756C ../echo.c ../IPs.xml ../constraints_STD.xml 


Target technology = FPGA

  Functions to be synthesized:
    echo


  Memory allocation information:
    BRAM bitsize: 8
    Spec may not exploit DATA bus width
    All the data have a known address
    Internal data is not externally accessible
    DATA bus bitsize: 8
    ADDRESS bus bitsize: 5
    SIZE bus bitsize: 4
    ALL pointers have been resolved
    Internally allocated memory (no private memories): 0
    Internally allocated memory: 0
  Time to perform memory allocation: 0.00 seconds


  Memory allocation information:
    BRAM bitsize: 8
    Spec may not exploit DATA bus width
    All the data have a known address
    Internal data is not externally accessible
    DATA bus bitsize: 8
    ADDRESS bus bitsize: 5
    SIZE bus bitsize: 4
    ALL pointers have been resolved
    Internally allocated memory (no private memories): 0
    Internally allocated memory: 0
  Time to perform memory allocation: 0.01 seconds


  Module allocation information for function echo:
    Number of complex operations: 2
    Number of complex operations: 2
  Time to perform module allocation: 0.00 seconds


  Scheduling Information of function echo:
    Number of control steps: 7
    Minimum slack: 17.724499999000006
    Estimated max frequency (MHz): 322.00933816712063
  Time to perform scheduling: 0.01 seconds


  State Transition Graph Information of function echo:
    Number of states: 7
    Done port is registered
  Time to perform creation of STG: 0.00 seconds


  Easy binding information for function echo:
    Bound operations:19/19
  Time to perform easy binding: 0.00 seconds


  Storage Value Information of function echo:
    Number of storage values inserted: 2
  Time to compute storage value information: 0.00 seconds


  Module binding information for function echo:
    Number of modules instantiated: 19
    Number of performance conflicts: 0
    Estimated resources area (no Muxes and address logic): 394
    Estimated area of MUX21: 0
    Total estimated area: 394
    Estimated number of DSPs: 0
  Time to perform module binding: 0.00 seconds


  Register binding information for function echo:
    Register allocation algorithm obtains a sub-optimal result: 2 registers(LB:1)
  Time to perform register binding: 0.00 seconds

  Total number of flip-flops in function echo: 17
error -> BOOL only supports single bit values: 9 - Datapath_i/fu_echo_31015_31048/start_port (new_bit_size == 1)

Please report bugs to <panda-info@polimi.it>

even though all my BOOLs are of size 1.

There are some issues with the IPs.xml you provided. First, the clock signal must be named "clock"; other names are not supported. The same holds for the done port, which must be named "done_port" and should not carry the is_global and is_extern attributes. Furthermore, the first module is missing a start_port, while all integrated IPs must provide one.
A valid IPs.xml should be as follows:

<?xml version="1.0"?>
<technology>
    <library>
        <name>STD_FU</name>
        <cell>
            <name>serial_rx</name>
            <operation operation_name="serial_rx" bounded="0" />
            <circuit>
                <component_o id="serial_rx">
                    <structural_type_descriptor id_type="serial_rx" />
                    <port_o id="clock" dir="IN" is_clock="1">
                        <structural_type_descriptor type="BOOL" size="1" />
                    </port_o>
                    <port_o id="reset" dir="IN">
                        <structural_type_descriptor type="BOOL" size="1" />
                    </port_o>
                    <port_o id="start_port" dir="IN">
                        <structural_type_descriptor type="BOOL" size="1" />
                    </port_o>
                    <port_o id="done_port" dir="OUT">
                        <structural_type_descriptor type="BOOL" size="1" />
                    </port_o>
                    <port_o id="i_Rx_Serial" dir="IN" is_global="1" is_extern="1">
                        <structural_type_descriptor type="BOOL" size="1" />
                    </port_o>
                    <port_o id="o_Rx_Byte" dir="OUT">
                        <structural_type_descriptor type="UINT" size="8" />
                    </port_o>
                    <NP_functionality LIBRARY="serial_rx" VERILOG_FILE_PROVIDED="serial_rx.v" />
                </component_o>
            </circuit>
        </cell>
        <cell>
            <name>serial_tx</name>
            <operation operation_name="serial_tx" bounded="0" />
            <circuit>
                <component_o id="serial_tx">
                    <structural_type_descriptor id_type="serial_tx" />
                    <port_o id="clock" dir="IN" is_clock="1">
                        <structural_type_descriptor type="BOOL" size="1" />
                    </port_o>
                    <port_o id="reset" dir="IN">
                        <structural_type_descriptor type="BOOL" size="1" />
                    </port_o>
                    <port_o id="start_port" dir="IN">
                        <structural_type_descriptor type="BOOL" size="1" />
                    </port_o>
                    <port_o id="done_port" dir="OUT">
                        <structural_type_descriptor type="BOOL" size="1" />
                    </port_o>
                    <port_o id="o_Tx_Serial" dir="OUT" is_global="1" is_extern="1">
                        <structural_type_descriptor type="BOOL" size="1" />
                    </port_o>
                    <port_o id="i_Tx_Byte" dir="IN">
                        <structural_type_descriptor type="UINT" size="8" />
                    </port_o>
                    <port_o id="o_Tx_Active" dir="OUT" is_global="1" is_extern="1">
                        <structural_type_descriptor type="BOOL" size="1" />
                    </port_o>
                    <NP_functionality LIBRARY="serial_tx" VERILOG_FILE_PROVIDED="serial_tx.v" />
                </component_o>
            </circuit>
        </cell>
    </library>
</technology>

Finally, the interfaces of the provided Verilog modules should be modified accordingly.

OK, great, but my serial_rx doesn't require a start_port: it always reads the next byte in and then signals to the next module that it has a valid byte by driving done_port high for one clock cycle. Also, I made done_port extern and global because I also wanted to use it on an LED output (to signal that the byte has been sent).

The design you described in echo.c is not going to work this way. The generated accelerator will have a start port and a done port. When the start port goes high, the accelerator gives the start signal to the serial_rx module and waits for its done to go high. At this point the if(byte) check is performed and, if true, the same happens with the serial_tx module. After this, the done signal of the accelerator goes high for one clock cycle.
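The sequence just described can be sketched in C roughly as follows. The prototypes of serial_rx and serial_tx (returning the received byte, taking the byte to send) are assumptions about how the IPs.xml ports map onto C, not something confirmed here. The software stand-ins under `#ifndef SYNTHESIS` only let the control flow be exercised on a host; `SYNTHESIS` is a hypothetical macro you would define, or you would delete the stand-ins, before passing the file to bambu.

```c
#include <assert.h>

/* In the bambu flow these calls are bound to the serial_rx / serial_tx
 * Verilog modules declared in IPs.xml. The prototypes are assumptions. */
unsigned char serial_rx(void);
void serial_tx(unsigned char byte);

/* One activation of the accelerator, between its start and done pulses. */
void echo(void)
{
  unsigned char byte = serial_rx(); /* pulse serial_rx start, wait for its done */
  if (byte)
    serial_tx(byte);                /* same handshake with serial_tx */
}

#ifndef SYNTHESIS
/* Host-only stand-ins for the two IPs (hypothetical, for testing). */
static const unsigned char *rx_stream; /* bytes "arriving" on the line */
static unsigned rx_count, rx_pos;
static unsigned char tx_log[16];       /* bytes "sent" by serial_tx */
static unsigned tx_count;

unsigned char serial_rx(void)
{
  return rx_pos < rx_count ? rx_stream[rx_pos++] : 0;
}

void serial_tx(unsigned char byte)
{
  assert(tx_count < sizeof tx_log); /* the host model's buffer is small */
  tx_log[tx_count++] = byte;
}
#endif
```

Each call handles at most one byte and then returns, which is why the next byte is only processed when something outside pulses the accelerator's start port again.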

If the accelerator has to be triggered by the rx ready signal, then you have to describe this differently; that is why I was suggesting exposing the serial IP signals on the top-level interface of the kernel (at least the serial_rx signals).

I'm not sure I'm following here... Also, I added a placeholder start_port to serial_rx (as you did for plot.v in the pong example), but I still get the same error.

I updated my code to reflect the changes you talked about (I had forgotten some stuff). I want the accelerator to always run (a while-loop kind of thing) and monitor serial_rx. I'll implement state-machine-style code for what to do when a new byte comes in and how to treat the following bytes.
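One way to write the state-machine-style handling is to keep the protocol state in static variables and process exactly one byte per call, so the same function works whether it is invoked once per received byte or from inside a loop around serial_rx. The framing (a 0xAA start marker, a length byte, then the payload) is a made-up example protocol, and `on_byte`, `START_MARKER` and `MAX_PAYLOAD` are hypothetical names; whether bambu keeps static variables in registers across activations is an assumption worth checking in the generated HDL.

```c
#include <assert.h>
#include <stdbool.h>

#define START_MARKER 0xAA /* hypothetical framing byte */
#define MAX_PAYLOAD  8    /* hypothetical maximum payload length */

enum rx_state { WAIT_START, WAIT_LENGTH, IN_PAYLOAD };

/* State persists across calls (one call per received byte). */
static enum rx_state state = WAIT_START;
static unsigned char payload[MAX_PAYLOAD];
static unsigned char expected, received;

/* Feeds one received byte to the parser. Returns true when a complete
 * payload is available in payload[0..expected-1]. */
bool on_byte(unsigned char byte)
{
  switch (state) {
  case WAIT_START:
    if (byte == START_MARKER)
      state = WAIT_LENGTH;
    return false;
  case WAIT_LENGTH:
    if (byte == 0 || byte > MAX_PAYLOAD) { /* invalid length: resync */
      state = WAIT_START;
      return false;
    }
    expected = byte;
    received = 0;
    state = IN_PAYLOAD;
    return false;
  case IN_PAYLOAD:
    assert(received < MAX_PAYLOAD);
    payload[received++] = byte;
    if (received == expected) {
      state = WAIT_START;
      return true;
    }
    return false;
  default: /* not reachable; resynchronize defensively */
    state = WAIT_START;
    return false;
  }
}
```

In the accelerator you would feed on_byte with the byte obtained from serial_rx and, when it returns true, act on payload, for instance by sending it back with serial_tx.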

Please pull my updated code and run it. I now get this error, which is very unhelpful:

thechosenone98@thechosenone98-GL753VE:~/fpga/HLS_Particle_Filter_ECP5/test$ bambu ../echo.c --top-fname=echo ../IPs.xml --file-input-data=../serial_tx.v,../serial_rx.v --clock-period=20.83 --device-name=LFE5U85F8BG756C ../constraints_STD.xml
 ==  Bambu executed with: bambu --top-fname=echo --file-input-data=../serial_tx.v,../serial_rx.v --clock-period=20.83 --device-name=LFE5U85F8BG756C ../echo.c ../IPs.xml ../constraints_STD.xml 


Target technology = FPGA

  Functions to be synthesized:
    echo


  Memory allocation information:
    BRAM bitsize: 8
    Spec may not exploit DATA bus width
    All the data have a known address
    Internal data is not externally accessible
    DATA bus bitsize: 8
    ADDRESS bus bitsize: 5
    SIZE bus bitsize: 4
    ALL pointers have been resolved
    Internally allocated memory (no private memories): 0
    Internally allocated memory: 0
  Time to perform memory allocation: 0.00 seconds


  Module allocation information for function echo:
    Number of complex operations: 2
    Number of complex operations: 2
  Time to perform module allocation: 0.01 seconds


  Scheduling Information of function echo:
    Number of control steps: 6
    Minimum slack: 17.724499998999992
    Estimated max frequency (MHz): 322.00933816711915
  Time to perform scheduling: 0.00 seconds


  State Transition Graph Information of function echo:
    Number of states: 6
    Done port is registered
  Time to perform creation of STG: 0.00 seconds


  Easy binding information for function echo:
    Bound operations:20/20
  Time to perform easy binding: 0.00 seconds


  Storage Value Information of function echo:
    Number of storage values inserted: 2
  Time to compute storage value information: 0.00 seconds


  Module binding information for function echo:
    Number of modules instantiated: 20
    Number of performance conflicts: 0
    Estimated resources area (no Muxes and address logic): 394
    Estimated area of MUX21: 0
    Total estimated area: 394
    Estimated number of DSPs: 0
  Time to perform module binding: 0.00 seconds


  Register binding information for function echo:
    Register allocation algorithm obtains a sub-optimal result: 2 registers(LB:1)
  Time to perform register binding: 0.00 seconds


  Connection Binding Information for function echo:
    Number of allocated multiplexers (2-to-1 equivalent): 1
  Time to perform interconnection binding: 0.00 seconds

  Total number of flip-flops in function echo: 17
error -> This point should never be reached - 

Please report bugs to <panda-info@polimi.it>

The done_port of the serial_tx module is still global and extern, which is not allowed. If you want this signal to be routed separately so that you can connect it to other logic outside the accelerator, you need to add a duplicate port for this purpose only.

Cool, no errors now. One other question: how do I map the ports to pins? I'm really sorry for all the nitpicky questions, but I find the documentation quite lacking...

Indeed, the documentation is not the best, but time is an issue on our side too. The port-to-pin mapping has to be specified within the Diamond tool; bambu just generates the HDL description of the accelerator.

OK, how should I open the generated project.tcl (if that's even what I need to open) inside Diamond? Starting Diamond with -t project.tcl simply runs it and then exits, but I need to be able to specify the constraints file for the pins before it runs the whole synthesis and place-and-route steps...

Basically, how do I make my own PCF file for the board and then link it to your Tcl script?

EDIT: Apparently what I need is the LPF file, which I have found for my board, and I have edited the names of the pins I need (I have done this before for other boards, so here I know what I am doing). That said, I still don't know how to tell your script or bambu to use that file for the pin configuration.

I just understood what you meant about exposing the serial_rx signals, but then how am I supposed to go about this? I want the accelerator to always be running and to process the serial input as it comes in.