rtspk / SystolicArray

A parametric RTL code generator of an efficient integer MxM Systolic Array implementation for Xilinx FPGAs.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Libano's Systolic Array Generator

A parametric RTL code generator of an efficient integer MxM Systolic Array implementation for Xilinx FPGAs.


Overview

In a systolic array, there is a rythmic style of computation, in which, at every clock cycle, input data is pumped in, and output data is pumped out. The term systolic is therefore a reference to the functioning of a biological heart[1].

There are a number of mathematical operations that can be implemented using systolic arrays, but the one in this project is a weight stationary matrix multiplier. Nowadays, systolic arrays are the architectural core of state-of-the-art neural network accelerators, such as Google's DPU[2] and Xilinx's TPU[3].

This implementation uses 8-bit integer representation for the inputs, which allows for simultaneosly executing two multiplications in a single DSP[4]. Furthermore, a time-multiplexing scheme is employed on the DSPs[5][6], allowing them to run twice as fast as the rest of the logic. Thus, overall, each DSP is able to execute four 8-bit integer multiplications per clock cycle. The adders responsible for accumulation are implemented with CLB[7][8] elements, such as LUTs and CARRYs.

Hence, the Processing Elements (PEs) that constitute the array are multiply-accumulate (MAC) units.

array-mem-pe


Resource Utilization & Performance

Given a systolic array of size NxN:

  • DSPs: N2 DSP48E[1[5]|2[6]] (1 for each PE)
  • BRAMs: 6N RAMB18E[1[9]|2[10]] (N for each input/output matrix: ABCD,E,W,X,Y,Z)
  • Operations/Cycle: 8N2 (N2 PEs, 2x2xMUL + 4xADD per PE)
  • Frequency: Will mostly depend target device, but can also depend on N (/validation/)
    • 8x8/14x14 @ XC7Z020 @ 200MHz
    • 8x8/14x14/32x32 @ XCZU9 @ 300MHz

testbench


Repository Organization

  • /docs/: Relevant repository documentation.
  • /example/: Vivado project for a 2x2 array, including testbenches, and an use-case scenario with AXI DMA.
  • /generator/: Python script for generating RTL (edit 'settings.py', run 'main.py', import '/RTL/import_me/*').
  • /validation/: OOC Vivado projects, scripts, and reports for synth/place/route of 8x8/14x14/32x32 arrays on 7000/UltraScale.

References


Extras

systolic-demo

About

A parametric RTL code generator of an efficient integer MxM Systolic Array implementation for Xilinx FPGAs.


Languages

Language:SystemVerilog 77.8%Language:Python 22.2%