Fused Arithmetic Analysis for Efficient Hardware Datapath [MEng Final-Year Project]

Eusebius M. Ngemera, Department of Electrical & Electronic Engineering, Imperial College London.

Introduction

This is a fork of SOAP version 1 that considers the floating-point fused arithmetic units: 3-input adder, constant multiplier and fused multiply-add.
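
To illustrate why a fused multiply-add is treated as a distinct unit, the sketch below compares an unfused `a * b + c` (two roundings) with a fused result (one rounding) in double precision. It is not part of SOAP: the function names are hypothetical, and the fused operation is emulated with exact rational arithmetic rather than a hardware FMA.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Round after the multiplication and again after the addition."""
    return a * b + c


def emulated_fma(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once to the nearest double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the low-order bits of a*b are lost when the
# product is rounded before the addition.
a = b = 1.0 + 2.0 ** -27
c = -(1.0 + 2.0 ** -26)

print("unfused:", unfused_multiply_add(a, b, c))  # two roundings
print("fused:  ", emulated_fma(a, b, c))          # single rounding
```

With these inputs the unfused version returns zero, while the single rounding of the fused version preserves the small residual term.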

Version 1 of SOAP takes a numerical expression (additions and multiplications of variables and constants) together with the value ranges of its input variables. It returns a set of equivalent rewritten expressions that, when synthesised onto an FPGA, minimise both the area, measured in LUTs (look-up tables), and the numerical error, measured as the maximum absolute error.
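
As a small illustration of why mathematically equivalent rewrites can differ in accuracy, the sketch below evaluates two associations of the same three-term sum in double precision and measures each against the exact result. It is independent of SOAP's own analysis, which works on value ranges rather than single points; the helper name `abs_error` is hypothetical.

```python
from fractions import Fraction


def abs_error(computed: float, exact: Fraction) -> Fraction:
    """Absolute error of a floating-point result against an exact value."""
    return abs(Fraction(computed) - exact)


a, b, c = 1e16, -1e16, 1.0
exact = Fraction(a) + Fraction(b) + Fraction(c)

left = (a + b) + c   # cancels the two large terms first
right = a + (b + c)  # c is absorbed into b before the cancellation

print("(a + b) + c =", left, "absolute error", abs_error(left, exact))
print("a + (b + c) =", right, "absolute error", abs_error(right, exact))
```

Both expressions use two additions, so they would cost the same area, yet only the first association is exact for these inputs.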

This fork is based on version 1 of SOAP, taken from https://github.com/admk/soap at commit b1bd173bb47f3ca8afbb0e0bb0b440f88bcf69a5.

More details including my report are here.

Install

Instructions are given for Ubuntu and are expected to work on other major Linux distributions.

Install matplotlib.
Install Python 3 (it is usually already installed).
Install dependencies:

pip3 install -r requirements.txt

Install gmpy2 outside of pip:

sudo apt-get install python3-gmpy2

Optional

The following are needed only to fetch area information beyond what is already cached; the included cache is sufficient for the default benchmark parameters:

Install FloPoCo 2.5.0
Install ISE Design Suite (version 14.7; needs a license file)

Either add the directories containing these binaries to your $PATH, or create symbolic links to them in a directory that is already on your $PATH, such as /usr/bin/:

/path/to/your/flopoco-2.5.0/flopoco 
/path/to/your/Xilinx/14.7/ISE_DS/ISE/bin/lin64/xst
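
One way to confirm that both tools can be found is to look them up on the $PATH from Python. This is a convenience sketch, not part of the project; it assumes the executables are named flopoco and xst, as in the paths above, and the helper name `find_tools` is hypothetical.

```python
import shutil


def find_tools(names=("flopoco", "xst")):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools().items():
    print(f"{name}: {path or 'not found on PATH'}")
```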

Usage

From the project directory, run the following command to run the benchmarks with the default parameters and display the graphs.

PYTHONPATH=. python3 tests/fused/analysis.py

Parameters

The run() call at the end of tests/fused/analysis.py accepts the keyword arguments listed below, shown with their default values.

By default, the available benchmark expressions (a subset of PolyBench and Livermore Loops) are run at single precision. The dynamic area cache is used, and full closure is performed with a maximum transformation depth of 100. The multiple-use FMA type is the default, and singular frontiers are expanded when plotted.

logging='warning',
# ('o'|'off') | ('e'|'error') | ('w'|'warning') | ('i'|'info') | ('v'|'d'|'debug')
benchmarks='suites',
# comma-separated list of the names or ('a'|'all') | ('s'|'suite'|'suites')
precision='single',
# wF integer or ('h'|'half') | ('s'|'single') | ('d'|'double') | ('q'|'quad'|'quadruple')
algorithm='closure',
# ('f'|'frontier') | ('gf'|'fg'|'greedy_frontier') | ('g'|'greedy') | ('c'|'closure')

use_area_cache=True, # area_dynamic.pkl
timing=True, # invalidate internal cache or not (not including the area caches)
alert_finish=False, # Ubuntu only

# Multiple precisions
vary_precision=False,
vary_precision_one_frontier=True, # show one frontier
precision_step=1, precision_start=22, precision_end=53,
# range(precision_start, precision_end + 1, precision_step)

# Fused Multiply-Add (FMA)
# fma_wf_factor overrides LSB_acc (LSBA) set by single_use_fma
fma_wf_factor=None,
# LSB_acc = MSB_acc - int(fma_wf_factor * wf) - 1
single_use_fma=False,
# True: LSB_acc = max(a_mul_b_exp_bounds.min, c_exp_bounds.min) - wF - 1
# False: LSB_acc = min(a_mul_b_exp_bounds.min, c_exp_bounds.min) - wF

# Transformation depth
vary_transformation_depth=False, # 1 to 6
transformation_depth=100,

# Plotting
annotate=False,
annotate_size=14,
expand_singular_frontiers=True,
expand_all_frontiers=False,

compare_with_soap3=False, # only for `seidel` at single precision
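
The comments in the listing above describe two small calculations: the precisions swept when vary_precision is set, and the accumulator LSB of the FMA. The sketch below restates them as standalone Python so the arithmetic can be checked by hand. It is not taken from the project code: the function and argument names are hypothetical, the exponent bounds are passed as plain integers, and the example values are illustrative only.

```python
def precisions(precision_start=22, precision_end=53, precision_step=1):
    """Precisions visited by the sweep, with precision_end included."""
    return list(range(precision_start, precision_end + 1, precision_step))


def lsb_acc(wf, ab_exp_min, c_exp_min, msb_acc=None,
            single_use_fma=False, fma_wf_factor=None):
    """Accumulator LSB position, following the formulas in the comments.

    ab_exp_min and c_exp_min stand in for a_mul_b_exp_bounds.min and
    c_exp_bounds.min (assumed here to be integers).
    """
    if fma_wf_factor is not None:
        # fma_wf_factor overrides the choice made by single_use_fma.
        if msb_acc is None:
            raise ValueError("msb_acc is required when fma_wf_factor is set")
        return msb_acc - int(fma_wf_factor * wf) - 1
    if single_use_fma:
        return max(ab_exp_min, c_exp_min) - wf - 1
    return min(ab_exp_min, c_exp_min) - wf


sweep = precisions()
print(sweep[0], sweep[-1], len(sweep))

# Illustrative inputs: wf = 23, minimum exponents -4 and 2.
print(lsb_acc(23, -4, 2))                                # multiple-use
print(lsb_acc(23, -4, 2, single_use_fma=True))           # single-use
print(lsb_acc(23, -4, 2, msb_acc=10, fma_wf_factor=2))   # factor override
```

For the same inputs, the multiple-use setting gives a lower LSB (a wider accumulator) than the single-use setting, because it takes the smaller of the two minimum exponents.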
