PULP-NN Mixed

PULP-NN Mixed is an optimized library for quantized neural network (QNN) inference with sub-byte operands, aimed at platforms whose smallest native operand is INT8. It is explained in detail in Bruschi et al. [arXiv:2007.07759]. If you intend to use or reference PULP-NN Mixed in an academic publication, please consider citing it:

@inproceedings{10.1145/3387902.3394038,
author = {Bruschi, Nazareno and Garofalo, Angelo and Conti, Francesco and Tagliavini, Giuseppe and Rossi, Davide},
title = {Enabling Mixed-Precision Quantized Neural Networks in Extreme-Edge Devices},
year = {2020},
isbn = {9781450379564},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3387902.3394038},
doi = {10.1145/3387902.3394038},
booktitle = {Proceedings of the 17th ACM International Conference on Computing Frontiers},
pages = {217–220},
numpages = {4},
keywords = {embedded systems, quantized neural network, low power architectures},
location = {Catania, Sicily, Italy},
series = {CF ’20}
}

Structure of the library

The library is organized as follows:

The 32bit and 64bit directories refer to the precision of the batch normalization parameters;
To use the library, include the header files found under the include directory in your QNN inference code. They are pulp_nn_kernels.h and pulp_nn_utils.h, which contain every kernel and utility function of the PULP-NN Mixed library. This directory and the files it contains are generated by pulp_nn_kernels_generator.py;
The src directory contains every computational kernel and is generated by pulp_nn_kernels_generator.py;
The scripts directory contains the templates and the supporting files used to generate the code of every kernel, header and example;
The test directory is generated by pulp_nn_examples_generator.py and contains a complete setup to run a test with some kernels of the library.

Requirements

If you want to use the pre-generated src and include files, you do not need to install anything else. If you want to use the generators described above, which are contained in the scripts directory, the following are strictly required:

python3
torch
numpy
Mako

If you have not done so yet, please install them to get more out of PULP-NN, such as generating tests for every kernel and modifying the templates to generate your own custom kernels.

Users Mode

To use the library in an existing project, you can copy the sources and headers that are already generated in the src and include directories.

If you want to test the library sources, you can generate the whole setup (pulp-sdk based) and the golden models by running, from the repository root:

> cd scripts
> python3 pulp_nn_examples_generator.py

To select the kernels to test, open scripts/setup.py and follow the instructions. You can test either a single kernel per type or the whole set of kernels per type (pointwise convolution, depthwise convolution, linear with 32-bit output precision and linear with sub-byte output precision).

Then, you can run the simulation on your favorite target architecture by running, from the repository root:

> cd test
> make clean all run cores=NUM_CORES kernel=KERNEL platform=PLATFORM

Here, NUM_CORES is the number of cores that you want to use (1 by default) and KERNEL is the precision configuration of the kernel that you want to test (888 or 88 by default; every permutation is already included).

Example: if you run make clean all run cores=8 kernel=888 (and you have selected pointwise in scripts/pulp_nn_examples_generator.py), you will see the results of the pointwise kernel with 8-bit inputs, 8-bit outputs and 8-bit weights (in this order), computed in a cluster execution with 8 cores on. Note that for linear kernels with 32-bit output precision, KERNEL only encodes the input and weight precision, so it can be 88, 84, 82 and so on.
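The digit order of KERNEL described above can be illustrated with a small helper. This is only a sketch and not part of PULP-NN Mixed: the names decode_kernel and all_kernel_strings are hypothetical, and the set of supported bit widths (8, 4 and 2) is an assumption based on the precisions mentioned in this README.

```python
from itertools import product

# Assumption: the supported operand widths are 8, 4 and 2 bits.
SUPPORTED_BITS = (8, 4, 2)


def decode_kernel(kernel):
    """Split a KERNEL string into named precisions (hypothetical helper).

    Three digits are read as inputs, outputs, weights (in this order).
    Two digits are read as inputs, weights, with 32-bit outputs, as for
    the linear kernels with 32-bit output precision.
    """
    digits = [int(ch) for ch in str(kernel)]
    if any(d not in SUPPORTED_BITS for d in digits):
        raise ValueError(f"unsupported precision in KERNEL={kernel!r}")
    if len(digits) == 3:
        inputs, outputs, weights = digits
        return {"inputs": inputs, "outputs": outputs, "weights": weights}
    if len(digits) == 2:
        inputs, weights = digits
        return {"inputs": inputs, "outputs": 32, "weights": weights}
    raise ValueError(f"KERNEL must have 2 or 3 digits, got {kernel!r}")


def all_kernel_strings(num_digits):
    """List every permutation of the assumed bit widths for KERNEL."""
    return ["".join(map(str, combo))
            for combo in product(SUPPORTED_BITS, repeat=num_digits)]


if __name__ == "__main__":
    print(decode_kernel("888"))
    print(decode_kernel("84"))
    print(len(all_kernel_strings(3)), "three-digit configurations")
```

Under these assumptions there are 27 three-digit configurations and 9 two-digit ones, which is one way to read "every permutation is already included".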

Developers Mode

You can modify either the generated kernel sources or the templates used to generate them, which are in scripts/templates. Then, you can regenerate the kernels by running, from the repository root:

> cd scripts
> python3 pulp_nn_kernels_generator.py

Getting Started with PULP-NN

First, clone the repository to your workstation, using:

> git clone https://github.com/pulp-platform/pulp-nn.git

Now you have a local copy of the repository. Next, you should build the sdk (as described in its README), targeting an architecture and a platform. For example, if you want to try the pulp-open architecture on the virtual platform (gvsoc), you should type, from the pulp-sdk root:

> export PULP_RISCV_GCC_TOOLCHAIN=<toolchain_path>
> source configs/pulp.sh
> source configs/platform-gvsoc.sh
> make all
> source pkg/sdk/dev/sourceme.sh

Now you can compile and run applications on your favorite platform. For example, if you want to try the convolutional kernels, edit scripts/setup.py to set the layer parameters such as H, W and channels, set the type of kernel to 'pointwise', and then generate the test folder by running, from the scripts folder:

> python3 pulp_nn_examples_generator.py
> cd ../32bit/test
> make clean all run cores=NUM_CORES kernel=KERNEL platform=PLATFORM

as seen above. The test folder contains everything you need to run the example: headers and sources are copied into it, and the Makefile and main are generated.

Support and Contribution

Nazareno Bruschi, University of Bologna
Angelo Garofalo, University of Bologna
Alessio Burrello, University of Bologna
Francesco Conti, University of Bologna
Giuseppe Tagliavini, University of Bologna
Manuele Rusci, University of Bologna
Davide Rossi, University of Bologna

Current limitations

Some kernels available in the 8bit directory are missing in this version (add, maxpool, avgpool, pointwise);
Tests for 64bit batch normalization parameters are not supported yet;
The golden model generator is a first version and it could print strange (but not wrong) output results. If they do not suit your purpose, you can tune the seed parameter of the golden function, which is defined in scripts/test_gen.py, and repeat the example generation step;
The number of channels of the input/output feature maps must be a multiple of 2 for INT4 precision and a multiple of 4 for INT2 precision.
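
The channel constraint in the last item matches the number of sub-byte elements that fit in one byte (two INT4 values, four INT2 values). The sketch below illustrates that reading; it is not code from PULP-NN Mixed. The function names are hypothetical, and the packing order (first element in the least significant bits) is an assumption made for illustration, not a statement about the library's memory layout.

```python
def channels_are_valid(channels, bits):
    """Check that `channels` fills whole bytes at the given bit width."""
    elements_per_byte = 8 // bits  # 1 for INT8, 2 for INT4, 4 for INT2
    return channels % elements_per_byte == 0


def pack_sub_byte(values, bits):
    """Pack unsigned sub-byte values into bytes (illustrative layout).

    Assumption: the first value of each group goes in the lowest bits.
    """
    if not channels_are_valid(len(values), bits):
        raise ValueError(
            f"{len(values)} values do not fill whole bytes at {bits} bits")
    elements_per_byte = 8 // bits
    mask = (1 << bits) - 1
    packed = bytearray()
    for start in range(0, len(values), elements_per_byte):
        byte = 0
        group = values[start:start + elements_per_byte]
        for position, value in enumerate(group):
            # Keep only the low `bits` bits and shift into place.
            byte |= (value & mask) << (position * bits)
        packed.append(byte)
    return bytes(packed)


if __name__ == "__main__":
    print(channels_are_valid(6, 4))   # True: 6 is a multiple of 2
    print(channels_are_valid(6, 2))   # False: 6 is not a multiple of 4
    print(pack_sub_byte([1, 2, 3, 4], 4).hex())  # 2143
```

With a channel count that is not a multiple of the elements per byte, the last byte would be only partially filled, which is presumably why the restriction exists.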
