f16falcona46 / pi-gemm

A Raspberry Pi GPU-accelerated implementation of the GEMM matrix-multiply function


Pi-GEMM

This is a GPU-accelerated implementation of the GEMM matrix multiply function for the Raspberry Pi.

The core is an assembler loop for Broadcom's QPU processor, run as a custom program on the GPU. It produces a substantial speedup over an optimized CPU version: the included test runs in 500 ms on my overclocked Pi, compared with 8,000 ms using the official ATLAS library on Raspbian on the same device.

Getting Started

Download the repo, install the build dependencies with sudo apt-get install libatlas-dev m4, run make, and then run the test with sudo ./gemm.

Notes

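It always overwrites the output 'C' matrix with the product, rather than adding the product to the existing 'C' scaled by 'beta' as standard GEMM does. In effect it behaves as if 'beta' were 0.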
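
To make the difference concrete, here is a minimal CPU-side sketch in C that contrasts the standard GEMM update with the overwrite behaviour described above. It is not the library's code: the function names gemm_reference and gemm_overwrite, the row-major layout, and the assumption that 'alpha' is still applied are all illustrative choices.

```c
#include <assert.h>
#include <stddef.h>

/* Standard GEMM update, row-major: C = alpha * A * B + beta * C.
   A is m x k, B is k x n, C is m x n. Hypothetical reference helper. */
static void gemm_reference(size_t m, size_t n, size_t k, float alpha,
                           const float *a, const float *b,
                           float beta, float *c) {
    assert(a && b && c);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            /* The previous contents of C contribute through beta. */
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}

/* Overwrite behaviour as described in the note: the previous contents
   of C are ignored. Applying alpha here is an assumption. */
static void gemm_overwrite(size_t m, size_t n, size_t k, float alpha,
                           const float *a, const float *b, float *c) {
    assert(a && b && c);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = alpha * acc; /* old C value is discarded */
        }
    }
}
```

If a caller needs the accumulate form, one workaround under these assumptions is to compute the product into a temporary matrix and add beta times the old 'C' on the CPU afterwards.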

You have to run the program as root (for example with sudo), so that the library can get direct access to the GPU.
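
A program that links against the library could check for root privileges up front and fail with a clear message instead of a lower-level GPU error. The sketch below uses the POSIX geteuid call; the helper names is_root_uid and check_root_or_warn are hypothetical and not part of this repo.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Pure helper: true when the given effective user ID is root (0). */
static bool is_root_uid(uid_t euid) {
    return euid == 0;
}

/* Hypothetical startup check: warn and return false when the process
   is not running as root, so the caller can exit before touching the GPU. */
static bool check_root_or_warn(void) {
    if (is_root_uid(geteuid())) {
        return true;
    }
    fprintf(stderr, "This program needs root access to the GPU; "
                    "try running it with sudo.\n");
    return false;
}
```

Calling check_root_or_warn() at the top of main and returning a non-zero exit code when it fails is one way to use it.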

License

All code is under the BSD three-clause license, included in this folder as LICENSE.

Credits

Written by Pete Warden at Jetpac Inc.

Thanks to eman on the Pi forums for the SHA-256 examples, Andrew Holme for creating the Fourier library, Herman Hermitage for his QPU documentation work, and Broadcom for releasing the hardware specifications of their GPU!



Languages

C++ 57.3%, Assembly 36.4%, C 5.1%, Makefile 1.1%