Usage of sgemm.cucl in fully connected networks

Question


dinhv opened this issue 7 years ago

Is the sgemm.cucl file used at all when running a neural network in boda? When I run in run_cnet mode, I don't see sgemm.cucl (or any related sgemm template) being instantiated and filled in. So are the sgemm .cucl files only for benchmarking?

moskewcz · Answer 1 · Fri Apr 21 2017 00:15:06 GMT+0800 (China Standard Time)

yes. at least currently, they are not used for any neural net examples.

to elaborate, the sgemm code was actually added much later than the initial convolution variants, and i mainly used it to do initial experiments and learn about qualcomm GPUs. i only used it for large-ish square sizes that are 'nice' multiples of various things.

in the future, it might make sense to do any of the following:

benchmark/tune sgemm for various input sizes (e.g. those needed for NNs or other apps)
actually add code to be able to use sgemm for convolutions as a baseline (i.e. using im2col). note that since we can already run other sgemm implementations inside boda (e.g. cublas), this is perhaps more useful for comparison than you might think: i.e. we can get hard numbers for using the platform BLAS lib as opposed to our BLAS and/or direct convolutions.
use sgemm results as a way to profile/analyze new platforms: which variants/tuning params work well? what is the achievable BW/FLOPS?
make the sgemm implementation more complete: handle non-square and/or arbitrary sizes
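To make the im2col baseline and the "arbitrary sizes" point concrete, here is a minimal sketch in plain C rather than boda's CUCL templates. All function names are hypothetical and are not boda's API. It assumes row-major storage, a single image in CHW layout, stride 1, and no padding. It shows that both a convolution (via im2col) and a fully connected layer reduce to a single SGEMM call, and that the resulting GEMM shapes (M = output channels, N = output pixels, K = C*KH*KW for conv; M = batch, N = outputs, K = inputs for FC) are generally not square.

```c
#include <assert.h>

/* Hypothetical naive row-major SGEMM: C[M x N] = A[M x K] * B[K x N].
   Unlike a kernel tuned only for large square "nice" sizes,
   this accepts any M, N, K >= 1. */
void sgemm_naive(int M, int N, int K, const float *A, const float *B, float *C) {
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += A[m * K + k] * B[k * N + n];
            C[m * N + n] = acc;
        }
}

/* Hypothetical im2col for one CHW image (stride 1, no padding assumed).
   col has shape [C*KH*KW x OH*OW]; each column holds one receptive field. */
void im2col(const float *in, int C, int H, int W, int KH, int KW, float *col) {
    assert(H >= KH && W >= KW);
    int OH = H - KH + 1, OW = W - KW + 1;
    for (int c = 0; c < C; ++c)
        for (int kh = 0; kh < KH; ++kh)
            for (int kw = 0; kw < KW; ++kw) {
                int row = (c * KH + kh) * KW + kw;
                for (int oh = 0; oh < OH; ++oh)
                    for (int ow = 0; ow < OW; ++ow)
                        col[row * (OH * OW) + oh * OW + ow] =
                            in[(c * H + oh + kh) * W + ow + kw];
            }
}

/* Convolution as GEMM: weights [OC x C*KH*KW] times col [C*KH*KW x OH*OW]
   gives out [OC x OH*OW]. The caller provides a col buffer of
   C*KH*KW*OH*OW floats and an out buffer of OC*OH*OW floats. */
void conv2d_via_gemm(const float *in, int C, int H, int W,
                     const float *weights, int OC, int KH, int KW,
                     float *col, float *out) {
    int OH = H - KH + 1, OW = W - KW + 1;
    im2col(in, C, H, W, KH, KW, col);
    sgemm_naive(OC, OH * OW, C * KH * KW, weights, col, out);
}

/* Fully connected layer as GEMM: y[batch x n_out] = x[batch x n_in] * W[n_in x n_out]. */
void fully_connected(const float *x, const float *Wt, int batch, int n_in, int n_out, float *y) {
    sgemm_naive(batch, n_out, n_in, x, Wt, y);
}
```

For example, a 1-channel 3x3 input holding 1..9, convolved with a single 2x2 all-ones kernel, yields the 2x2 output 12, 16, 24, 28. The corresponding GEMM is 1 x 4 x 4 (M x N x K), far from the large square sizes the existing sgemm.cucl was exercised on. A real backend would replace `sgemm_naive` with a tuned kernel or a platform BLAS call (e.g. cuBLAS), which is exactly the comparison described above.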