make - generate a test case using test.c
make ar - generate swblas.a
mkdir build
cd build && cmake ..
Link against ./build/libswBLASlib.a and include cblas.h and swblas.h.
A test case is provided in ./test/:
sh run.sh $M $K $N
- -DUSE_RTC: count time inside the CPE
- -DUSE_COMP: without this flag, you will get the DMA time
- -DCHECK_RES: check the answer against xMath
void sw_sgemm_trans(float* input, float* weight, float* output, int M, int N, int K, int blkM, int blkN, int blkK);
input(K, M) * weight(K, N) -> output (N, K)
input, weight, and output are 2D matrices written as (high dim, low dim).
blkM, blkN, and blkK are the block sizes along the corresponding dimensions.
Requirements: M and blkM must be multiples of 128, K and blkK must be multiples of 8, and N and blkN must be multiples of 32.
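The size requirements above are easy to violate silently, so it may help to check them before calling sw_sgemm_trans. The following is a minimal sketch, not part of swBLAS: the helper names round_up and sw_sgemm_trans_sizes_ok are hypothetical, and the sketch assumes that "128x", "8x", and "32x" mean "a positive multiple of" that number.

```c
/* Multiples taken from the requirements stated above. */
#define SW_MULT_M 128
#define SW_MULT_N 32
#define SW_MULT_K 8

/* Hypothetical helper (not in swBLAS): round x up to the next
 * multiple of m. Assumes x >= 0 and m > 0. */
static int round_up(int x, int m)
{
    return ((x + m - 1) / m) * m;
}

/* Hypothetical helper (not in swBLAS): returns 1 when every size is
 * positive and a multiple of its required factor, 0 otherwise. */
static int sw_sgemm_trans_sizes_ok(int M, int N, int K,
                                   int blkM, int blkN, int blkK)
{
    if (M <= 0 || N <= 0 || K <= 0 || blkM <= 0 || blkN <= 0 || blkK <= 0)
        return 0;
    return (M % SW_MULT_M == 0) && (blkM % SW_MULT_M == 0) &&
           (N % SW_MULT_N == 0) && (blkN % SW_MULT_N == 0) &&
           (K % SW_MULT_K == 0) && (blkK % SW_MULT_K == 0);
}
```

A caller could pad M with round_up(M, SW_MULT_M) (and likewise N and K) when allocating the matrices, then call sw_sgemm_trans only if sw_sgemm_trans_sizes_ok returns 1.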
sh ./auto_test.sh
python ./show_raw_data.py
- Use -O1 rather than -O2 for the sw_slave_XXX files; otherwise you will get stuck.
- Function names in ./asm should not be too long. For example, dgemmasmnoinit will not pass compilation.
- If you need to use SIMD inside the CPE, allocate LDM space with pointers of type floatv4* or doublev4*.
- When ./build/libswBLASlib.a is used from other code, accessing the MBW map causes unpredictable bugs. Allocating a large array on the stack may not be well supported.
The time reported by rpcc differs from the timer's elapsed time between athread spawn and join. If you use rpcc to measure time on the MPE, you will get the wrong time. The athread overhead may be large for small cases.
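To measure the elapsed time around athread spawn and join on the MPE without rpcc, a wall-clock timer is one option. The sketch below assumes the POSIX gettimeofday call is the timer in question, which the note above does not confirm; elapsed_seconds is a hypothetical helper, and the athread calls appear only as a comment because their exact form depends on your code.

```c
#include <sys/time.h>

/* Hypothetical helper (not in swBLAS): wall-clock seconds between
 * two gettimeofday() samples. */
static double elapsed_seconds(const struct timeval *start,
                              const struct timeval *end)
{
    return (double)(end->tv_sec - start->tv_sec) +
           (double)(end->tv_usec - start->tv_usec) * 1e-6;
}

/* Intended use on the MPE (sketch):
 *
 *   struct timeval t0, t1;
 *   gettimeofday(&t0, NULL);
 *   ... athread spawn and join go here ...
 *   gettimeofday(&t1, NULL);
 *   double secs = elapsed_seconds(&t0, &t1);
 */
```

For small cases, running the spawn and join several times and averaging may make the fixed athread cost easier to see.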
Jiarui Fang [THU and NSCCWX]
fang_jiarui@163.com