docularxu / dtrmm-sve

my experimental implementation of DTRMM with Arm SVE intrinsics

This is an implementation of DTRMM using 256-bit Arm SVE assembly.

This project is based on BLISlab's GEMM reference code. The intention of this fork is
to implement DTRMM using Arm's SVE intrinsics.
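
For reference, BLAS DTRMM computes B := alpha * op(A) * B or B := alpha * B * op(A),
where A is a triangular matrix. A minimal plain-C sketch of one variant (left side,
lower triangular, no transpose, non-unit diagonal, column-major storage) is shown
below. The function name ref_dtrmm_llnn and its argument list are hypothetical and
chosen for illustration only; this is not the SVE kernel from this repository.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Reference sketch (hypothetical helper, not from this repository):
 * B := alpha * A * B, computed in place.
 * A is m x m, lower triangular, non-unit diagonal; B is m x n.
 * Both matrices are column-major with leading dimensions lda and ldb.
 * Entries of A above the diagonal are never read.
 */
void ref_dtrmm_llnn(size_t m, size_t n, double alpha,
                    const double *A, size_t lda,
                    double *B, size_t ldb)
{
    assert(lda >= m && ldb >= m);

    for (size_t j = 0; j < n; ++j) {
        /* Walk rows bottom-up: row i only needs rows k <= i of the
           input column, and those have not been overwritten yet. */
        for (size_t i = m; i-- > 0; ) {
            double acc = 0.0;
            for (size_t k = 0; k <= i; ++k)
                acc += A[i + k * lda] * B[k + j * ldb];
            B[i + j * ldb] = alpha * acc;
        }
    }
}
```

A scalar routine like this can serve as a correctness baseline when comparing
against an optimized kernel on small matrices.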

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
                ____  _     ___ ____  _       _     
               | __ )| |   |_ _/ ___|| | __ _| |__  
               |  _ \| |    | |\___ \| |/ _` | '_ \ 
               | |_) | |___ | | ___) | | (_| | |_) |
               |____/|_____|___|____/|_|\__,_|_.__/ 

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

BLISlab: A Sandbox for Optimizing GEMM

How to compile and execute the code:
1. Change the options in sourceme.sh and set the environment variables.
$ source sourceme.sh
2. Compile the code, generate the library and test executables.
$ make clean
$ make
3. Execute the test driver.
$ cd test
$ ./run_bl_dtrmm_1_10000.sh

On machines without SVE support, run the test under an emulator such as armie:
$ armie -msve-vector-bits=256 -i libinscount_emulated.so -- ./test_bl_dtrmm.x 1000 1000

Another way is to use an SVE-enabled QEMU user-mode emulator, such as:
$ [qemu.git]/aarch64-linux-user/qemu-aarch64 -cpu max,sve256=on test/test_bl_dtrmm.x 3 3
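
Both commands above pin the SVE vector length to 256 bits, i.e. four doubles per
vector. A small check such as the sketch below can confirm that setting when run
under the emulator. It assumes a compiler that provides arm_sve.h and is invoked
with SVE enabled (for example -march=armv8-a+sve); the helper name
sve_doubles_per_vector is hypothetical and not part of this repository.

```c
#include <stdint.h>

#ifdef __ARM_FEATURE_SVE
#include <arm_sve.h>
#endif

/*
 * Hypothetical helper: number of 64-bit (double) lanes in one SVE vector.
 * Returns 0 when the code was compiled without SVE support.
 */
uint64_t sve_doubles_per_vector(void)
{
#ifdef __ARM_FEATURE_SVE
    return svcntd();   /* ACLE intrinsic: count of 64-bit elements per vector */
#else
    return 0;
#endif
}
```

Called from a test program under either emulator command, the function is expected
to return 4; a build without SVE returns 0.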

To build QEMU user-mode emulation locally, follow these steps:
$ git clone https://git.qemu.org/git/qemu.git qemu.git
$ cd qemu.git
$ ./configure --target-list=aarch64-linux-user --static
$ make -j8
$ ./aarch64-linux-user/qemu-aarch64 --version
qemu-aarch64 version 4.2.92 (v5.0.0-rc2-9-g17e1e49814-dirty)
