flame / blis

BLAS-like Library Instantiation Software Framework

armsve : non-multiple of NR for one or more datatypes

egaudry opened this issue

Running c9700f3 on an ARM SVE-enabled processor fails with:

libblis: frame/base/bli_gks.c (line 451):
libblis: Default NC is non-multiple of NR for one or more datatypes.
libblis: Aborting.
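
For context, the message says that a cache blocksize (NC) is not a whole multiple of the register blocksize (NR). A minimal C sketch of that kind of check, and of rounding NC down so that it passes, might look like the following. Both function names are hypothetical and are not taken from bli_gks.c or bli_armsve_utils.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when the cache blocksize nc is a whole
   number of register blocks of size nr. */
bool nc_is_multiple_of_nr(size_t nc, size_t nr)
{
    assert(nr > 0); /* a zero register blocksize is never valid */
    return nc % nr == 0;
}

/* Hypothetical helper: round nc down to the nearest multiple of nr,
   so that a blocksize derived from cache parameters passes the check. */
size_t round_down_to_multiple(size_t nc, size_t nr)
{
    assert(nr > 0);
    return (nc / nr) * nr; /* integer division drops the remainder */
}
```

With these definitions, a derived NC of 1000 with NR of 16 would fail the check, and rounding down would give 992.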

Can you confirm that there is not a failure using the commit's parent ee9ff98?

I used
35195bb
9be97c1
and master (c9700f3) and got the same error with the 3 of them.

@xrq-phys any idea about this?

Weird... This line should guarantee that things work.

I wonder whether you could try this artifact (the aarch64-linux one).

Perhaps it would help eliminate the compiler's influence.

@xrq-phys I tried this; however, it seems some symbols are missing:
dsyr2k_ (https://github.com/JuliaBinaryWrappers/blis_jll.jl/releases/download/blis-v0.8.1%2B2/blis.v0.8.1.aarch64-linux-gnu.tar.gz).

If you let me know how you built it, I can try on my own and enable what I need.

@fgvanzee is there a "verbose" mode that can be enabled at runtime to let us see what the actual values are? Failing that, @egaudry you could always add print statements to the file indicated by @xrq-phys (bli_armsve_utils.c).

Oops. I forgot there's symbol mangling there.

On current master, it should in fact be OK to build via:
./configure armsve; make -j$NPROC

@xrq-phys I think @egaudry was referring to the compiler version etc. (maybe)? @egaudry while we're at it, what compiler version are you using?

My apologies.

GCC 10.3.0 & GCC 11.

@fgvanzee is there a "verbose" mode that can be enabled at runtime to let us see what the actual values are?

Normally, when I want to confirm the active cache and register blocksizes at runtime, I look at the informational header output of the testsuite. However, that might not have helped in this instance because the runtime check triggering the abort() happens during library initialization, so running the testsuite would have given you the same error.

But it looks like this is moot now, yeah?

Unsure if #615 might serve as a workaround.

I built blis-master (and other older versions) with GCC-11.2 on CentOS-7.9, using the following:

BLIS_TARGET=armsve
CC=gcc
FC=gfortran
CFLAGS="-fPIC -pthread -D_GNU_SOURCE -O2"
FFLAGS="-pthread -fPIC -D_GNU_SOURCE -fallow-argument-mismatch -O2"
LDFLAGS="-fPIC -pthread -Wl,-rpath='\$$\$\ORIGIN/../lib' -Wl,-rpath='\$$\$\ORIGIN/../../lib' -Wl,-rpath='\$$\$\ORIGIN/../../../lib'"
./configure --prefix=./JUST_THERE --disable-static --enable-shared --enable-threading=openmp --enable-cblas --enable-blas $BLIS_TARGET && make -j4 && make install

@fgvanzee is right that the issue happens right from the start, upon initialization.
@xrq-phys if #615 is about enabling SVE support on non-A64FX processors, I will need to try it for sure.

@egaudry #615 is a potential fix for this issue. Please try it and if it works I'll merge. Thanks @xrq-phys for the PR.

In the future, if you ever need to see what the blocksizes are, you could try to comment out the runtime check causing the abort(), recompile, and then run the testsuite. It might bomb when running level-3 operations, but at least you'll get the info you want.

@devinamatthews I haven't checked the actual returned values, but #615 does indeed fix the initialization issue. However, performance drops substantially (4x slower) compared to a BLIS build configured for thunderx2 (I built 4d83523+#615).

@fgvanzee thanks for the clarification, I'll try to remember this so I can provide more useful input when encountering this kind of issue in the future.

Thanks.

Unsure about the chip you're using, but if you could set the following environment variables to your chip's actual values (approximate values are enough), it might help your performance results:

BLIS_SVE_W_L1 # L1 number of sets
BLIS_SVE_N_L1 # L1 associativity degree
BLIS_SVE_C_L1 # L1 cache line size in bytes
BLIS_SVE_W_L2 # L2 number of sets
BLIS_SVE_N_L2 # L2 associativity degree
BLIS_SVE_C_L2 # L2 cache line size in bytes
BLIS_SVE_W_L3 # any big value
BLIS_SVE_N_L3 # 4 is OK
BLIS_SVE_C_L3 # any big value
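
The number of sets is rarely quoted on spec sheets, but it can be derived from the figures that usually are. A minimal C sketch, assuming a conventional set-associative cache where sets = capacity / (associativity × line size); `cache_num_sets` is a hypothetical helper and not part of BLIS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of sets in a set-associative cache.
   size_bytes: total cache capacity in bytes
   ways:       associativity degree
   line_bytes: cache line size in bytes */
size_t cache_num_sets(size_t size_bytes, size_t ways, size_t line_bytes)
{
    assert(ways > 0 && line_bytes > 0);
    return size_bytes / (ways * line_bytes);
}
```

For example, a 64 KiB, 4-way L1 cache with 64-byte lines works out to 256 sets, which would correspond to BLIS_SVE_W_L1=256, BLIS_SVE_N_L1=4 and BLIS_SVE_C_L1=64.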

@xrq-phys can you please set defaults appropriate for whatever the most common SVE chip is? (Maybe this is just A64fx for now?) It would be nice to be able to grab these from the OS or hwloc but that is more work.

@xrq-phys NVM I saw the comment in the PR that A64fx is the default.
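
As an illustration of the "environment variable with a built-in default" pattern discussed above, here is a minimal C sketch. `env_or_default` is a hypothetical helper, not the code in the PR, and the fallback argument only stands in for whatever default the library ships with.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: read a positive integer from the environment
   variable `name`, returning `fallback` when the variable is unset,
   empty, not a number, or not positive. */
long env_or_default(const char *name, long fallback)
{
    const char *text = getenv(name);
    if (text == NULL || *text == '\0')
        return fallback;

    errno = 0;
    char *end = NULL;
    long value = strtol(text, &end, 10);

    /* Reject overflow, trailing garbage, and non-positive values. */
    if (errno != 0 || *end != '\0' || value <= 0)
        return fallback;

    return value;
}
```

A caller would use it as `env_or_default("BLIS_SVE_N_L1", 4)`, so that an unset or malformed variable silently falls back to the default instead of producing an invalid blocksize.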