pigirons / cpufp

A CPU tool for benchmarking the peak of floating points

asm/cpufp_kernel_x86_avx_vnni.S:16: Error: unsupported instruction `vpdpbusd'

clingfei opened this issue:

When I run sh build.sh, the following error occurs:

asm/cpufp_kernel_x86_avx_vnni.S: Assembler messages:
asm/cpufp_kernel_x86_avx_vnni.S:16: Error: unsupported instruction `vpdpbusd'
asm/cpufp_kernel_x86_avx_vnni.S:17: Error: unsupported instruction `vpdpbusd'
asm/cpufp_kernel_x86_avx_vnni.S:18: Error: unsupported instruction `vpdpbusd'
asm/cpufp_kernel_x86_avx_vnni.S:19: Error: unsupported instruction `vpdpbusd'
asm/cpufp_kernel_x86_avx_vnni.S:20: Error: unsupported instruction `vpdpbusd'
asm/cpufp_kernel_x86_avx_vnni.S:21: Error: unsupported instruction `vpdpbusd'
asm/cpufp_kernel_x86_avx_vnni.S:22: Error: unsupported instruction `vpdpbusd'
asm/cpufp_kernel_x86_avx_vnni.S:23: Error: unsupported instruction `vpdpbusd'
asm/cpufp_kernel_x86_avx_vnni.S:24: Error: unsupported instruction `vpdpbusd'
asm/cpufp_kernel_x86_avx_vnni.S:25: Error: unsupported instruction `vpdpbusd'
asm/cpufp_kernel_x86_avx_vnni.S:42: Error: unsupported instruction `vpdpwssd'
asm/cpufp_kernel_x86_avx_vnni.S:43: Error: unsupported instruction `vpdpwssd'
asm/cpufp_kernel_x86_avx_vnni.S:44: Error: unsupported instruction `vpdpwssd'
asm/cpufp_kernel_x86_avx_vnni.S:45: Error: unsupported instruction `vpdpwssd'
asm/cpufp_kernel_x86_avx_vnni.S:46: Error: unsupported instruction `vpdpwssd'
asm/cpufp_kernel_x86_avx_vnni.S:47: Error: unsupported instruction `vpdpwssd'
asm/cpufp_kernel_x86_avx_vnni.S:48: Error: unsupported instruction `vpdpwssd'
asm/cpufp_kernel_x86_avx_vnni.S:49: Error: unsupported instruction `vpdpwssd'
asm/cpufp_kernel_x86_avx_vnni.S:50: Error: unsupported instruction `vpdpwssd'
asm/cpufp_kernel_x86_avx_vnni.S:51: Error: unsupported instruction `vpdpwssd'
gcc: error: cpufp_kernel_x86_avx_vnni.o: No such file or directory

Should I use a specific version of gcc, or add some options to build.sh?

My current gcc version is:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 9.4.0-1ubuntu1~20.04.1' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-Av3uEd/gcc-9-9.4.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)

The assembler shipped with Ubuntu 20.04 may not support the AVX-VNNI instructions.
Updating to 22.04 should fix this.

A newer compiler such as clang-13 also fixes this issue.
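The errors above come from the assembler rather than from gcc itself, so one quick check is to look at the binutils version. The sketch below is only an illustration: the helper name version_ge is made up here, and the 2.36 threshold is an assumption (to my knowledge AVX-VNNI support arrived in GNU binutils 2.36, while Ubuntu 20.04 ships 2.34). clang uses its own integrated assembler by default, which is presumably why switching compilers helps.

```shell
#!/bin/sh
# version_ge A B: succeed when dotted version A is at least B.
# Only the major and minor fields are compared; both are assumed to be
# plain integers (e.g. "2.34" or "2.35.2", not "2.30-123.el8").
version_ge() {
    a_major=${1%%.*}; a_rest=${1#*.}; a_minor=${a_rest%%.*}
    b_major=${2%%.*}; b_rest=${2#*.}; b_minor=${b_rest%%.*}
    if [ "$a_major" -ne "$b_major" ]; then
        [ "$a_major" -gt "$b_major" ]
    else
        [ "$a_minor" -ge "$b_minor" ]
    fi
}

# Example use (2.36 is an assumed threshold, not taken from this thread):
#   ver=$(as --version | sed -n '1s/.* //p')   # last word of the first line
#   version_ge "$ver" 2.36 || echo "assembler $ver is probably too old for AVX-VNNI"
```

If the reported version is below the threshold, upgrading binutils (or the distribution) or building with clang are the two routes mentioned in this thread.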

When using gcc-9:

gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.3) 9.4.0

on this CPU:

Architecture:                       x86_64
On-line CPU(s) list:                0-31
Model name:                         13th Gen Intel(R) Core(TM) i9-13900K
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr
                                    sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good
                                    nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl
                                    vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer
                                    aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp
                                    ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep
                                    bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves
                                    split_lock_detect avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
                                    hwp_pkg_req umip pku ospke waitpkg gfni vaes vpclmulqdq tme rdpid movdiri movdir64b fsrm md_clear
                                    serialize pconfig arch_lbr flush_l1d arch_capabilities

the error is:

./build_x64.sh 
x64/asm/_AVX_VNNI_.S: Assembler messages:
x64/asm/_AVX_VNNI_.S:22: Error: unsupported instruction `vpdpbusd'
x64/asm/_AVX_VNNI_.S:23: Error: unsupported instruction `vpdpbusd'
x64/asm/_AVX_VNNI_.S:24: Error: unsupported instruction `vpdpbusd'
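The Flags line above lists avx_vnni, so the CPU itself supports the instruction and the failure is on the toolchain side. A minimal sketch for checking the CPU side on its own follows; has_flag is a hypothetical helper name, and reading /proc/cpuinfo assumes Linux.

```shell
#!/bin/sh
# has_flag NAME FLAGS: succeed when NAME appears as a whole word in the
# space-separated FLAGS string. Whole-word matching matters because
# avx512_vnni and avx_vnni are different features.
has_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example use on Linux:
#   flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
#   has_flag avx_vnni "$flags" && echo "CPU reports avx_vnni"
```

When this check passes but the build still stops with "unsupported instruction", the CPU is not the limiting factor.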

After running sudo apt-get install clang-13 and using

gcc=clang-13
g++=clang++-13

the build succeeds and the results look correct. AVX_VNNI DP4A runs about 2x as fast as FMA f32, not 4x:

./cpufp --thread_pool=[0-31]
Number Threads: 32
Thread Pool Binding: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
--------------------------------------------------------------
| Instruction Set | Core Computation      | Peak Performance |
| AVX_VNNI        | DP4A(s32,u8,s8)       | 4212.4 GOPS      |
| AVX_VNNI        | DP2A(s32,s16,s16)     | 2106.6 GOPS      |
| FMA             | FMA(f32,f32,f32)      | 2023.5 GFLOPS    |
| FMA             | FMA(f64,f64,f64)      | 1011.9 GFLOPS    |
| AVX             | ADD(MUL(f32,f32),f32) | 1010.6 GFLOPS    |
| AVX             | ADD(MUL(f64,f64),f64) | 515.27 GFLOPS    |
| SSE             | ADD(MUL(f32,f32),f32) | 954.07 GFLOPS    |
| SSE2            | ADD(MUL(f64,f64),f64) | 473.77 GFLOPS    |
--------------------------------------------------------------
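The ratios behind the "about 2x, not 4x" remark can be checked directly from the table. The 4x expectation presumably comes from each 32-bit lane doing four multiply-accumulates per DP4A instead of one per FMA. The ratio helper below is a hypothetical name used only for this illustration.

```shell
#!/bin/sh
# ratio A B: print A / B rounded to two decimals.
# LC_ALL=C keeps the decimal separator a dot regardless of locale.
ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 4212.4 2023.5   # DP4A(s32,u8,s8)   vs FMA f32 -> 2.08
ratio 2106.6 2023.5   # DP2A(s32,s16,s16) vs FMA f32 -> 1.04
```

So on this run DP4A reaches roughly twice the FMA f32 rate, and DP2A is roughly on par with it.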