marian-nmt / marian-dev

Fast Neural Machine Translation in C++ - development repository

Home Page:https://marian-nmt.github.io

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Cmake does not detect AVX2 support on macOS

jelmervdl opened this issue · comments

Bug description

Marian is build without AVX2 support on macOS even though the cpu (in this case Intel(R) Core(TM) i9-9880H) does support the instructions.

It seems to be due to a "faulty" check in cmake/FindSSE.cmake. It uses sysctl to look at machdep.cpu.features which does not mention AVX2. Apparently, AVX2 is reported in machdep.cpu.leaf7_features.

Replacing

EXEC_PROGRAM("/usr/sbin/sysctl -n machdep.cpu.features" OUTPUT_VARIABLE CPUINFO)

with

EXEC_PROGRAM("/usr/sbin/sysctl -n machdep.cpu.features machdep.cpu.leaf7_features" OUTPUT_VARIABLE CPUINFO)

does make it detect AVX2 properly. After building marian-decoder with it, it runs without "illegal instruction" errors (and is indeed slightly faster).

How to reproduce

Build marian-dev on macOS 11.3, e.g.:

$ cmake .. -DICU_ROOT=(brew --prefix icu4c) -DCOMPILE_CUDA=off -DUSE_FBGEMM=on
-- Project name: marian
-- Project version: v1.10.0+6f6d4846
CMake Warning at CMakeLists.txt:74 (message):
  CMAKE_BUILD_TYPE not set; setting to Release


-- Checking support for CPU intrinsics
-- Could not find hardware support for AVX2 on this machine.
-- Could not find hardware support for AVX512 on this machine.
-- SSE2 support found
-- SSE3 support found
-- SSE4.1 support found
-- AVX support found

Notice Could not find hardware support for AVX2 on this machine.

Context

Output of sysctl on my MacBook, filtered to machdep.cpu.*:

> /usr/sbin/sysctl -a | fgrep machdep.cpu
machdep.cpu.xsave.extended_state: 31 832 1088 0
machdep.cpu.xsave.extended_state1: 15 832 256 0
machdep.cpu.tlb.data.small: 64
machdep.cpu.tlb.data.small_level1: 64
machdep.cpu.tlb.inst.large: 8
machdep.cpu.thermal.ACNT_MCNT: 1
machdep.cpu.thermal.core_power_limits: 1
machdep.cpu.thermal.dynamic_acceleration: 1
machdep.cpu.thermal.energy_policy: 1
machdep.cpu.thermal.fine_grain_clock_mod: 1
machdep.cpu.thermal.hardware_feedback: 0
machdep.cpu.thermal.invariant_APIC_timer: 1
machdep.cpu.thermal.package_thermal_intr: 1
machdep.cpu.thermal.sensor: 1
machdep.cpu.thermal.thresholds: 2
machdep.cpu.mwait.extensions: 3
machdep.cpu.mwait.linesize_max: 64
machdep.cpu.mwait.linesize_min: 64
machdep.cpu.mwait.sub_Cstates: 286531872
machdep.cpu.cache.L2_associativity: 4
machdep.cpu.cache.linesize: 64
machdep.cpu.cache.size: 256
machdep.cpu.arch_perf.events: 0
machdep.cpu.arch_perf.events_number: 7
machdep.cpu.arch_perf.fixed_number: 3
machdep.cpu.arch_perf.fixed_width: 48
machdep.cpu.arch_perf.number: 4
machdep.cpu.arch_perf.version: 4
machdep.cpu.arch_perf.width: 48
machdep.cpu.address_bits.physical: 39
machdep.cpu.address_bits.virtual: 48
machdep.cpu.tsc_ccc.denominator: 2
machdep.cpu.tsc_ccc.numerator: 192
machdep.cpu.brand: 0
machdep.cpu.brand_string: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
machdep.cpu.core_count: 8
machdep.cpu.cores_per_package: 8
machdep.cpu.extfamily: 0
machdep.cpu.extfeature_bits: 1241984796928
machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI
machdep.cpu.extmodel: 9
machdep.cpu.family: 6
machdep.cpu.feature_bits: 9221960262849657855
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_feature_bits: 43804591 1073741824
machdep.cpu.leaf7_feature_bits_edx: 3154118144
machdep.cpu.leaf7_features: RDWRFSGS TSC_THREAD_OFFSET SGX BMI1 AVX2 SMEP BMI2 ERMS INVPCID FPU_CSDS MPX RDSEED ADX SMAP CLFSOPT IPT SGXLC MDCLEAR IBRS STIBP L1DF ACAPMSR SSBD
machdep.cpu.logical_per_package: 16
machdep.cpu.max_basic: 22
machdep.cpu.max_ext: 2147483656
machdep.cpu.microcode_version: 234
machdep.cpu.model: 158
machdep.cpu.processor_flag: 5
machdep.cpu.signature: 591597
machdep.cpu.stepping: 13
machdep.cpu.thread_count: 16
machdep.cpu.vendor: GenuineIntel