auto not detecting correct architecture for ryzen 5
zerothi opened this issue · comments
I have this:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 25
Model: 80
Model name: AMD Ryzen 5 PRO 5650U with Radeon Graphics
Stepping: 0
Frequency boost: enabled
CPU MHz: 1330.581
CPU max MHz: 4806.6401
CPU min MHz: 1600.0000
BogoMIPS: 4591.45
Virtualization: AMD-V
L1d cache: 192 KiB
L1i cache: 192 KiB
L2 cache: 3 MiB
L3 cache: 16 MiB
NUMA node0 CPU(s): 0-11
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Full AMD retpoline, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
I got this:
./configure -p ... -t no --enable-blas --enable-cblas auto
...
*** Unable to automatically detect hardware type! ***
If you need anything, please let me know!
Can you please share the output of one of the following commands on this system?
dmidecode -t processor
cpuid --one-cpu
Output of cpuid --one-cpu:
CPU:
vendor_id = "AuthenticAMD"
version information (1/eax):
processor type = primary processor (0)
family = 0xf (15)
model = 0x0 (0)
stepping id = 0x0 (0)
extended family = 0xa (10)
extended model = 0x5 (5)
(family synth) = 0x19 (25)
(model synth) = 0x50 (80)
(simple synth) = AMD (unknown model) [Zen 3], 7nm
miscellaneous (1/ebx):
process local APIC physical ID = 0x8 (8)
maximum IDs for CPUs in pkg = 0xc (12)
CLFLUSH line size = 0x8 (8)
brand index = 0x0 (0)
brand id = 0x00 (0): unknown
feature information (1/edx):
x87 FPU on chip = true
VME: virtual-8086 mode enhancement = true
DE: debugging extensions = true
PSE: page size extensions = true
TSC: time stamp counter = true
RDMSR and WRMSR support = true
PAE: physical address extensions = true
MCE: machine check exception = true
CMPXCHG8B inst. = true
APIC on chip = true
SYSENTER and SYSEXIT = true
MTRR: memory type range registers = true
PTE global bit = true
MCA: machine check architecture = true
CMOV: conditional move/compare instr = true
PAT: page attribute table = true
PSE-36: page size extension = true
PSN: processor serial number = false
CLFLUSH instruction = true
DS: debug store = false
ACPI: thermal monitor and clock ctrl = false
MMX Technology = true
FXSAVE/FXRSTOR = true
SSE extensions = true
SSE2 extensions = true
SS: self snoop = false
hyper-threading / multi-core supported = true
TM: therm. monitor = false
IA64 = false
PBE: pending break event = false
feature information (1/ecx):
PNI/SSE3: Prescott New Instructions = true
PCLMULDQ instruction = true
DTES64: 64-bit debug store = false
MONITOR/MWAIT = true
CPL-qualified debug store = false
VMX: virtual machine extensions = false
SMX: safer mode extensions = false
Enhanced Intel SpeedStep Technology = false
TM2: thermal monitor 2 = false
SSSE3 extensions = true
context ID: adaptive or shared L1 data = false
SDBG: IA32_DEBUG_INTERFACE = false
FMA instruction = true
CMPXCHG16B instruction = true
xTPR disable = false
PDCM: perfmon and debug = false
PCID: process context identifiers = false
DCA: direct cache access = false
SSE4.1 extensions = true
SSE4.2 extensions = true
x2APIC: extended xAPIC support = false
MOVBE instruction = true
POPCNT instruction = true
time stamp counter deadline = false
AES instruction = true
XSAVE/XSTOR states = true
OS-enabled XSAVE/XSTOR = true
AVX: advanced vector extensions = true
F16C half-precision convert instruction = true
RDRAND instruction = true
hypervisor guest status = false
cache and TLB information (2):
processor serial number = 00A5-0F00-0000-0000-0000-0000
MONITOR/MWAIT (5):
smallest monitor-line size (bytes) = 0x40 (64)
largest monitor-line size (bytes) = 0x40 (64)
enum of Monitor-MWAIT exts supported = true
supports intrs as break-event for MWAIT = true
number of C0 sub C-states using MWAIT = 0x1 (1)
number of C1 sub C-states using MWAIT = 0x1 (1)
number of C2 sub C-states using MWAIT = 0x0 (0)
number of C3 sub C-states using MWAIT = 0x0 (0)
number of C4 sub C-states using MWAIT = 0x0 (0)
number of C5 sub C-states using MWAIT = 0x0 (0)
number of C6 sub C-states using MWAIT = 0x0 (0)
number of C7 sub C-states using MWAIT = 0x0 (0)
Thermal and Power Management Features (6):
digital thermometer = false
Intel Turbo Boost Technology = false
ARAT always running APIC timer = true
PLN power limit notification = false
ECMD extended clock modulation duty = false
PTM package thermal management = false
HWP base registers = false
HWP notification = false
HWP activity window = false
HWP energy performance preference = false
HWP package level request = false
HDC base registers = false
Intel Turbo Boost Max Technology 3.0 = false
HWP capabilities = false
HWP PECI override = false
flexible HWP = false
IA32_HWP_REQUEST MSR fast access mode = false
HW_FEEDBACK MSRs supported = false
ignoring idle logical processor HWP req = false
enhanced hardware feedback interface = false
digital thermometer thresholds = 0x0 (0)
hardware coordination feedback = true
ACNT2 available = false
performance-energy bias capability = false
number of enh hardware feedback classes = 0x0 (0)
performance capability reporting = false
energy efficiency capability reporting = false
size of feedback struct (4KB pages) = 0x1 (1)
index of CPU's row in feedback struct = 0x0 (0)
extended feature flags (7):
FSGSBASE instructions = true
IA32_TSC_ADJUST MSR supported = false
SGX: Software Guard Extensions supported = false
BMI1 instructions = true
HLE hardware lock elision = false
AVX2: advanced vector extensions 2 = true
FDP_EXCPTN_ONLY = false
SMEP supervisor mode exec protection = true
BMI2 instructions = true
enhanced REP MOVSB/STOSB = true
INVPCID instruction = true
RTM: restricted transactional memory = false
RDT-CMT/PQoS cache monitoring = true
deprecated FPU CS/DS = false
MPX: intel memory protection extensions = false
RDT-CAT/PQE cache allocation = true
AVX512F: AVX-512 foundation instructions = false
AVX512DQ: double & quadword instructions = false
RDSEED instruction = true
ADX instructions = true
SMAP: supervisor mode access prevention = true
AVX512IFMA: fused multiply add = false
PCOMMIT instruction = false
CLFLUSHOPT instruction = true
CLWB instruction = true
Intel processor trace = false
AVX512PF: prefetch instructions = false
AVX512ER: exponent & reciprocal instrs = false
AVX512CD: conflict detection instrs = false
SHA instructions = true
AVX512BW: byte & word instructions = false
AVX512VL: vector length = false
PREFETCHWT1 = false
AVX512VBMI: vector byte manipulation = false
UMIP: user-mode instruction prevention = true
PKU protection keys for user-mode = true
OSPKE CR4.PKE and RDPKRU/WRPKRU = true
WAITPKG instructions = false
AVX512_VBMI2: byte VPCOMPRESS, VPEXPAND = false
CET_SS: CET shadow stack = true
GFNI: Galois Field New Instructions = false
VAES instructions = true
VPCLMULQDQ instruction = true
AVX512_VNNI: neural network instructions = false
AVX512_BITALG: bit count/shiffle = false
TME: Total Memory Encryption = false
AVX512: VPOPCNTDQ instruction = false
5-level paging = false
BNDLDX/BNDSTX MAWAU value in 64-bit mode = 0x0 (0)
RDPID: read processor D supported = true
KL: key locker = false
CLDEMOTE supports cache line demote = false
MOVDIRI instruction = false
MOVDIR64B instruction = false
ENQCMD instruction = false
SGX_LC: SGX launch config supported = false
PKS: supervisor protection keys = false
AVX512_4VNNIW: neural network instrs = false
AVX512_4FMAPS: multiply acc single prec = false
fast short REP MOV = true
UINTR: user interrupts = false
AVX512_VP2INTERSECT: intersect mask regs = false
SRBDS mitigation MSR available = false
VERW MD_CLEAR microcode support = false
SERIALIZE instruction = false
hybrid part = false
TSXLDTRK: TSX suspend load addr tracking = false
PCONFIG instruction = false
LBR: architectural last branch records = false
CET_IBT: CET indirect branch tracking = false
AMX-BF16: tile bfloat16 support = false
AVX512_FP16: fp16 support = false
AMX-TILE: tile architecture support = false
AMX-INT8: tile 8-bit integer support = false
IBRS/IBPB: indirect branch restrictions = false
STIBP: 1 thr indirect branch predictor = false
L1D_FLUSH: IA32_FLUSH_CMD MSR = false
IA32_ARCH_CAPABILITIES MSR = false
IA32_CORE_CAPABILITIES MSR = false
SSBD: speculative store bypass disable = false
Direct Cache Access Parameters (9):
PLATFORM_DCA_CAP MSR bits = 0
Architecture Performance Monitoring Features (0xa):
version ID = 0x0 (0)
number of counters per logical processor = 0x0 (0)
bit width of counter = 0x0 (0)
length of EBX bit vector = 0x0 (0)
core cycle event not available = false
instruction retired event not available = false
reference cycles event not available = false
last-level cache ref event not available = false
last-level cache miss event not avail = false
branch inst retired event not available = false
branch mispred retired event not avail = false
fixed counter 0 supported = false
fixed counter 1 supported = false
fixed counter 2 supported = false
fixed counter 3 supported = false
fixed counter 4 supported = false
fixed counter 5 supported = false
fixed counter 6 supported = false
fixed counter 7 supported = false
fixed counter 8 supported = false
fixed counter 9 supported = false
fixed counter 10 supported = false
fixed counter 11 supported = false
fixed counter 12 supported = false
fixed counter 13 supported = false
fixed counter 14 supported = false
fixed counter 15 supported = false
fixed counter 16 supported = false
fixed counter 17 supported = false
fixed counter 18 supported = false
fixed counter 19 supported = false
fixed counter 20 supported = false
fixed counter 21 supported = false
fixed counter 22 supported = false
fixed counter 23 supported = false
fixed counter 24 supported = false
fixed counter 25 supported = false
fixed counter 26 supported = false
fixed counter 27 supported = false
fixed counter 28 supported = false
fixed counter 29 supported = false
fixed counter 30 supported = false
fixed counter 31 supported = false
number of fixed counters = 0x0 (0)
bit width of fixed counters = 0x0 (0)
anythread deprecation = false
x2APIC features / processor topology (0xb):
extended APIC ID = 8
--- level 0 ---
level number = 0x0 (0)
level type = thread (1)
bit width of level = 0x1 (1)
number of logical processors at level = 0x2 (2)
--- level 1 ---
level number = 0x1 (1)
level type = core (2)
bit width of level = 0x4 (4)
number of logical processors at level = 0xc (12)
XSAVE features (0xd/0):
XCR0 lower 32 bits valid bit field mask = 0x00000207
XCR0 upper 32 bits valid bit field mask = 0x00000000
XCR0 supported: x87 state = true
XCR0 supported: SSE state = true
XCR0 supported: AVX state = true
XCR0 supported: MPX BNDREGS = false
XCR0 supported: MPX BNDCSR = false
XCR0 supported: AVX-512 opmask = false
XCR0 supported: AVX-512 ZMM_Hi256 = false
XCR0 supported: AVX-512 Hi16_ZMM = false
IA32_XSS supported: PT state = false
XCR0 supported: PKRU state = true
XCR0 supported: CET_U state = false
XCR0 supported: CET_S state = false
IA32_XSS supported: HDC state = false
IA32_XSS supported: UINTR state = false
LBR supported = false
IA32_XSS supported: HWP state = false
XTILECFG supported = false
XTILEDATA supported = false
bytes required by fields in XCR0 = 0x00000988 (2440)
bytes required by XSAVE/XRSTOR area = 0x00000988 (2440)
XSAVE features (0xd/1):
XSAVEOPT instruction = true
XSAVEC instruction = true
XGETBV instruction = true
XSAVES/XRSTORS instructions = true
XFD: extended feature disable supported = false
SAVE area size in bytes = 0x00000348 (840)
IA32_XSS lower 32 bits valid bit field mask = 0x00001800
IA32_XSS upper 32 bits valid bit field mask = 0x00000000
AVX/YMM features (0xd/2):
AVX/YMM save state byte size = 0x00000100 (256)
AVX/YMM save state byte offset = 0x00000240 (576)
supported in IA32_XSS or XCR0 = XCR0 (user state)
64-byte alignment in compacted XSAVE = false
XFD faulting supported = false
PKRU features (0xd/9):
PKRU save state byte size = 0x00000008 (8)
PKRU save state byte offset = 0x00000980 (2432)
supported in IA32_XSS or XCR0 = XCR0 (user state)
64-byte alignment in compacted XSAVE = false
XFD faulting supported = false
CET_U state features (0xd/0xb):
CET_U state save state byte size = 0x00000010 (16)
CET_U state save state byte offset = 0x00000000 (0)
supported in IA32_XSS or XCR0 = IA32_XSS (supervisor state)
64-byte alignment in compacted XSAVE = false
XFD faulting supported = false
CET_S state features (0xd/0xc):
CET_S state save state byte size = 0x00000018 (24)
CET_S state save state byte offset = 0x00000000 (0)
supported in IA32_XSS or XCR0 = IA32_XSS (supervisor state)
64-byte alignment in compacted XSAVE = false
XFD faulting supported = false
Quality of Service Monitoring Resource Type (0xf/0):
Maximum range of RMID = 255
supports L3 cache QoS monitoring = true
L3 Cache Quality of Service Monitoring (0xf/1):
Conversion factor from IA32_QM_CTR to bytes = 64
Maximum range of RMID = 255
Counter width = 24
IA32_QM_CTR bit 61 is overflow = false
supports L3 occupancy monitoring = true
supports L3 total bandwidth monitoring = true
supports L3 local bandwidth monitoring = true
Resource Director Technology Allocation (0x10/0):
L3 cache allocation technology supported = true
L2 cache allocation technology supported = false
memory bandwidth allocation supported = false
L3 Cache Allocation Technology (0x10/1):
length of capacity bit mask = 0x10 (16)
Bit-granular map of isolation/contention = 0x00000000
infrequent updates of COS = false
code and data prioritization supported = true
highest COS number supported = 0xf (15)
extended processor signature (0x80000001/eax):
family/generation = 0xf (15)
model = 0x0 (0)
stepping id = 0x0 (0)
extended family = 0xa (10)
extended model = 0x5 (5)
(family synth) = 0x19 (25)
(model synth) = 0x50 (80)
(simple synth) = AMD (unknown model) [Zen 3], 7nm
extended feature flags (0x80000001/edx):
x87 FPU on chip = true
virtual-8086 mode enhancement = true
debugging extensions = true
page size extensions = true
time stamp counter = true
RDMSR and WRMSR support = true
physical address extensions = true
machine check exception = true
CMPXCHG8B inst. = true
APIC on chip = true
SYSCALL and SYSRET instructions = true
memory type range registers = true
global paging extension = true
machine check architecture = true
conditional move/compare instruction = true
page attribute table = true
page size extension = true
multiprocessing capable = false
no-execute page protection = true
AMD multimedia instruction extensions = true
MMX Technology = true
FXSAVE/FXRSTOR = true
SSE extensions = true
1-GB large page support = true
RDTSCP = true
long mode (AA-64) = true
3DNow! instruction extensions = false
3DNow! instructions = false
extended brand id (0x80000001/ebx):
raw = 0x0 (0)
BrandId = 0x0 (0)
PkgType = 0x0 (0)
AMD feature flags (0x80000001/ecx):
LAHF/SAHF supported in 64-bit mode = true
CMP Legacy = true
SVM: secure virtual machine = true
extended APIC space = true
AltMovCr8 = true
LZCNT advanced bit manipulation = true
SSE4A support = true
misaligned SSE mode = true
3DNow! PREFETCH/PREFETCHW instructions = true
OS visible workaround = true
instruction based sampling = true
XOP support = false
SKINIT/STGI support = true
watchdog timer support = true
lightweight profiling support = false
4-operand FMA instruction = false
TCE: translation cache extension = true
NodeId MSR C001100C = false
TBM support = false
topology extensions = true
core performance counter extensions = true
NB/DF performance counter extensions = true
data breakpoint extension = true
performance time-stamp counter support = false
LLC performance counter extensions = true
MWAITX/MONITORX supported = true
Address mask extension support = true
brand = "AMD Ryzen 5 PRO 5650U with Radeon Graphics "
L1 TLB/cache information: 2M/4M pages & L1 TLB (0x80000005/eax):
instruction # entries = 0x40 (64)
instruction associativity = 0xff (255)
data # entries = 0x40 (64)
data associativity = 0xff (255)
L1 TLB/cache information: 4K pages & L1 TLB (0x80000005/ebx):
instruction # entries = 0x40 (64)
instruction associativity = 0xff (255)
data # entries = 0x40 (64)
data associativity = 0xff (255)
L1 data cache information (0x80000005/ecx):
line size (bytes) = 0x40 (64)
lines per tag = 0x1 (1)
associativity = 0x8 (8)
size (KB) = 0x20 (32)
L1 instruction cache information (0x80000005/edx):
line size (bytes) = 0x40 (64)
lines per tag = 0x1 (1)
associativity = 0x8 (8)
size (KB) = 0x20 (32)
L2 TLB/cache information: 2M/4M pages & L2 TLB (0x80000006/eax):
instruction # entries = 0x200 (512)
instruction associativity = 2-way (2)
data # entries = 0x800 (2048)
data associativity = 4-way (4)
L2 TLB/cache information: 4K pages & L2 TLB (0x80000006/ebx):
instruction # entries = 0x200 (512)
instruction associativity = 4-way (4)
data # entries = 0x800 (2048)
data associativity = 8-way (6)
L2 unified cache information (0x80000006/ecx):
line size (bytes) = 0x40 (64)
lines per tag = 0x1 (1)
associativity = 8-way (6)
size (KB) = 0x200 (512)
L3 cache information (0x80000006/edx):
line size (bytes) = 0x40 (64)
lines per tag = 0x1 (1)
associativity = 0x9 (9)
size (in 512KB units) = 0x20 (32)
RAS Capability (0x80000007/ebx):
MCA overflow recovery support = true
SUCCOR support = true
HWA: hardware assert support = false
scalable MCA support = true
Advanced Power Management Features (0x80000007/ecx):
CmpUnitPwrSampleTimeRatio = 0x0 (0)
Advanced Power Management Features (0x80000007/edx):
TS: temperature sensing diode = true
FID: frequency ID control = false
VID: voltage ID control = false
TTP: thermal trip = true
TM: thermal monitor = true
STC: software thermal control = false
100 MHz multiplier control = false
hardware P-State control = true
TscInvariant = true
CPB: core performance boost = true
read-only effective frequency interface = true
processor feedback interface = false
APM power reporting = false
connected standby = true
RAPL: running average power limit = true
Physical Address and Linear Address Size (0x80000008/eax):
maximum physical address bits = 0x30 (48)
maximum linear (virtual) address bits = 0x30 (48)
maximum guest physical address bits = 0x0 (0)
Extended Feature Extensions ID (0x80000008/ebx):
CLZERO instruction = true
instructions retired count support = true
always save/restore error pointers = true
RDPRU instruction = true
memory bandwidth enforcement = true
WBNOINVD instruction = true
IBPB: indirect branch prediction barrier = true
IBRS: indirect branch restr speculation = true
STIBP: 1 thr indirect branch predictor = true
STIBP always on preferred mode = true
ppin processor id number supported = false
SSBD: speculative store bypass disable = true
virtualized SSBD = false
SSBD fixed in hardware = false
Size Identifiers (0x80000008/ecx):
number of threads = 0xc (12)
ApicIdCoreIdSize = 0x4 (4)
performance time-stamp counter size = 0x0 (0)
Feature Extended Size (0x80000008/edx):
RDPRU instruction max input support = 0x1 (1)
SVM Secure Virtual Machine (0x8000000a/eax):
SvmRev: SVM revision = 0x1 (1)
SVM Secure Virtual Machine (0x8000000a/edx):
nested paging = true
LBR virtualization = true
SVM lock = true
NRIP save = true
MSR based TSC rate control = true
VMCB clean bits support = true
flush by ASID = true
decode assists = true
SSSE3/SSE5 opcode set disable = false
pause intercept filter = true
pause filter threshold = true
AVIC: AMD virtual interrupt controller = true
virtualized VMLOAD/VMSAVE = true
virtualized global interrupt flag (GIF) = true
GMET: guest mode execute trap = true
guest Spec_ctl support = true
INVLPGB/TLBSYNC hyperv interc enable = false
NASID: number of address space identifiers = 0x8000 (32768):
L1 TLB information: 1G pages (0x80000019/eax):
instruction # entries = 0x40 (64)
instruction associativity = full (15)
data # entries = 0x40 (64)
data associativity = full (15)
L2 TLB information: 1G pages (0x80000019/ebx):
instruction # entries = 0x0 (0)
instruction associativity = L2 off (0)
data # entries = 0x40 (64)
data associativity = full (15)
SVM Secure Virtual Machine (0x8000001a/eax):
128-bit SSE executed full-width = false
MOVU* better than MOVL*/MOVH* = true
256-bit SSE executed full-width = true
Instruction Based Sampling Identifiers (0x8000001b/eax):
IBS feature flags valid = true
IBS fetch sampling = true
IBS execution sampling = true
read write of op counter = true
op counting mode = true
branch target address reporting = true
IbsOpCurCnt and IbsOpMaxCnt extend 7 = true
invalid RIP indication support = true
fused branch micro-op indication support = true
IBS fetch control extended MSR support = true
IBS op data 4 MSR support = false
Lightweight Profiling Capabilities: Availability (0x8000001c/eax):
lightweight profiling = false
LWPVAL instruction = false
instruction retired event = false
branch retired event = false
DC miss event = false
core clocks not halted event = false
core reference clocks not halted event = false
interrupt on threshold overflow = false
Lightweight Profiling Capabilities: Supported (0x8000001c/edx):
lightweight profiling = false
LWPVAL instruction = false
instruction retired event = false
branch retired event = false
DC miss event = false
core clocks not halted event = false
core reference clocks not halted event = false
interrupt on threshold overflow = false
Lightweight Profiling Capabilities (0x8000001c/ebx):
LWPCB byte size = 0x0 (0)
event record byte size = 0x0 (0)
maximum EventId = 0x0 (0)
EventInterval1 field offset = 0x0 (0)
Lightweight Profiling Capabilities (0x8000001c/ecx):
latency counter bit size = 0x0 (0)
data cache miss address valid = false
amount cache latency is rounded = 0x0 (0)
LWP implementation version = 0x0 (0)
event ring buffer size in records = 0x0 (0)
branch prediction filtering = false
IP filtering = false
cache level filtering = false
cache latency filteing = false
Cache Properties (0x8000001d):
--- cache 0 ---
type = data (1)
level = 0x1 (1)
self-initializing = true
fully associative = false
extra cores sharing this cache = 0x1 (1)
line size in bytes = 0x40 (64)
physical line partitions = 0x1 (1)
number of ways = 0x8 (8)
number of sets = 64
write-back invalidate = false
cache inclusive of lower levels = false
(synth size) = 32768 (32 KB)
--- cache 1 ---
type = instruction (2)
level = 0x1 (1)
self-initializing = true
fully associative = false
extra cores sharing this cache = 0x1 (1)
line size in bytes = 0x40 (64)
physical line partitions = 0x1 (1)
number of ways = 0x8 (8)
number of sets = 64
write-back invalidate = false
cache inclusive of lower levels = false
(synth size) = 32768 (32 KB)
--- cache 2 ---
type = unified (3)
level = 0x2 (2)
self-initializing = true
fully associative = false
extra cores sharing this cache = 0x1 (1)
line size in bytes = 0x40 (64)
physical line partitions = 0x1 (1)
number of ways = 0x8 (8)
number of sets = 1024
write-back invalidate = false
cache inclusive of lower levels = true
(synth size) = 524288 (512 KB)
--- cache 3 ---
type = unified (3)
level = 0x3 (3)
self-initializing = true
fully associative = false
extra cores sharing this cache = 0xb (11)
line size in bytes = 0x40 (64)
physical line partitions = 0x1 (1)
number of ways = 0x10 (16)
number of sets = 16384
write-back invalidate = true
cache inclusive of lower levels = false
(synth size) = 16777216 (16 MB)
extended APIC ID = 8
Core Identifiers (0x8000001e/ebx):
core ID = 0x4 (4)
threads per core = 0x2 (2)
Node Identifiers (0x8000001e/ecx):
node ID = 0x0 (0)
nodes per processor = 0x1 (1)
AMD Secure Encryption (0x8000001f):
SME: secure memory encryption support = true
SEV: secure encrypted virtualize support = true
VM page flush MSR support = true
SEV-ES: SEV encrypted state support = true
SEV-SNP: SEV secure nested paging = false
VMPL: VM permission levels = false
hardware cache coher across enc domains = false
SEV guest exec only from 64-bit host = true
restricted injection = true
alternate injection = true
full debug state swap for SEV-ES guests = true
disallowing IBS use by host = false
encryption bit position in PTE = 0x0 (0)
physical address space width reduction = 0x0 (0)
number of VM permission levels = 0x0 (0)
number of SEV-enabled guests supported = 0x0 (0)
minimum SEV guest ASID = 0x1 (1)
PQoS Enforcement for Memory Bandwidth (0x80000020):
memory bandwidth enforcement support = true
capacity bitmask length = 0xc (12)
number of classes of service = 0xf (15)
0x80000021 0x00: eax=0x0000004d ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x80000022 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x80000023 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
(instruction supported synth):
CMPXCHG8B = true
conditional move/compare = true
PREFETCH/PREFETCHW = true
(multi-processing synth) = multi-core (c=12)
(multi-processing method) = AMD
(APIC widths synth): CORE_width=3 SMT_width=1
(APIC synth): PKG_ID=0 CORE_ID=4 SMT_ID=0
(uarch synth) = AMD Zen 3, 7nm
(synth) = AMD (unknown model) [Zen 3], 7nm
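The "(family synth)" and "(model synth)" values in the output above follow from the raw fields of CPUID leaf 1. Below is a minimal Python sketch of that arithmetic. The function name `decode_signature` is made up for illustration, and the raw EAX value 0x00A50F00 is an assumption read off the "processor serial number = 00A5-0F00-..." line (it is consistent with the decoded fields listed above).

```python
def decode_signature(eax: int) -> dict:
    """Split a CPUID leaf 1 EAX value into display family, model and stepping."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # AMD rule: the extended fields only count when the base family is
    # saturated at 0xF. (Intel additionally uses the extended model for
    # base family 6; that case is not handled in this sketch.)
    if base_family == 0xF:
        family = base_family + ext_family
        model = (ext_model << 4) | base_model
    else:
        family = base_family
        model = base_model
    return {"family": family, "model": model, "stepping": stepping}


if __name__ == "__main__":
    # Assumed raw EAX for this Ryzen 5 PRO 5650U (see lead-in).
    sig = decode_signature(0x00A50F00)
    print(sig)  # {'family': 25, 'model': 80, 'stepping': 0}
```

Family 25 (0x19) and model 80 (0x50) match both the lscpu output and the cpuid "synth" lines, so these are the values any detection logic has to recognize.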
Thanks, I will check based on this information and get back.
Support for the zen3 architecture is being added as part of existing pull request #561.
Thanks for reporting the issue!
You may also consider checking out https://github.com/amd/blis.
PR #561 is basically backporting the changes from there in this repo.
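To illustrate why `auto` fails on this CPU until zen3 support lands, here is a minimal sketch of a family-based lookup. It is not BLIS's actual detection code: the table contents, the names `FAMILY_TO_CONFIG_BEFORE` and `FAMILY_TO_CONFIG_AFTER`, and the function `detect_config` are all hypothetical. The only facts assumed are that Zen, Zen+ and Zen 2 report CPU family 23 (0x17) and that this Zen 3 part reports family 25 (0x19).

```python
from typing import Dict, Optional

AMD_VENDOR = "AuthenticAMD"

# Hypothetical lookup tables, for illustration only.
# A real detector would also inspect the model number and feature flags.
FAMILY_TO_CONFIG_BEFORE: Dict[int, str] = {0x17: "zen"}
FAMILY_TO_CONFIG_AFTER: Dict[int, str] = {0x17: "zen", 0x19: "zen3"}


def detect_config(vendor: str, family: int, table: Dict[int, str]) -> Optional[str]:
    """Return a configuration name, or None when the CPU family is unknown."""
    if vendor != AMD_VENDOR:
        return None
    return table.get(family)


if __name__ == "__main__":
    for label, table in (("before", FAMILY_TO_CONFIG_BEFORE),
                         ("after", FAMILY_TO_CONFIG_AFTER)):
        config = detect_config(AMD_VENDOR, 25, table)
        if config is None:
            print(f"{label}: *** Unable to automatically detect hardware type! ***")
        else:
            print(f"{label}: hardware detection returned '{config}'")
```

With the first table, family 25 has no entry and detection gives up. Adding one entry for 0x19 is enough for this sketch to return 'zen3'.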
Ok, thanks. Hope it gets merged soon :)
As for using forks, I am not a fan. Keeping track of which fork's releases to use is tiresome and unnecessary. I have suggested that amd/scalapack open merge requests against the upstream scalapack repo, and I would encourage AMD to do this. :)
Great, thanks!
Here is the output (with success!)
$> ../configure auto
configure: detected Linux kernel version 5.10.0-9-amd64.
configure: python interpeter search list is: python python3 python2.
configure: using 'python' python interpreter.
configure: found python version 3.9.7 (maj: 3, min: 9, rev: 7).
configure: python 3.9.7 appears to be supported.
configure: C compiler search list is: gcc clang cc.
configure: using 'gcc' C compiler.
configure: C++ compiler search list is: g++ clang++ c++.
configure: using 'g++' C++ compiler (for sandbox only).
configure: found gcc version 11.2.0 (maj: 11, min: 2, rev: 0).
configure: checking for blacklisted configurations due to gcc 11.2.0.
configure: checking gcc 11.2.0 against known consequential version ranges.
configure: found assembler ('as') version 2.37 (maj: 2, min: 37, rev: ).
configure: checking for blacklisted configurations due to as 2.37.
configure: reading configuration registry...done.
configure: determining default version string.
configure: found '.git' directory; assuming git clone.
configure: executing: git describe --tags.
configure: got back 0.8.1-203-g9be97c15.
configure: truncating to 0.8.1-203.
configure: starting configuration of BLIS 0.8.1-203.
configure: configuring with official version string.
configure: found shared library .so version '4.0.0'.
configure: .so major version: 4
configure: .so minor.build version: 0.0
configure: automatic configuration requested.
configure: hardware detection driver returned 'zen3'.
configure: checking configuration against contents of 'config_registry'.
configure: configuration 'zen3' is registered.
configure: 'zen3' is defined as having the following sub-configurations:
configure: zen3
configure: which collectively require the following kernels:
configure: zen3 zen2 zen haswell
configure: checking sub-configurations:
configure: 'zen3' is registered...and exists.
configure: checking sub-configurations' requisite kernels:
configure: 'zen3' kernels...exist.
configure: 'zen2' kernels...exist.
configure: 'zen' kernels...exist.
configure: 'haswell' kernels...exist.
configure: no install prefix option given; defaulting to '/usr/local'.
configure: no install exec_prefix option given; defaulting to PREFIX.
configure: no install libdir option given; defaulting to EXECPREFIX/lib.
configure: no install includedir option given; defaulting to PREFIX/include.
configure: no install sharedir option given; defaulting to PREFIX/share.
configure: final installation directories:
configure: prefix: /usr/local
configure: exec_prefix: ${prefix}
configure: libdir: ${exec_prefix}/lib
configure: includedir: ${prefix}/include
configure: sharedir: ${prefix}/share
configure: NOTE: the variables above can be overridden when running make.
configure: no preset CFLAGS detected.
configure: no preset LDFLAGS detected.
configure: debug symbols disabled.
configure: disabling verbose make output. (enable with 'make V=1'.)
configure: disabling ARG_MAX hack.
configure: building BLIS as both static and shared libraries.
configure: exporting only public symbols within shared library.
configure: enabling operating system support.
configure: threading is disabled.
configure: requesting slab threading in jr and ir loops.
configure: internal memory pools for packing blocks are enabled.
configure: internal memory pools for small blocks are enabled.
configure: memory tracing output is disabled.
configure: libmemkind not found; disabling.
configure: compiler appears to support #pragma omp simd.
configure: the BLAS compatibility layer is enabled.
configure: the CBLAS compatibility layer is disabled.
configure: mixed datatype support is enabled.
configure: mixed datatype optimizations requiring extra memory are enabled.
configure: small matrix handling is enabled.
configure: trsm diagonal element pre-inversion is enabled.
configure: the BLIS API integer size is automatically determined.
configure: the BLAS/CBLAS API integer size is 32-bit.
configure: configuring for conventional gemm implementation.
configure: configuring complex return type as "gnu".
configure: creating ./config.mk from ../build/config.mk.in
configure: creating ./bli_config.h from ../build/bli_config.h.in
configure: creating ./obj/zen3
configure: creating ./obj/zen3/config/zen3
configure: creating ./obj/zen3/kernels/zen3
configure: creating ./obj/zen3/kernels/zen2
configure: creating ./obj/zen3/kernels/zen
configure: creating ./obj/zen3/kernels/haswell
configure: creating ./obj/zen3/ref_kernels/zen3
configure: creating ./obj/zen3/frame
configure: creating ./obj/zen3/blastest
configure: creating ./obj/zen3/testsuite
configure: creating ./lib/zen3
configure: creating ./include/zen3
configure: mirroring ../config/zen3 to ./obj/zen3/config/zen3
configure: mirroring ../kernels/zen3 to ./obj/zen3/kernels/zen3
configure: mirroring ../kernels/zen2 to ./obj/zen3/kernels/zen2
configure: mirroring ../kernels/zen to ./obj/zen3/kernels/zen
configure: mirroring ../kernels/haswell to ./obj/zen3/kernels/haswell
configure: mirroring ../ref_kernels to ./obj/zen3/ref_kernels
configure: mirroring ../ref_kernels to ./obj/zen3/ref_kernels/zen3
configure: mirroring ../frame to ./obj/zen3/frame
configure: creating makefile fragments in ./obj/zen3/config/zen3
configure: creating makefile fragments in ./obj/zen3/kernels/zen3
configure: creating makefile fragments in ./obj/zen3/kernels/zen2
configure: creating makefile fragments in ./obj/zen3/kernels/zen
configure: creating makefile fragments in ./obj/zen3/kernels/haswell
configure: creating makefile fragments in ./obj/zen3/ref_kernels
configure: creating makefile fragments in ./obj/zen3/frame
configure: symbolic link to Makefile already exists; forcing creation of new link.
configure: symbolic link to blis.pc.in already exists; forcing creation of new link.
configure: symbolic link to common.mk already exists; forcing creation of new link.
configure: symbolic link to 'config' directory already exists; forcing creation of new link.
configure: configured to build outside of source distribution.
@fgvanzee I'll let you close this in case you have any follow-up questions, but to me this seems resolved! :)
Great news, @zerothi. Thanks for that update.