flame / blis

BLAS-like Library Instantiation Software Framework

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

auto not detecting correct architecture for ryzen 5

zerothi opened this issue · comments

I have this:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          12
On-line CPU(s) list:             0-11
Thread(s) per core:              2
Core(s) per socket:              6
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      25
Model:                           80
Model name:                      AMD Ryzen 5 PRO 5650U with Radeon Graphics
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         1330.581
CPU max MHz:                     4806.6401
CPU min MHz:                     1600.0000
BogoMIPS:                        4591.45
Virtualization:                  AMD-V
L1d cache:                       192 KiB
L1i cache:                       192 KiB
L2 cache:                        3 MiB
L3 cache:                        16 MiB
NUMA node0 CPU(s):               0-11
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm

I got this:

./configure -p ... -t no --enable-blas --enable-cblas auto
...
*** Unable to automatically detect hardware type! ***

if you need anything, please let me know!

Can you please share output of one of the following command on this system?

dmidecode -t processor
cpuid --one-cpu

Output of cpuid --one-cpu:

CPU:
   vendor_id = "AuthenticAMD"
   version information (1/eax):
      processor type  = primary processor (0)
      family          = 0xf (15)
      model           = 0x0 (0)
      stepping id     = 0x0 (0)
      extended family = 0xa (10)
      extended model  = 0x5 (5)
      (family synth)  = 0x19 (25)
      (model synth)   = 0x50 (80)
      (simple synth)  = AMD (unknown model) [Zen 3], 7nm
   miscellaneous (1/ebx):
      process local APIC physical ID = 0x8 (8)
      maximum IDs for CPUs in pkg    = 0xc (12)
      CLFLUSH line size              = 0x8 (8)
      brand index                    = 0x0 (0)
   brand id = 0x00 (0): unknown
   feature information (1/edx):
      x87 FPU on chip                        = true
      VME: virtual-8086 mode enhancement     = true
      DE: debugging extensions               = true
      PSE: page size extensions              = true
      TSC: time stamp counter                = true
      RDMSR and WRMSR support                = true
      PAE: physical address extensions       = true
      MCE: machine check exception           = true
      CMPXCHG8B inst.                        = true
      APIC on chip                           = true
      SYSENTER and SYSEXIT                   = true
      MTRR: memory type range registers      = true
      PTE global bit                         = true
      MCA: machine check architecture        = true
      CMOV: conditional move/compare instr   = true
      PAT: page attribute table              = true
      PSE-36: page size extension            = true
      PSN: processor serial number           = false
      CLFLUSH instruction                    = true
      DS: debug store                        = false
      ACPI: thermal monitor and clock ctrl   = false
      MMX Technology                         = true
      FXSAVE/FXRSTOR                         = true
      SSE extensions                         = true
      SSE2 extensions                        = true
      SS: self snoop                         = false
      hyper-threading / multi-core supported = true
      TM: therm. monitor                     = false
      IA64                                   = false
      PBE: pending break event               = false
   feature information (1/ecx):
      PNI/SSE3: Prescott New Instructions     = true
      PCLMULDQ instruction                    = true
      DTES64: 64-bit debug store              = false
      MONITOR/MWAIT                           = true
      CPL-qualified debug store               = false
      VMX: virtual machine extensions         = false
      SMX: safer mode extensions              = false
      Enhanced Intel SpeedStep Technology     = false
      TM2: thermal monitor 2                  = false
      SSSE3 extensions                        = true
      context ID: adaptive or shared L1 data  = false
      SDBG: IA32_DEBUG_INTERFACE              = false
      FMA instruction                         = true
      CMPXCHG16B instruction                  = true
      xTPR disable                            = false
      PDCM: perfmon and debug                 = false
      PCID: process context identifiers       = false
      DCA: direct cache access                = false
      SSE4.1 extensions                       = true
      SSE4.2 extensions                       = true
      x2APIC: extended xAPIC support          = false
      MOVBE instruction                       = true
      POPCNT instruction                      = true
      time stamp counter deadline             = false
      AES instruction                         = true
      XSAVE/XSTOR states                      = true
      OS-enabled XSAVE/XSTOR                  = true
      AVX: advanced vector extensions         = true
      F16C half-precision convert instruction = true
      RDRAND instruction                      = true
      hypervisor guest status                 = false
   cache and TLB information (2):
   processor serial number = 00A5-0F00-0000-0000-0000-0000
   MONITOR/MWAIT (5):
      smallest monitor-line size (bytes)       = 0x40 (64)
      largest monitor-line size (bytes)        = 0x40 (64)
      enum of Monitor-MWAIT exts supported     = true
      supports intrs as break-event for MWAIT  = true
      number of C0 sub C-states using MWAIT    = 0x1 (1)
      number of C1 sub C-states using MWAIT    = 0x1 (1)
      number of C2 sub C-states using MWAIT    = 0x0 (0)
      number of C3 sub C-states using MWAIT    = 0x0 (0)
      number of C4 sub C-states using MWAIT    = 0x0 (0)
      number of C5 sub C-states using MWAIT    = 0x0 (0)
      number of C6 sub C-states using MWAIT    = 0x0 (0)
      number of C7 sub C-states using MWAIT    = 0x0 (0)
   Thermal and Power Management Features (6):
      digital thermometer                     = false
      Intel Turbo Boost Technology            = false
      ARAT always running APIC timer          = true
      PLN power limit notification            = false
      ECMD extended clock modulation duty     = false
      PTM package thermal management          = false
      HWP base registers                      = false
      HWP notification                        = false
      HWP activity window                     = false
      HWP energy performance preference       = false
      HWP package level request               = false
      HDC base registers                      = false
      Intel Turbo Boost Max Technology 3.0    = false
      HWP capabilities                        = false
      HWP PECI override                       = false
      flexible HWP                            = false
      IA32_HWP_REQUEST MSR fast access mode   = false
      HW_FEEDBACK MSRs supported              = false
      ignoring idle logical processor HWP req = false
      enhanced hardware feedback interface    = false
      digital thermometer thresholds          = 0x0 (0)
      hardware coordination feedback          = true
      ACNT2 available                         = false
      performance-energy bias capability      = false
      number of enh hardware feedback classes = 0x0 (0)
      performance capability reporting        = false
      energy efficiency capability reporting  = false
      size of feedback struct (4KB pages)     = 0x1 (1)
      index of CPU's row in feedback struct   = 0x0 (0)
   extended feature flags (7):
      FSGSBASE instructions                    = true
      IA32_TSC_ADJUST MSR supported            = false
      SGX: Software Guard Extensions supported = false
      BMI1 instructions                        = true
      HLE hardware lock elision                = false
      AVX2: advanced vector extensions 2       = true
      FDP_EXCPTN_ONLY                          = false
      SMEP supervisor mode exec protection     = true
      BMI2 instructions                        = true
      enhanced REP MOVSB/STOSB                 = true
      INVPCID instruction                      = true
      RTM: restricted transactional memory     = false
      RDT-CMT/PQoS cache monitoring            = true
      deprecated FPU CS/DS                     = false
      MPX: intel memory protection extensions  = false
      RDT-CAT/PQE cache allocation             = true
      AVX512F: AVX-512 foundation instructions = false
      AVX512DQ: double & quadword instructions = false
      RDSEED instruction                       = true
      ADX instructions                         = true
      SMAP: supervisor mode access prevention  = true
      AVX512IFMA: fused multiply add           = false
      PCOMMIT instruction                      = false
      CLFLUSHOPT instruction                   = true
      CLWB instruction                         = true
      Intel processor trace                    = false
      AVX512PF: prefetch instructions          = false
      AVX512ER: exponent & reciprocal instrs   = false
      AVX512CD: conflict detection instrs      = false
      SHA instructions                         = true
      AVX512BW: byte & word instructions       = false
      AVX512VL: vector length                  = false
      PREFETCHWT1                              = false
      AVX512VBMI: vector byte manipulation     = false
      UMIP: user-mode instruction prevention   = true
      PKU protection keys for user-mode        = true
      OSPKE CR4.PKE and RDPKRU/WRPKRU          = true
      WAITPKG instructions                     = false
      AVX512_VBMI2: byte VPCOMPRESS, VPEXPAND  = false
      CET_SS: CET shadow stack                 = true
      GFNI: Galois Field New Instructions      = false
      VAES instructions                        = true
      VPCLMULQDQ instruction                   = true
      AVX512_VNNI: neural network instructions = false
      AVX512_BITALG: bit count/shiffle         = false
      TME: Total Memory Encryption             = false
      AVX512: VPOPCNTDQ instruction            = false
      5-level paging                           = false
      BNDLDX/BNDSTX MAWAU value in 64-bit mode = 0x0 (0)
      RDPID: read processor D supported        = true
      KL: key locker                           = false
      CLDEMOTE supports cache line demote      = false
      MOVDIRI instruction                      = false
      MOVDIR64B instruction                    = false
      ENQCMD instruction                       = false
      SGX_LC: SGX launch config supported      = false
      PKS: supervisor protection keys          = false
      AVX512_4VNNIW: neural network instrs     = false
      AVX512_4FMAPS: multiply acc single prec  = false
      fast short REP MOV                       = true
      UINTR: user interrupts                   = false
      AVX512_VP2INTERSECT: intersect mask regs = false
      SRBDS mitigation MSR available           = false
      VERW MD_CLEAR microcode support          = false
      SERIALIZE instruction                    = false
      hybrid part                              = false
      TSXLDTRK: TSX suspend load addr tracking = false
      PCONFIG instruction                      = false
      LBR: architectural last branch records   = false
      CET_IBT: CET indirect branch tracking    = false
      AMX-BF16: tile bfloat16 support          = false
      AVX512_FP16: fp16 support                = false
      AMX-TILE: tile architecture support      = false
      AMX-INT8: tile 8-bit integer support     = false
      IBRS/IBPB: indirect branch restrictions  = false
      STIBP: 1 thr indirect branch predictor   = false
      L1D_FLUSH: IA32_FLUSH_CMD MSR            = false
      IA32_ARCH_CAPABILITIES MSR               = false
      IA32_CORE_CAPABILITIES MSR               = false
      SSBD: speculative store bypass disable   = false
   Direct Cache Access Parameters (9):
      PLATFORM_DCA_CAP MSR bits = 0
   Architecture Performance Monitoring Features (0xa):
      version ID                               = 0x0 (0)
      number of counters per logical processor = 0x0 (0)
      bit width of counter                     = 0x0 (0)
      length of EBX bit vector                 = 0x0 (0)
      core cycle event not available           = false
      instruction retired event not available  = false
      reference cycles event not available     = false
      last-level cache ref event not available = false
      last-level cache miss event not avail    = false
      branch inst retired event not available  = false
      branch mispred retired event not avail   = false
      fixed counter  0 supported               = false
      fixed counter  1 supported               = false
      fixed counter  2 supported               = false
      fixed counter  3 supported               = false
      fixed counter  4 supported               = false
      fixed counter  5 supported               = false
      fixed counter  6 supported               = false
      fixed counter  7 supported               = false
      fixed counter  8 supported               = false
      fixed counter  9 supported               = false
      fixed counter 10 supported               = false
      fixed counter 11 supported               = false
      fixed counter 12 supported               = false
      fixed counter 13 supported               = false
      fixed counter 14 supported               = false
      fixed counter 15 supported               = false
      fixed counter 16 supported               = false
      fixed counter 17 supported               = false
      fixed counter 18 supported               = false
      fixed counter 19 supported               = false
      fixed counter 20 supported               = false
      fixed counter 21 supported               = false
      fixed counter 22 supported               = false
      fixed counter 23 supported               = false
      fixed counter 24 supported               = false
      fixed counter 25 supported               = false
      fixed counter 26 supported               = false
      fixed counter 27 supported               = false
      fixed counter 28 supported               = false
      fixed counter 29 supported               = false
      fixed counter 30 supported               = false
      fixed counter 31 supported               = false
      number of fixed counters                 = 0x0 (0)
      bit width of fixed counters              = 0x0 (0)
      anythread deprecation                    = false
   x2APIC features / processor topology (0xb):
      extended APIC ID                      = 8
      --- level 0 ---
      level number                          = 0x0 (0)
      level type                            = thread (1)
      bit width of level                    = 0x1 (1)
      number of logical processors at level = 0x2 (2)
      --- level 1 ---
      level number                          = 0x1 (1)
      level type                            = core (2)
      bit width of level                    = 0x4 (4)
      number of logical processors at level = 0xc (12)
   XSAVE features (0xd/0):
      XCR0 lower 32 bits valid bit field mask = 0x00000207
      XCR0 upper 32 bits valid bit field mask = 0x00000000
         XCR0 supported: x87 state            = true
         XCR0 supported: SSE state            = true
         XCR0 supported: AVX state            = true
         XCR0 supported: MPX BNDREGS          = false
         XCR0 supported: MPX BNDCSR           = false
         XCR0 supported: AVX-512 opmask       = false
         XCR0 supported: AVX-512 ZMM_Hi256    = false
         XCR0 supported: AVX-512 Hi16_ZMM     = false
         IA32_XSS supported: PT state         = false
         XCR0 supported: PKRU state           = true
         XCR0 supported: CET_U state          = false
         XCR0 supported: CET_S state          = false
         IA32_XSS supported: HDC state        = false
         IA32_XSS supported: UINTR state      = false
         LBR supported                        = false
         IA32_XSS supported: HWP state        = false
         XTILECFG supported                   = false
         XTILEDATA supported                  = false
      bytes required by fields in XCR0        = 0x00000988 (2440)
      bytes required by XSAVE/XRSTOR area     = 0x00000988 (2440)
   XSAVE features (0xd/1):
      XSAVEOPT instruction                        = true
      XSAVEC instruction                          = true
      XGETBV instruction                          = true
      XSAVES/XRSTORS instructions                 = true
      XFD: extended feature disable supported     = false
      SAVE area size in bytes                     = 0x00000348 (840)
      IA32_XSS lower 32 bits valid bit field mask = 0x00001800
      IA32_XSS upper 32 bits valid bit field mask = 0x00000000
   AVX/YMM features (0xd/2):
      AVX/YMM save state byte size             = 0x00000100 (256)
      AVX/YMM save state byte offset           = 0x00000240 (576)
      supported in IA32_XSS or XCR0            = XCR0 (user state)
      64-byte alignment in compacted XSAVE     = false
      XFD faulting supported                   = false
   PKRU features (0xd/9):
      PKRU save state byte size                = 0x00000008 (8)
      PKRU save state byte offset              = 0x00000980 (2432)
      supported in IA32_XSS or XCR0            = XCR0 (user state)
      64-byte alignment in compacted XSAVE     = false
      XFD faulting supported                   = false
   CET_U state features (0xd/0xb):
      CET_U state save state byte size         = 0x00000010 (16)
      CET_U state save state byte offset       = 0x00000000 (0)
      supported in IA32_XSS or XCR0            = IA32_XSS (supervisor state)
      64-byte alignment in compacted XSAVE     = false
      XFD faulting supported                   = false
   CET_S state features (0xd/0xc):
      CET_S state save state byte size         = 0x00000018 (24)
      CET_S state save state byte offset       = 0x00000000 (0)
      supported in IA32_XSS or XCR0            = IA32_XSS (supervisor state)
      64-byte alignment in compacted XSAVE     = false
      XFD faulting supported                   = false
   Quality of Service Monitoring Resource Type (0xf/0):
      Maximum range of RMID = 255
      supports L3 cache QoS monitoring = true
   L3 Cache Quality of Service Monitoring (0xf/1):
      Conversion factor from IA32_QM_CTR to bytes = 64
      Maximum range of RMID                       = 255
      Counter width                               = 24
      IA32_QM_CTR bit 61 is overflow              = false
      supports L3 occupancy monitoring            = true
      supports L3 total bandwidth monitoring      = true
      supports L3 local bandwidth monitoring      = true
   Resource Director Technology Allocation (0x10/0):
      L3 cache allocation technology supported = true
      L2 cache allocation technology supported = false
      memory bandwidth allocation supported    = false
   L3 Cache Allocation Technology (0x10/1):
      length of capacity bit mask              = 0x10 (16)
      Bit-granular map of isolation/contention = 0x00000000
      infrequent updates of COS                = false
      code and data prioritization supported   = true
      highest COS number supported             = 0xf (15)
   extended processor signature (0x80000001/eax):
      family/generation = 0xf (15)
      model           = 0x0 (0)
      stepping id     = 0x0 (0)
      extended family = 0xa (10)
      extended model  = 0x5 (5)
      (family synth)  = 0x19 (25)
      (model synth)   = 0x50 (80)
      (simple synth)  = AMD (unknown model) [Zen 3], 7nm
   extended feature flags (0x80000001/edx):
      x87 FPU on chip                       = true
      virtual-8086 mode enhancement         = true
      debugging extensions                  = true
      page size extensions                  = true
      time stamp counter                    = true
      RDMSR and WRMSR support               = true
      physical address extensions           = true
      machine check exception               = true
      CMPXCHG8B inst.                       = true
      APIC on chip                          = true
      SYSCALL and SYSRET instructions       = true
      memory type range registers           = true
      global paging extension               = true
      machine check architecture            = true
      conditional move/compare instruction  = true
      page attribute table                  = true
      page size extension                   = true
      multiprocessing capable               = false
      no-execute page protection            = true
      AMD multimedia instruction extensions = true
      MMX Technology                        = true
      FXSAVE/FXRSTOR                        = true
      SSE extensions                        = true
      1-GB large page support               = true
      RDTSCP                                = true
      long mode (AA-64)                     = true
      3DNow! instruction extensions         = false
      3DNow! instructions                   = false
   extended brand id (0x80000001/ebx):
      raw     = 0x0 (0)
      BrandId = 0x0 (0)
      PkgType = 0x0 (0)
   AMD feature flags (0x80000001/ecx):
      LAHF/SAHF supported in 64-bit mode     = true
      CMP Legacy                             = true
      SVM: secure virtual machine            = true
      extended APIC space                    = true
      AltMovCr8                              = true
      LZCNT advanced bit manipulation        = true
      SSE4A support                          = true
      misaligned SSE mode                    = true
      3DNow! PREFETCH/PREFETCHW instructions = true
      OS visible workaround                  = true
      instruction based sampling             = true
      XOP support                            = false
      SKINIT/STGI support                    = true
      watchdog timer support                 = true
      lightweight profiling support          = false
      4-operand FMA instruction              = false
      TCE: translation cache extension       = true
      NodeId MSR C001100C                    = false
      TBM support                            = false
      topology extensions                    = true
      core performance counter extensions    = true
      NB/DF performance counter extensions   = true
      data breakpoint extension              = true
      performance time-stamp counter support = false
      LLC performance counter extensions     = true
      MWAITX/MONITORX supported              = true
      Address mask extension support         = true
   brand = "AMD Ryzen 5 PRO 5650U with Radeon Graphics     "
   L1 TLB/cache information: 2M/4M pages & L1 TLB (0x80000005/eax):
      instruction # entries     = 0x40 (64)
      instruction associativity = 0xff (255)
      data # entries            = 0x40 (64)
      data associativity        = 0xff (255)
   L1 TLB/cache information: 4K pages & L1 TLB (0x80000005/ebx):
      instruction # entries     = 0x40 (64)
      instruction associativity = 0xff (255)
      data # entries            = 0x40 (64)
      data associativity        = 0xff (255)
   L1 data cache information (0x80000005/ecx):
      line size (bytes) = 0x40 (64)
      lines per tag     = 0x1 (1)
      associativity     = 0x8 (8)
      size (KB)         = 0x20 (32)
   L1 instruction cache information (0x80000005/edx):
      line size (bytes) = 0x40 (64)
      lines per tag     = 0x1 (1)
      associativity     = 0x8 (8)
      size (KB)         = 0x20 (32)
   L2 TLB/cache information: 2M/4M pages & L2 TLB (0x80000006/eax):
      instruction # entries     = 0x200 (512)
      instruction associativity = 2-way (2)
      data # entries            = 0x800 (2048)
      data associativity        = 4-way (4)
   L2 TLB/cache information: 4K pages & L2 TLB (0x80000006/ebx):
      instruction # entries     = 0x200 (512)
      instruction associativity = 4-way (4)
      data # entries            = 0x800 (2048)
      data associativity        = 8-way (6)
   L2 unified cache information (0x80000006/ecx):
      line size (bytes) = 0x40 (64)
      lines per tag     = 0x1 (1)
      associativity     = 8-way (6)
      size (KB)         = 0x200 (512)
   L3 cache information (0x80000006/edx):
      line size (bytes)     = 0x40 (64)
      lines per tag         = 0x1 (1)
      associativity         = 0x9 (9)
      size (in 512KB units) = 0x20 (32)
   RAS Capability (0x80000007/ebx):
      MCA overflow recovery support = true
      SUCCOR support                = true
      HWA: hardware assert support  = false
      scalable MCA support          = true
   Advanced Power Management Features (0x80000007/ecx):
      CmpUnitPwrSampleTimeRatio = 0x0 (0)
   Advanced Power Management Features (0x80000007/edx):
      TS: temperature sensing diode           = true
      FID: frequency ID control               = false
      VID: voltage ID control                 = false
      TTP: thermal trip                       = true
      TM: thermal monitor                     = true
      STC: software thermal control           = false
      100 MHz multiplier control              = false
      hardware P-State control                = true
      TscInvariant                            = true
      CPB: core performance boost             = true
      read-only effective frequency interface = true
      processor feedback interface            = false
      APM power reporting                     = false
      connected standby                       = true
      RAPL: running average power limit       = true
   Physical Address and Linear Address Size (0x80000008/eax):
      maximum physical address bits         = 0x30 (48)
      maximum linear (virtual) address bits = 0x30 (48)
      maximum guest physical address bits   = 0x0 (0)
   Extended Feature Extensions ID (0x80000008/ebx):
      CLZERO instruction                       = true
      instructions retired count support       = true
      always save/restore error pointers       = true
      RDPRU instruction                        = true
      memory bandwidth enforcement             = true
      WBNOINVD instruction                     = true
      IBPB: indirect branch prediction barrier = true
      IBRS: indirect branch restr speculation  = true
      STIBP: 1 thr indirect branch predictor   = true
      STIBP always on preferred mode           = true
      ppin processor id number supported       = false
      SSBD: speculative store bypass disable   = true
      virtualized SSBD                         = false
      SSBD fixed in hardware                   = false
   Size Identifiers (0x80000008/ecx):
      number of threads                   = 0xc (12)
      ApicIdCoreIdSize                    = 0x4 (4)
      performance time-stamp counter size = 0x0 (0)
   Feature Extended Size (0x80000008/edx):
      RDPRU instruction max input support = 0x1 (1)
   SVM Secure Virtual Machine (0x8000000a/eax):
      SvmRev: SVM revision = 0x1 (1)
   SVM Secure Virtual Machine (0x8000000a/edx):
      nested paging                           = true
      LBR virtualization                      = true
      SVM lock                                = true
      NRIP save                               = true
      MSR based TSC rate control              = true
      VMCB clean bits support                 = true
      flush by ASID                           = true
      decode assists                          = true
      SSSE3/SSE5 opcode set disable           = false
      pause intercept filter                  = true
      pause filter threshold                  = true
      AVIC: AMD virtual interrupt controller  = true
      virtualized VMLOAD/VMSAVE               = true
      virtualized global interrupt flag (GIF) = true
      GMET: guest mode execute trap           = true
      guest Spec_ctl support                  = true
      INVLPGB/TLBSYNC hyperv interc enable    = false
   NASID: number of address space identifiers = 0x8000 (32768):
   L1 TLB information: 1G pages (0x80000019/eax):
      instruction # entries     = 0x40 (64)
      instruction associativity = full (15)
      data # entries            = 0x40 (64)
      data associativity        = full (15)
   L2 TLB information: 1G pages (0x80000019/ebx):
      instruction # entries     = 0x0 (0)
      instruction associativity = L2 off (0)
      data # entries            = 0x40 (64)
      data associativity        = full (15)
   SVM Secure Virtual Machine (0x8000001a/eax):
      128-bit SSE executed full-width = false
      MOVU* better than MOVL*/MOVH*   = true
      256-bit SSE executed full-width = true
   Instruction Based Sampling Identifiers (0x8000001b/eax):
      IBS feature flags valid                  = true
      IBS fetch sampling                       = true
      IBS execution sampling                   = true
      read write of op counter                 = true
      op counting mode                         = true
      branch target address reporting          = true
      IbsOpCurCnt and IbsOpMaxCnt extend 7     = true
      invalid RIP indication support           = true
      fused branch micro-op indication support = true
      IBS fetch control extended MSR support   = true
      IBS op data 4 MSR support                = false
   Lightweight Profiling Capabilities: Availability (0x8000001c/eax):
      lightweight profiling                  = false
      LWPVAL instruction                     = false
      instruction retired event              = false
      branch retired event                   = false
      DC miss event                          = false
      core clocks not halted event           = false
      core reference clocks not halted event = false
      interrupt on threshold overflow        = false
   Lightweight Profiling Capabilities: Supported (0x8000001c/edx):
      lightweight profiling                  = false
      LWPVAL instruction                     = false
      instruction retired event              = false
      branch retired event                   = false
      DC miss event                          = false
      core clocks not halted event           = false
      core reference clocks not halted event = false
      interrupt on threshold overflow        = false
   Lightweight Profiling Capabilities (0x8000001c/ebx):
      LWPCB byte size             = 0x0 (0)
      event record byte size      = 0x0 (0)
      maximum EventId             = 0x0 (0)
      EventInterval1 field offset = 0x0 (0)
   Lightweight Profiling Capabilities (0x8000001c/ecx):
      latency counter bit size          = 0x0 (0)
      data cache miss address valid     = false
      amount cache latency is rounded   = 0x0 (0)
      LWP implementation version        = 0x0 (0)
      event ring buffer size in records = 0x0 (0)
      branch prediction filtering       = false
      IP filtering                      = false
      cache level filtering             = false
      cache latency filteing            = false
   Cache Properties (0x8000001d):
      --- cache 0 ---
      type                            = data (1)
      level                           = 0x1 (1)
      self-initializing               = true
      fully associative               = false
      extra cores sharing this cache  = 0x1 (1)
      line size in bytes              = 0x40 (64)
      physical line partitions        = 0x1 (1)
      number of ways                  = 0x8 (8)
      number of sets                  = 64
      write-back invalidate           = false
      cache inclusive of lower levels = false
      (synth size)                    = 32768 (32 KB)
      --- cache 1 ---
      type                            = instruction (2)
      level                           = 0x1 (1)
      self-initializing               = true
      fully associative               = false
      extra cores sharing this cache  = 0x1 (1)
      line size in bytes              = 0x40 (64)
      physical line partitions        = 0x1 (1)
      number of ways                  = 0x8 (8)
      number of sets                  = 64
      write-back invalidate           = false
      cache inclusive of lower levels = false
      (synth size)                    = 32768 (32 KB)
      --- cache 2 ---
      type                            = unified (3)
      level                           = 0x2 (2)
      self-initializing               = true
      fully associative               = false
      extra cores sharing this cache  = 0x1 (1)
      line size in bytes              = 0x40 (64)
      physical line partitions        = 0x1 (1)
      number of ways                  = 0x8 (8)
      number of sets                  = 1024
      write-back invalidate           = false
      cache inclusive of lower levels = true
      (synth size)                    = 524288 (512 KB)
      --- cache 3 ---
      type                            = unified (3)
      level                           = 0x3 (3)
      self-initializing               = true
      fully associative               = false
      extra cores sharing this cache  = 0xb (11)
      line size in bytes              = 0x40 (64)
      physical line partitions        = 0x1 (1)
      number of ways                  = 0x10 (16)
      number of sets                  = 16384
      write-back invalidate           = true
      cache inclusive of lower levels = false
      (synth size)                    = 16777216 (16 MB)
   extended APIC ID = 8
   Core Identifiers (0x8000001e/ebx):
      core ID          = 0x4 (4)
      threads per core = 0x2 (2)
   Node Identifiers (0x8000001e/ecx):
      node ID             = 0x0 (0)
      nodes per processor = 0x1 (1)
   AMD Secure Encryption (0x8000001f):
      SME: secure memory encryption support    = true
      SEV: secure encrypted virtualize support = true
      VM page flush MSR support                = true
      SEV-ES: SEV encrypted state support      = true
      SEV-SNP: SEV secure nested paging        = false
      VMPL: VM permission levels               = false
      hardware cache coher across enc domains  = false
      SEV guest exec only from 64-bit host     = true
      restricted injection                     = true
      alternate injection                      = true
      full debug state swap for SEV-ES guests  = true
      disallowing IBS use by host              = false
      encryption bit position in PTE           = 0x0 (0)
      physical address space width reduction   = 0x0 (0)
      number of VM permission levels           = 0x0 (0)
      number of SEV-enabled guests supported   = 0x0 (0)
      minimum SEV guest ASID                   = 0x1 (1)
   PQoS Enforcement for Memory Bandwidth (0x80000020):
      memory bandwidth enforcement support = true
      capacity bitmask length              = 0xc (12)
      number of classes of service         = 0xf (15)
   0x80000021 0x00: eax=0x0000004d ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000022 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000023 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   (instruction supported synth):
      CMPXCHG8B                = true
      conditional move/compare = true
      PREFETCH/PREFETCHW       = true
   (multi-processing synth) = multi-core (c=12)
   (multi-processing method) = AMD
   (APIC widths synth): CORE_width=3 SMT_width=1
   (APIC synth): PKG_ID=0 CORE_ID=4 SMT_ID=0
   (uarch synth) = AMD Zen 3, 7nm
   (synth) = AMD (unknown model) [Zen 3], 7nm

Thanks, I will check based on this information and get back.

The support for zen3 architecture is being added as part of exiting pull request #561.

Thanks for reporting the issue!

You may also consider checking out https://github.com/amd/blis.
PR #561 is basically backporting the changes from there in this repo.

Ok, thanks. Hope it gets merged soon :)

As for using forks, I am not a fan. Trying to keep up with which releases one should use is just tiresome and needless. I have suggested the amd/scalapack to create merge-requests for the scalapack repo. I would encourage AMD to do this. :)

@zerothi Could you give this another try using the latest commit on the master branch? We recently merged PR #561, which includes Zen3 support and related improvements.

Great thanks!

Here is the output (with success!)

$> ../configure auto
configure: detected Linux kernel version 5.10.0-9-amd64.
configure: python interpeter search list is: python python3 python2.
configure: using 'python' python interpreter.
configure: found python version 3.9.7 (maj: 3, min: 9, rev: 7).
configure: python 3.9.7 appears to be supported.
configure: C compiler search list is: gcc clang cc.
configure: using 'gcc' C compiler.
configure: C++ compiler search list is: g++ clang++ c++.
configure: using 'g++' C++ compiler (for sandbox only).
configure: found gcc version 11.2.0 (maj: 11, min: 2, rev: 0).
configure: checking for blacklisted configurations due to gcc 11.2.0.
configure: checking gcc 11.2.0 against known consequential version ranges.
configure: found assembler ('as') version 2.37 (maj: 2, min: 37, rev: ).
configure: checking for blacklisted configurations due to as 2.37.
configure: reading configuration registry...done.
configure: determining default version string.
configure: found '.git' directory; assuming git clone.
configure: executing: git describe --tags.
configure: got back 0.8.1-203-g9be97c15.
configure: truncating to 0.8.1-203.
configure: starting configuration of BLIS 0.8.1-203.
configure: configuring with official version string.
configure: found shared library .so version '4.0.0'.
configure:   .so major version: 4
configure:   .so minor.build version: 0.0
configure: automatic configuration requested.
configure: hardware detection driver returned 'zen3'.
configure: checking configuration against contents of 'config_registry'.
configure: configuration 'zen3' is registered.
configure: 'zen3' is defined as having the following sub-configurations:
configure:    zen3
configure: which collectively require the following kernels:
configure:    zen3 zen2 zen haswell
configure: checking sub-configurations:
configure:   'zen3' is registered...and exists.
configure: checking sub-configurations' requisite kernels:
configure:   'zen3' kernels...exist.
configure:   'zen2' kernels...exist.
configure:   'zen' kernels...exist.
configure:   'haswell' kernels...exist.
configure: no install prefix option given; defaulting to '/usr/local'.
configure: no install exec_prefix option given; defaulting to PREFIX.
configure: no install libdir option given; defaulting to EXECPREFIX/lib.
configure: no install includedir option given; defaulting to PREFIX/include.
configure: no install sharedir option given; defaulting to PREFIX/share.
configure: final installation directories:
configure:   prefix:      /usr/local
configure:   exec_prefix: ${prefix}
configure:   libdir:      ${exec_prefix}/lib
configure:   includedir:  ${prefix}/include
configure:   sharedir:    ${prefix}/share
configure: NOTE: the variables above can be overridden when running make.
configure: no preset CFLAGS detected.
configure: no preset LDFLAGS detected.
configure: debug symbols disabled.
configure: disabling verbose make output. (enable with 'make V=1'.)
configure: disabling ARG_MAX hack.
configure: building BLIS as both static and shared libraries.
configure: exporting only public symbols within shared library.
configure: enabling operating system support.
configure: threading is disabled.
configure: requesting slab threading in jr and ir loops.
configure: internal memory pools for packing blocks are enabled.
configure: internal memory pools for small blocks are enabled.
configure: memory tracing output is disabled.
configure: libmemkind not found; disabling.
configure: compiler appears to support #pragma omp simd.
configure: the BLAS compatibility layer is enabled.
configure: the CBLAS compatibility layer is disabled.
configure: mixed datatype support is enabled.
configure: mixed datatype optimizations requiring extra memory are enabled.
configure: small matrix handling is enabled.
configure: trsm diagonal element pre-inversion is enabled.
configure: the BLIS API integer size is automatically determined.
configure: the BLAS/CBLAS API integer size is 32-bit.
configure: configuring for conventional gemm implementation.
configure: configuring complex return type as "gnu".
configure: creating ./config.mk from ../build/config.mk.in
configure: creating ./bli_config.h from ../build/bli_config.h.in
configure: creating ./obj/zen3
configure: creating ./obj/zen3/config/zen3
configure: creating ./obj/zen3/kernels/zen3
configure: creating ./obj/zen3/kernels/zen2
configure: creating ./obj/zen3/kernels/zen
configure: creating ./obj/zen3/kernels/haswell
configure: creating ./obj/zen3/ref_kernels/zen3
configure: creating ./obj/zen3/frame
configure: creating ./obj/zen3/blastest
configure: creating ./obj/zen3/testsuite
configure: creating ./lib/zen3
configure: creating ./include/zen3
configure: mirroring ../config/zen3 to ./obj/zen3/config/zen3
configure: mirroring ../kernels/zen3 to ./obj/zen3/kernels/zen3
configure: mirroring ../kernels/zen2 to ./obj/zen3/kernels/zen2
configure: mirroring ../kernels/zen to ./obj/zen3/kernels/zen
configure: mirroring ../kernels/haswell to ./obj/zen3/kernels/haswell
configure: mirroring ../ref_kernels to ./obj/zen3/ref_kernels
configure: mirroring ../ref_kernels to ./obj/zen3/ref_kernels/zen3
configure: mirroring ../frame to ./obj/zen3/frame
configure: creating makefile fragments in ./obj/zen3/config/zen3
configure: creating makefile fragments in ./obj/zen3/kernels/zen3
configure: creating makefile fragments in ./obj/zen3/kernels/zen2
configure: creating makefile fragments in ./obj/zen3/kernels/zen
configure: creating makefile fragments in ./obj/zen3/kernels/haswell
configure: creating makefile fragments in ./obj/zen3/ref_kernels
configure: creating makefile fragments in ./obj/zen3/frame
configure: symbolic link to Makefile already exists; forcing creation of new link.
configure: symbolic link to blis.pc.in already exists; forcing creation of new link.
configure: symbolic link to common.mk already exists; forcing creation of new link.
configure: symbolic link to 'config' directory already exists; forcing creation of new link.
configure: configured to build outside of source distribution.

@fgvanzee I'll let you close in case you wanted some follow up questions, but to me this seems resolved! :)

Great news, @zerothi. Thanks for that update.