flame / blis

BLAS-like Library Instantiation Software Framework

Native execution fails when only real-domain haswell kernels are registered

fgvanzee opened this issue.

Registering only the real-domain haswell kernels in the haswell subconfig, as follows, causes native execution of level-3 operations on complex datatypes to fail:

    bli_cntx_set_l3_nat_ukrs
    (
      2,
      // gemm
      BLIS_GEMM_UKR,       BLIS_FLOAT,    bli_sgemm_haswell_asm_6x16,       TRUE,
      BLIS_GEMM_UKR,       BLIS_DOUBLE,   bli_dgemm_haswell_asm_6x8,        TRUE,
      cntx
    );

Note that these are the default row-preferential sgemm and dgemm microkernels (the trailing TRUE argument marks each one as row-preferential). Registering them alone appears to be the only change needed to trigger the native-execution failures.

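Abbreviated testsuite output (the real-domain tests and the 1m-based complex tests pass, while native cgemm and zgemm fail with residuals around 1e-2):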

    % blis_<dt><op>_<params>_<stor>      m     n     k   gflops   resid      result
    blis_sgemm_nn_rrr                  400   400   400    71.24   5.34e-09   PASS

    % blis_<dt><op>_<params>_<stor>      m     n     k   gflops   resid      result
    blis_dgemm_nn_rrr                  400   400   400    44.40   1.32e-17   PASS

    % blis_<dt><op>_<params>_<stor>      m     n     k   gflops   resid      result
    blis_cgemm1m_nn_rrr                400   400   400    77.26   5.88e-09   PASS
    blis_cgemm_nn_rrr                  400   400   400    10.23   1.24e-02   FAILURE

    % blis_<dt><op>_<params>_<stor>      m     n     k   gflops   resid      result
    blis_zgemm1m_nn_rrr                400   400   400    40.00   2.92e-17   PASS
    blis_zgemm_nn_rrr                  400   400   400     7.58   1.36e-02   FAILURE

    % blis_<dt><op>_<params>_<stor>      m     n     k   gflops   resid      result
    blis_sgemm_nn_ccc                  400   400   400    98.07   1.39e-08   PASS

    % blis_<dt><op>_<params>_<stor>      m     n     k   gflops   resid      result
    blis_dgemm_nn_ccc                  400   400   400    41.89   2.76e-17   PASS

    % blis_<dt><op>_<params>_<stor>      m     n     k   gflops   resid      result
    blis_cgemm1m_nn_ccc                400   400   400    87.96   4.80e-09   PASS
    blis_cgemm_nn_ccc                  400   400   400    10.09   1.29e-02   FAILURE

    % blis_<dt><op>_<params>_<stor>      m     n     k   gflops   resid      result
    blis_zgemm1m_nn_ccc                400   400   400    44.10   8.72e-18   PASS
    blis_zgemm_nn_ccc                  400   400   400     7.48   1.27e-02   FAILURE

I stumbled upon this issue when preparing to investigate #557.

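Never mind, false alarm. I was mixing the assembly kernels' register blocksizes with the reference kernels (which native complex execution falls back to when no complex microkernels are registered), and I forgot that the reference kernels have hard-coded register blocksizes.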

Have I ever mentioned that I'm not a fan of the reference kernels having hard-coded register blocksizes? 😐