AuburnSounds / intel-intrinsics

The Dlang SIMD library

Home Page: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=MMX,SSE,SSE2,SSE3,SSSE3,SSE4_1

DMD: Support D_SIMD

p0nce opened this issue

commented

Enabling core.simd:

  • We can enable core.simd usage with DMD today, without even using D_SIMD, which narrows the performance gap between LDC and DMD from 20x to 4x. DMD binaries that make heavy use of intel-intrinsics typically run 5x faster.

core.simd would be used instead of the slow replacements. But does it support float2, int2? (EDIT: no, but we can work around it)

  • D_SIMD and core.simd can be enabled with a constant
  • MMX
  • SSE
  • SSE2 up to line: 1024
  • SSE3
  • SSSE3
  • Wait for DMD 2.096 and test that _mm_movehl_ps generates MOVHLPS with __simd (and not MOVLPS)
  • Wait for DMD 2.096 and test that _mm_movelh_ps generates MOVLHPS with __simd (and not MOVHPS)
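
For the two MOVHLPS/MOVLHPS items above, a scalar reference can help spot the wrong instruction being picked, since MOVLPS and MOVHPS move different lanes. This is only a sketch: `movehlRef` and `movelhRef` are hypothetical helper names, not part of intel-intrinsics, and the lane layout follows the Intel Intrinsics Guide description of `_mm_movehl_ps` and `_mm_movelh_ps` (lane 0 is the lowest 32 bits).

```d
// Scalar reference model of the expected lane movement.
// Hypothetical helpers, not part of intel-intrinsics.

// _mm_movehl_ps(a, b): the high half of b lands in the low half of the
// result, and the high half of a is kept in place.
float[4] movehlRef(float[4] a, float[4] b)
{
    return [b[2], b[3], a[2], a[3]];
}

// _mm_movelh_ps(a, b): the low half of a is kept in place, and the low
// half of b lands in the high half of the result.
float[4] movelhRef(float[4] a, float[4] b)
{
    return [a[0], a[1], b[0], b[1]];
}
```

In a project that depends on intel-intrinsics, the `.array` of the real intrinsic's result could be compared against these helpers, under both the emulated and the D_SIMD paths.
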
commented

Also DMD: use core.simd instead of emulation when available.
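
A minimal sketch of what "core.simd instead of emulation when available" could look like, assuming the choice is made once through a compile-time constant. `useCoreSimd` and `add4` are hypothetical names for illustration, not the ones used inside intel-intrinsics; `D_SIMD` is the predefined version identifier and `float4` comes from `core.simd`.

```d
// Hypothetical flag: true only when the compiler advertises D_SIMD.
version (D_SIMD)
    enum bool useCoreSimd = true;
else
    enum bool useCoreSimd = false;

static if (useCoreSimd)
{
    import core.simd : float4;

    // Native path: the addition is done on a real vector type.
    float[4] add4(float[4] a, float[4] b)
    {
        float4 va, vb;
        foreach (i; 0 .. 4)
        {
            va.array[i] = a[i];
            vb.array[i] = b[i];
        }
        float4 sum = va + vb;
        return sum.array;
    }
}
else
{
    // Emulated path: plain scalar loop with the same observable result.
    float[4] add4(float[4] a, float[4] b)
    {
        float[4] r;
        foreach (i; 0 .. 4)
            r[i] = a[i] + b[i];
        return r;
    }
}
```

Keeping both paths behind one constant means a DMD regression can be worked around by flipping that constant back to the emulated path, without touching callers.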

commented

Blocked by #59

commented

Each version of DMD brings regressions when SIMD vectors are actually used. It's a maintenance burden.

commented

D_SIMD finally enabled in intel-intrinsics v1.9, when DMD 2.099+ is used. Let's see what happens next.

commented

8 hours later I was asked to remove it. 4 bugs are kinda blockers for D_SIMD to happen (well, more will be found as the translation progresses, but perf will also improve).

commented

Enable core.simd and D_SIMD usage in DMD now! It seems like the best time to do it. (mmm, not really)

commented

DMD debug builds are now surprisingly useful, since they are only at a 5% difference with LDC builds but build faster. DMD could become both faster to build and more efficient with a bit of effort on intel-intrinsics.

commented

Another final attempt at making D_SIMD used by default. Phew.

commented

It triggered only one regression, some Linux only bugs, and it seems like this is it? D_SIMD finally activated. (but not for AVX, only SSE)

commented

Some critical instructions:

  • _mm_cvtps_pd
  • _mm_srli_epi32
  • _mm_hadd_ps
commented

Besides, DMD output is wrong vs LDC on a complete plugin such as Lens.

commented

Critical for Lens:

  • inlining of _mm_addsub_pd with D_SIMD => didn't work
  • _mm_movelh_ps still buggy with D_SIMD? => no
  • _mm_cvtepi32_ps
    (try within Lens regression testing, it's dangerous)
commented

Are we still in the DMD test suite? The latest DMD fails in GH Actions => yes we are