dblalock / bolt

10x faster matrix and vector operations

Support ARM v7+

dblalock opened this issue · comments

At present, Bolt is only implemented for x86 machines with AVX2 instructions. Adding support for other architectures would entail reimplementing the code in bolt.hpp and adding #ifdefs to select the appropriate implementation for the target architecture.
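As a rough illustration of the `#ifdef` approach described above, here is a minimal sketch of one kernel with per-architecture bodies and a scalar fallback. The function `add_sat_u8` is hypothetical and does not come from bolt.hpp; it only shows the selection pattern, assuming the compiler defines `__AVX2__` or `__ARM_NEON` for the respective targets.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__AVX2__)
  #include <immintrin.h>   // x86 AVX2 intrinsics
#elif defined(__ARM_NEON)
  #include <arm_neon.h>    // ARM NEON intrinsics
#endif

// Hypothetical kernel (not part of Bolt): element-wise saturating add of
// two uint8 arrays. The vector body is chosen at compile time; the scalar
// loop handles the tail and any architecture without a vector path.
inline void add_sat_u8(const std::uint8_t* a, const std::uint8_t* b,
                       std::uint8_t* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__AVX2__)
    for (; i + 32 <= n; i += 32) {
        __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i));
        __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b + i));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i),
                            _mm256_adds_epu8(va, vb));
    }
#elif defined(__ARM_NEON)
    for (; i + 16 <= n; i += 16) {
        vst1q_u8(out + i, vqaddq_u8(vld1q_u8(a + i), vld1q_u8(b + i)));
    }
#endif
    // Scalar fallback and tail: clamp the sum to 255.
    for (; i < n; ++i) {
        unsigned s = static_cast<unsigned>(a[i]) + static_cast<unsigned>(b[i]);
        out[i] = static_cast<std::uint8_t>(s > 255u ? 255u : s);
    }
}
```

Keeping the scalar loop unconditional means the same source builds on any target, and it doubles as a reference when checking a newly ported vector path.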

@dblalock, this issue still seems to be applicable. I want to compile the MADDNESS algorithm for the RISC-V architecture, and I think I will have the same problem there, as AVX2 instructions do not exist on it. I would be interested in contributing if you have some ideas on how to resolve it.

Hi @zinovya. RISC-V support would be really cool. I've personally never used it, but I think it's just a matter of porting all the mithral C++ functions. There's a decent amount of indirection regarding vector widths, strides, etc., so it shouldn't be terrible to port. The main subtlety will be making sure the right instructions get emitted. For example, I found that my compiler refused to emit vpavgb, which required me to use inline asm to get decent performance.
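For anyone porting, it may help to have a scalar reference for what vpavgb computes: the rounded average of unsigned bytes, `(a + b + 1) >> 1`, evaluated without overflow. The sketch below is not Bolt code and the function names are hypothetical; it is only meant as something to compare a ported vector path against.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar model of one vpavgb lane: rounded average of two unsigned bytes.
// The sum is widened to unsigned so that a + b + 1 cannot overflow.
inline std::uint8_t avg_round_u8(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(
        (static_cast<unsigned>(a) + static_cast<unsigned>(b) + 1u) >> 1);
}

// Hypothetical helper: apply the rounded average element-wise.
inline void avg_round_u8_array(const std::uint8_t* a, const std::uint8_t* b,
                               std::uint8_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = avg_round_u8(a[i], b[i]);
    }
}
```

Whether a given compiler turns this loop into the target's native averaging instruction is not guaranteed, so inspecting the generated assembly is still worthwhile.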

Hi, saw this via HN. Congrats, very impressive result @dblalock!
Our github.com/google/highway might be useful for porting - it provides cross-platform intrinsics which generally map closely to x86 intrinsics. For example, PSHUFB is TableLookupBytes (or TableLookupBytesOr0 if you care about the zeroing behavior as well).
We support Arm v7, RISC-V V, SSE4/AVX2/AVX-512, SVE etc. Happy to discuss if you're interested.
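To make the "zeroing behavior" of PSHUFB concrete, here is a small scalar model of the 128-bit form of the instruction. It does not use Highway; `pshufb_reference` and `Bytes16` are hypothetical names for illustration. Each output byte is taken from the table using the low four bits of the index, or set to zero when the index's high bit is set.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Scalar model of 128-bit PSHUFB: out[i] = 0 if the high bit of idx[i] is
// set, otherwise table[idx[i] & 0x0F].
inline Bytes16 pshufb_reference(const Bytes16& table, const Bytes16& idx) {
    Bytes16 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = (idx[i] & 0x80) ? std::uint8_t{0} : table[idx[i] & 0x0F];
    }
    return out;
}
```

If the ported code never produces indices with the high bit set, the zeroing branch is irrelevant and the plain lookup suffices; otherwise the zeroing variant is the one to match.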

Thanks for reaching out! I'm personally unlikely to do this in the foreseeable future since I'm not really adding new code anymore, but great to have this reference here!