Are there any plans to optimise for NEON arm64 instructions?
freedbrt opened this issue
Is this project optimised only for SSE3 instructions? Are there any benchmarks for arm64 processors?
Yes! At the moment I am busy applying for future jobs as an academic researcher. After that I'll be unemployed until September and will work on making edge264 as production-ready as it can be:
- multithreading first since it impacts API and I want it frozen sooner
- integration in ffmpeg/VLC
- ARM support (both 32-bit and 64-bit, ideally)
The project is optimized for SSSE3, with a few key speedups for SSE4.2 and AVX2. Unlike other decoders, I do not maintain many versions of each SIMD routine (e.g. for SSE2, SSSE3, AVX2, AVX512). Extra versions are a maintenance nightmare and do not bring much speedup anyway. So ARM support will certainly be limited to NEON rather than SVE. Unless I can find a programming language that allows writing one version of the code and compiling it to perfect NEON and SVE assembly :)
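
For readers wondering what "one code version" could look like in practice, here is a minimal sketch, assuming GCC or Clang and their vector extensions (`__attribute__((vector_size))`). The same C source can be built for x86 or for arm64, and the compiler picks the matching 128-bit instructions. The names `v8i16`, `avg_round` and `avg_rows` are hypothetical and not taken from edge264. This only illustrates fixed-width 128-bit code; it makes no claim about producing perfect assembly, and it does not cover variable-length SVE.

```c
#include <stdint.h>
#include <string.h>

/* Eight 16-bit lanes in one 128-bit vector (GCC/Clang vector extension). */
typedef int16_t v8i16 __attribute__((vector_size(16)));

/* Hypothetical helper: rounded average of two vectors, lane by lane.
   Ordinary C operators apply to every lane; the compiler maps them to
   SSE2 on x86 or NEON on arm64 depending on the build target. */
static v8i16 avg_round(v8i16 a, v8i16 b) {
    const v8i16 one = {1, 1, 1, 1, 1, 1, 1, 1};
    return (a + b + one) >> one;
}

/* Hypothetical wrapper: average two rows of 8 samples stored as int16_t.
   memcpy is used for loads and stores so unaligned pointers are safe. */
void avg_rows(const int16_t *p, const int16_t *q, int16_t *out) {
    v8i16 a, b, r;
    memcpy(&a, p, sizeof a);
    memcpy(&b, q, sizeof b);
    r = avg_round(a, b);
    memcpy(out, &r, sizeof r);
}
```

The trade-off is that operations without a plain C operator (shuffles, saturating arithmetic, horizontal sums) still need target-specific intrinsics or compiler builtins, so this approach reduces duplicated routines rather than removing them.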