There are 5 repositories under the avx512 topic.
oneAPI Deep Neural Network Library (oneDNN)
Implementations of SIMD instruction sets for systems which don't natively support them.
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
World's fastest log analysis: λ + SQL + JSON + S3
Up to 200x Faster Dot Products & Similarity Metrics for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for AVX2, AVX-512, NEON, SVE, & SVE2 📐
Accelerate SHA256 computations in pure Go using AVX512 and SHA Extensions on x86, and ARM64 on ARM. With AVX512 it provides up to an 8x improvement (over 3 GB/s per core). SHA Extensions give a performance boost of close to 4x over native.
🚀 Fast prime number generator
C++ template library for high performance SIMD based sorting algorithms
🚀 Fast C/C++ bit population count library
Agenium Scale vectorization library for CPUs and GPUs
SIMD (SSE) population count --- http://0x80.pl/articles/sse-popcount.html
🚀 Fast prime counting function implementations
Turbo Base64 - Fastest Base64 SIMD:SSE/AVX2/AVX512/Neon/Altivec - Faster than memcpy!
SIMD (SWAR/SSE/SSE4/AVX2/AVX512F/ARM Neon) implementations of a modified Karp-Rabin algorithm
Examples of C# code compiled to GPU by hybridizer
Boost SIMD
Code for paper "Base64 encoding and decoding at almost the speed of a memory copy"
Base64 coding and decoding with SIMD instructions (SSE/AVX2/AVX512F/AVX512BW/AVX512VBMI/ARM Neon)
Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL, even under multithreading.
RV: A Unified Region Vectorizer for LLVM