Repositories under the avx-512 topic:
Roaring bitmaps in C (and C++), with SIMD (AVX2, AVX-512 and NEON) optimizations: used by Apache Doris, ClickHouse, Alibaba Tair, Redpanda, YDB and StarRocks
Unicode routines (UTF8, UTF16, UTF32) and Base64: billions of characters per second using SSE2, AVX2, NEON, AVX-512, RISC-V Vector Extension, LoongArch64, POWER. Part of Node.js, WebKit/Safari, Ladybird, Chromium, Cloudflare Workers and Bun.
Portable wrapper for SIMD and vector instructions written in C++11. Compatible with NEON, SSE, AVX, AVX-512 and SVE (length specific).
Intel® Homomorphic Encryption Acceleration Library accelerates modular arithmetic operations used in homomorphic encryption
A curated list of awesome SIMD frameworks, libraries and software
A general purpose machine code manipulation library for x86-32 (IA-32) and x86-64 (AMD64) architectures (Assembler, Disassembler, Library).
Fast C++ function "is_utf8": checks if the input is valid UTF-8. Made of a single source file. Optimized for ARM NEON, x64 SSE, AVX2 and AVX-512.
(REOS) Radar and ElectroOptical Simulation Framework written in Fortran.
The fastest Run-Length Encoding on the planet (for x64)
Algorithms for matrix-matrix multiplication (DGEMM) using 256-bit AVX and AVX-512
A generic and efficient SIMD implementation of MSB radix sort with separate key and payload data streams, supporting arbitrary key and payload data types, written in C++ and accompanied by a bachelor's thesis.
Benchmark to show which is the fastest memcpy.
Vector Dossier is a CLI tool that statically analyzes vectorization depth of programs and libraries
Utility that was used to generate initial Go AVX-512 encoder test suite.
Running GPGPU-like kernels on CPU with auto-vectorization for SSE/AVX/AVX512 SIMD Architectures
Document Level Sentiment Analysis is an end-to-end deep learning workflow that uses the Hugging Face Transformers API to perform document-level classification, analyzing the sentiment of an input document containing English sentences or paragraphs.
Data for the Intel Xeon Phi server used in PyAF tests
The vectorized (AVX-512) batched singular value decomposition algorithm for matrices of order two.
Experimental speed-oriented DEFLATE implementation, based on AVX-512
Matilda is a library to repeatedly multiply a constant matrix with a variable vector
Implementation of Hierarchy Oblivious Algorithms
Projects and annotations used to learn x64 assembly.
An implementation of Google's Encoded Polyline algorithm in AVX512 because why not. Perhaps the fastest and least portable polyline encoder out there?
Fast Fourier Transform implementation using the x86 AVX-512 SIMD extension
Vectorized Efficient C++ Tool for Analytical Series Expansion of Mutual Inductance for Circular Coils with Rectangular Cross-Section in Coaxial Configuration
SIMD implementation of fraction-free Gaussian elimination over a prime field
Design of the Fast-Orbit Feedback correction for SESAME's accelerator
Zbynek's various C and C++ experiments
Some loose performance experiments with Agner Fog's VCL
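Several entries above are SIMD versions of well-known scalar algorithms. One example is the AVX-512 encoder for Google's Encoded Polyline algorithm. The scalar baseline such a project would be measured against can be sketched in C, following Google's published description of the format. Each coordinate is rounded to 1e-5 degrees and delta-encoded against the previous point. The delta is then shifted left by one bit and inverted if negative, split into 5-bit chunks with a 0x20 continuation flag, and offset by 63 into printable ASCII. This is a minimal sketch and not the listed repository's code. The function names `encode_polyline` and `encode_value` are hypothetical, and the output buffer is assumed to hold at least `12 * n + 1` bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: appends one signed delta to out at position len,
   returns the new length. */
static size_t encode_value(int64_t value, char *out, size_t len) {
    /* Shift left by one bit; invert all bits if the value is negative. */
    uint64_t v = (uint64_t)value << 1;
    if (value < 0) v = ~v;
    /* Emit 5-bit chunks, least significant first; every chunk except the
       last carries the 0x20 continuation flag. Add 63 to make it printable. */
    while (v >= 0x20) {
        out[len++] = (char)((0x20 | (v & 0x1f)) + 63);
        v >>= 5;
    }
    out[len++] = (char)(v + 63);
    return len;
}

/* Rounds degrees to integer units of 1e-5 (half away from zero). */
static int64_t to_e5(double deg) {
    double scaled = deg * 1e5;
    return (int64_t)(scaled >= 0 ? scaled + 0.5 : scaled - 0.5);
}

/* Hypothetical API: encodes n (lat, lng) points into out, NUL-terminated.
   Assumes valid coordinates, so each value needs at most 6 chars and
   12 * n + 1 bytes of buffer suffice. Returns the encoded length. */
size_t encode_polyline(const double *lat, const double *lng, size_t n,
                       char *out) {
    assert(out != NULL);
    int64_t prev_lat = 0, prev_lng = 0;
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t cur_lat = to_e5(lat[i]);
        int64_t cur_lng = to_e5(lng[i]);
        /* Each point is stored as a delta from the previous point. */
        len = encode_value(cur_lat - prev_lat, out, len);
        len = encode_value(cur_lng - prev_lng, out, len);
        prev_lat = cur_lat;
        prev_lng = cur_lng;
    }
    out[len] = '\0';
    return len;
}
```

With the three points from Google's documentation, (38.5, -120.2), (40.7, -120.95) and (43.252, -126.453), this produces `_p~iF~ps|U_ulLnnqC_mqNvxq`@`. The data-dependent chunk loop in `encode_value` is the part that makes the format awkward to vectorize, which is presumably why an AVX-512 version is notable.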