avx512

There are 5 repositories under the avx512 topic.

simdjson / simdjson
Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
aarch64 arm arm64 avx2 avx512 c-plus-plus clang clang-cl cpp11 gcc-compiler json json-parser json-pointer loongarch neon simd sse42 vs2019 x64
Language:C++ 19858
google / highway
Performance-portable, length-agnostic SIMD with runtime dispatch
avx avx-512 avx-instructions avx2 avx512 intrinsics neon simd simd-instructions simd-intrinsics simd-library simd-parallelism simd-programming sse42 wasm
Language:C++ 4458
HJLebbink / asm-dude
Visual Studio extension for assembly syntax highlighting and code completion in assembly files and the disassembly window
assembler assembly assembly-language-programming avx2 avx512 code-completion disassembly masm nasm syntax-highlighting visual-studio visual-studio-extension x86-64
Language:C# 4137
oneapi-src / oneDNN
oneAPI Deep Neural Network Library (oneDNN)
aarch64 amx avx512 bfloat16 cpp deep-learning deep-neural-networks library oneapi onednn openmp performance sycl tbb vnni x64 x86-64 xe-architecture
Language:C++ 3746
simd-everywhere / simde
Implementations of SIMD instruction sets for systems which don't natively support them.
altivec arm arm64 avx avx2 avx512 fma gfni mmx neon powerpc simd simd-intrinsics sse sse2 sse3 sse41 sse42 ssse3 vectorization
Language:C 2588
xtensor-stack / xsimd
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
avx avx512 c-plus-plus-11 cpp mathematical-functions neon simd simd-instructions simd-intrinsics sse sve vectorization
Language:C++ 2325
ermig1979 / Simd
C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for x86/x64, NEON for ARM.
amx arm avx avx512 c-plus-plus haar-cascade image-processing lbp machine-learning neon neural-network simd simd-library sse
Language:C++ 2121
kfrlib / kfr
Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
audio audio-processing avx avx512 clang cplusplus cplusplus-14 cplusplus-17 cpp14 cpp17 cxx dft digital-signal-processing discrete-fourier-transform dsp fast-fourier-transform fft header-only simd
Language:C++ 1711
VcDevel / Vc
SIMD Vector Classes for C++
avx avx2 avx512 c-plus-plus cpp cpp11 cpp14 cpp17 data-parallel neon parallel parallel-computing portable simd simd-instructions simd-programming simd-vector sse vectorization
Language:C++ 1475
ashvardanian / SimSIMD
Up to 200x Faster Dot Products & Similarity Metrics for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for AVX2, AVX-512, NEON, SVE, & SVE2 📐
arm-neon arm-sve assembly avx2 avx512 bfloat16 blas blas-libraries distance-calculation float16 information-retrieval metrics neon numpy scipy simd simd-instructions similarity-measures similarity-search vector-search
Language:C 1283
p12tic / libsimdpp
Portable header-only C++ low level SIMD library
altivec avx2 avx512 msa neon simd sse vsx
Language:C++ 1267
SnellerInc / sneller
World's fastest log analysis: λ + SQL + JSON + S3
avx512 go high-performance indexless json log query-engine s3 schemaless serverless simd sql vectorized
Language:Go 1034
minio / sha256-simd
Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions for x86 and ARM64 for ARM. On AVX512 it provides an up to 8x improvement (over 3 GB/s per core). SHA Extensions give a performance boost of close to 4x over native.
arm assembly avx avx-instructions avx512 golang intel plan9
Language:Go 1001
kimwalisch / primesieve
🚀 Fast prime number generator
prime-numbers sieve-of-eratosthenes math eratosthenes primes sieve avx512 arm-neon arm-sve number-theory
Language:C++ 988
intel / x86-simd-sort
C++ template library for high performance SIMD based sorting algorithms
argsort avx2 avx512 partialsort quickselect quicksort sort x86
Language:C++ 923
libxsmm / libxsmm
Library for specialized dense and sparse matrix operations, and deep learning primitives.
amx avx avx2 avx512 bfloat16 blas convolution fortran intel jit machine-learning matrix matrix-multiplication simd sparse sse tensor transpose vector
Language:C 866
shibatch / sleef
SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT
simd sse2 avx avx512 neon fft aarch64 sve arm vector-math vectorization math-library vsx elementary-functions s390x powerpc cuda quadruple-precision android ios
Language:C 707
VcDevel / std-simd
std::experimental::simd for GCC [ISO/IEC TS 19570:2018]
avx avx512 cpp17 gcc libstdcxx neon simd sse wg21
Language:C++ 606
ashvardanian / less_slow.cpp
Learning how to write "Less Slow" code in C++ 20, C 99, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking and user-space IO
assembly assembly-language avx512 benchmark coroutines cpp cpp-programming cpp17 cpp20 cuda gcc google-benchmark hpc io-uring linux-kernel llvm ptx ranges tutorial tutorials
Language:C++ 490
WojciechMula / toys
Storage for my snippets, toy programs, etc.
avx2 avx512 sse string-algorithms
Language:C++ 349
kimwalisch / libpopcnt
🚀 Fast C/C++ bit population count library
avx2 avx512 c cpp neon popcnt popcount simd sve
Language:C 339
WojciechMula / sse-popcount
SIMD (SSE) population count --- http://0x80.pl/articles/sse-popcount.html
aarch64 arm-neon avx2 avx512 popcount sse
Language:C++ 335
agenium-scale / nsimd
Agenium Scale vectorization library for CPUs and GPUs
simd simd-programming sse2 sse42 avx avx2 avx512 neon neon128 aarch64 sve vectorization-library simd-instructions cuda rocm cpp20 cpp20-library hpc simd-library
Language:C 330
kimwalisch / primecount
🚀 Fast prime counting function implementations
prime-numbers math number-theory openmp primes avx512 arm-sve primepi
Language:C++ 316
RRZE-HPC / OSACA
Open Source Architecture Code Analyzer
hpc performance-analysis performance-modeling in-core x86 arm64v8 sve avx avx2 avx512 port-mapping out-of-order throughput latency loop-carried-dependency critical-path assembly python aarch64 neon
Language:Jupyter Notebook 313
powturbo / Turbo-Base64
Turbo Base64 - Fastest Base64 SIMD:SSE/AVX2/AVX512/Neon/Altivec - Faster than memcpy!
base64 benchmark encoding-library encoding library simd sse avx avx2 neon arm base64-encoding base64-decoding avx512
Language:C 298
WojciechMula / sse4-strstr
SIMD (SWAR/SSE/SSE4/AVX2/AVX512F/ARM Neon) implementations of a modified Karp-Rabin algorithm
avx2 avx512 neon sse string-manipulation
Language:C++ 246
altimesh / hybridizer-basic-samples
Examples of C# code compiled to GPU by hybridizer
cuda gpu parallel visual-studio hybridizer-essentials avx avx2 avx512 dotnet compiler vectorization optimization
Language:C# 240
agenium-scale / boost.simd
Boost SIMD
aarch64 avx avx2 avx512 avx512f cpp11 fma neon neon128 neon64 parallel-computing portable simd simd-programming sse sse2 sse3 sse41 sse42 vectorization
232
WojciechMula / base64-avx512
Code for paper "Base64 encoding and decoding at almost the speed of a memory copy"
base64 simd avx512
Language:C 203
minio / md5-simd
Accelerate aggregated MD5 hashing performance up to 8x for AVX512 and 4x for AVX2. Useful for server applications that need to compute many MD5 sums in parallel.
md5 simd avx2 avx512 assembly golang hashing performance
Language:Go 190
manodeep / Corrfunc
⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
astrophysics galaxies cosmology large-scale-structure pair-counting intrinsics python c openmp simd avx512 avx2 avx sse42 correlation-functions
Language:C 170
WojciechMula / base64simd
Base64 coding and decoding with SIMD instructions (SSE/AVX2/AVX512F/AVX512BW/AVX512VBMI/ARM Neon)
base64 simd avx2 avx512 neon sse
Language:C++ 163
Menooker / KunQuant
A compiler, optimizer and executor for financial expressions and factors
alpha101 avx512 compiler python quant quantitative-finance
Language:Python 140
yzhaiustc / Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F
Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL, even under multithreading.
blas gemm avx512 simd mkl openmp
Language:C 134
animetosho / md5-optimisation
The fastest MD5 implementation using x86 assembly
avx512 md5
Language:C++ 126