avx-512

The 35 repositories below are tagged with the avx-512 topic.

google / highway
Performance-portable, length-agnostic SIMD with runtime dispatch
simd simd-instructions simd-programming intrinsics avx2 avx512 neon wasm avx avx-512 avx-instructions sse42 simd-library simd-parallelism simd-intrinsics
Language: C++, Stars: 3673
RoaringBitmap / CRoaring
Roaring bitmaps in C (and C++), with SIMD (AVX2, AVX-512 and NEON) optimizations: used by Apache Doris, ClickHouse, and StarRocks
arm avx-512 avx2 bitset bitset-library c clang gcc neon roaring-bitmaps visual-studio
Language: C, Stars: 1464
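
Roaring bitmaps split 32-bit integers by their high 16 bits into chunks and store each chunk's low 16 bits in a container chosen by density. Below is a minimal Python sketch of that idea. It assumes only two container kinds: a sorted array for at most 4096 values, and a 65536-bit bitmap otherwise. CRoaring also has run containers and is far more optimized. The class name `TinyRoaring` is hypothetical.

```python
from bisect import bisect_left

ARRAY_LIMIT = 4096          # max values kept in a sorted-array container
BITMAP_WORDS = 65536 // 64  # 1024 x 64-bit words cover every 16-bit low value


class TinyRoaring:
    """Hypothetical, simplified roaring-style set of unsigned 32-bit ints."""

    def __init__(self):
        # high 16 bits -> ("array", sorted list) or ("bitmap", list of 64-bit words)
        self.containers = {}

    def add(self, x):
        hi, lo = x >> 16, x & 0xFFFF
        kind, data = self.containers.get(hi, ("array", []))
        if kind == "bitmap":
            data[lo >> 6] |= 1 << (lo & 63)
            return
        i = bisect_left(data, lo)
        if i < len(data) and data[i] == lo:
            return  # already present
        data.insert(i, lo)
        if len(data) > ARRAY_LIMIT:
            # The chunk became dense: convert the array container to a bitmap.
            words = [0] * BITMAP_WORDS
            for v in data:
                words[v >> 6] |= 1 << (v & 63)
            self.containers[hi] = ("bitmap", words)
        else:
            self.containers[hi] = ("array", data)

    def __contains__(self, x):
        hi, lo = x >> 16, x & 0xFFFF
        if hi not in self.containers:
            return False
        kind, data = self.containers[hi]
        if kind == "array":
            i = bisect_left(data, lo)
            return i < len(data) and data[i] == lo
        return ((data[lo >> 6] >> (lo & 63)) & 1) == 1

    def __len__(self):
        total = 0
        for kind, data in self.containers.values():
            if kind == "array":
                total += len(data)
            else:
                total += sum(bin(w).count("1") for w in data)
        return total
```

The array-to-bitmap switch keeps sparse chunks small and dense chunks fast to test. Operations on bitmap containers, such as unions and intersections, reduce to word-wise logic that SIMD handles well.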
simdutf / simdutf
Unicode routines (UTF8, UTF16, UTF32) and Base64: billions of characters per second using SSE2, AVX2, NEON, AVX-512, RISC-V Vector Extension. Part of Node.js and Bun.
avx-512 avx2 cpp neon risc-v simd sse2 transcoding unicode utf16 utf8
Language: C++, Stars: 974
aff3ct / MIPP
MIPP is a portable wrapper for SIMD instructions written in C++11. It supports NEON, SSE, AVX, AVX-512 and SVE (length specific).
vector avx simd wrapper portable sse neon avx-512 sve
Language: C++, Stars: 465
intel / hexl
Intel® Homomorphic Encryption Acceleration Library accelerates modular arithmetic operations used in homomorphic encryption
homomorphic-encryption avx-512 privacy cryptography
Language: C++, Stars: 210
awesome-simd / awesome-simd
A curated list of awesome SIMD frameworks, libraries and software
simd avx-512 avx2 vectorized-computation intrinsics
Stars: 113
swojtasiak / fcml-lib
A general purpose machine code manipulation library for x86-32 (IA-32) and x86-64 (AMD64) architectures (Assembler, Disassembler, Library).
disassembler assembler code-generator shared-library avx avx2 sse sse2 ssse3 sse3 sse41 sse42 xop x86 x86-32 x86-64 intel avx-512 amd64
Language: C, Stars: 83
bgin / Radar_ElectroOptical_Simulation
(REOS) Radar and ElectroOptical Simulation Framework written in Fortran.
amdgpu avx avx-512 avx2 c99 control-systems cuda-kernels fortran90 gpu-acceleration high-performance-computing infrared-sensors modeling openmp radar radiative-transfer simd simulation vectorization
Language: Fortran, Stars: 45
simdutf / is_utf8
Fast C++ function "is_utf8": checks if the input is valid UTF-8. Made of a single source file. Optimized for ARM NEON, x64 SSE, AVX2 and AVX-512.
avx-512 avx2 cpp neon simd unicode
Language: C++, Stars: 44
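
is_utf8 answers one question: is a byte string valid UTF-8? A scalar reference helps pin down the rules that any vectorized validator must reproduce:

- correct lead and continuation byte patterns,
- no overlong encodings,
- no UTF-16 surrogates (U+D800 to U+DFFF),
- nothing above U+10FFFF.

Below is a minimal byte-at-a-time Python sketch of those rules. It is not the library's SIMD algorithm, and the name `is_utf8_scalar` is hypothetical.

```python
def is_utf8_scalar(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8 (RFC 3629 rules)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                 # 1-byte ASCII
            i += 1
            continue
        if 0xC2 <= b <= 0xDF:        # 2-byte lead (C0/C1 would be overlong)
            length, lo, hi = 2, 0x80, 0xBF
        elif 0xE0 <= b <= 0xEF:      # 3-byte lead
            length = 3
            # Second-byte limits reject overlongs (E0) and surrogates (ED).
            lo = 0xA0 if b == 0xE0 else 0x80
            hi = 0x9F if b == 0xED else 0xBF
        elif 0xF0 <= b <= 0xF4:      # 4-byte lead
            length = 4
            # Second-byte limits reject overlongs (F0) and code points > U+10FFFF (F4).
            lo = 0x90 if b == 0xF0 else 0x80
            hi = 0x8F if b == 0xF4 else 0xBF
        else:                        # stray continuation byte, C0/C1, or F5..FF
            return False
        if i + length > n:
            return False             # truncated sequence at end of input
        if not lo <= data[i + 1] <= hi:
            return False
        for k in range(2, length):   # remaining bytes must be 10xxxxxx
            if not 0x80 <= data[i + k] <= 0xBF:
                return False
        i += length
    return True
```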
twest820 / AVX-512
AVX-512 documentation beyond what Intel provides
avx-512 avx-instructions amd intel
Stars: 33
rainerzufalldererste / hypersonic-rle-kit
The fastest Run-Length-Encoding on the Planet (for x64)
compression compression-algorithm rle rle-compression-algorithm runlengthencoding c simd-variants avx2 avx avx-512
Language: C, Stars: 25
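
Run-length encoding replaces each run of identical bytes with a (value, count) pair. Below is a minimal scalar Python sketch. It assumes a simple format of alternating value and count bytes, with runs capped at 255. hypersonic-rle-kit uses its own formats and SIMD-accelerated run detection, which this does not reproduce. The function names are hypothetical.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode as (value, run_length) byte pairs; runs longer than 255 are split."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        value = data[i]
        run = 1
        # Scan forward to the end of the run (or the 255 cap).
        while i + run < n and run < 255 and data[i + run] == value:
            run += 1
        out += bytes((value, run))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Invert rle_encode by expanding each (value, run_length) pair."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        out += bytes([encoded[k]]) * encoded[k + 1]
    return bytes(out)
```

The inner scan for the end of a run is the part a SIMD implementation would typically speed up, by comparing many bytes against the current value at once.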
MamarezaAlipour / AVX-Hole
AVX-Hole C++ SIMD Library
avx2 avx-512 cpp simd intel
Language: C++, Stars: 16
nidud / asmc
Masm compatible assembler
assembler avx avx-512 linux masm sse x86 x86-64
Language: Assembly, Stars: 12
romz-pl / matrix-matrix-multiply
Algorithms for matrix matrix multiplication, dgemm, AVX-256, AVX-512
avx-512 avx dgemm matrix-multiplication
Language: C++, Stars: 11
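
The BLAS routine dgemm computes C = alpha*A*B + beta*C in double precision. Optimized versions reorder and tile the three nested loops so that blocks of A, B and C stay in cache, and so that the innermost loop streams contiguous memory that SIMD lanes can consume.

Below is a minimal Python sketch of that blocked loop structure only. It assumes row-major lists of lists, alpha = 1, beta = 0, and a hypothetical default tile size of 64. It illustrates the loop order and is not fast.

```python
def blocked_matmul(A, B, tile=64):
    """Compute C = A @ B with i-k-j loop order and square tiling (sketch)."""
    n, m, p = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("inner dimensions must agree")
    C = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                for i in range(ii, min(ii + tile, n)):
                    Ci = C[i]
                    for k in range(kk, min(kk + tile, m)):
                        a = A[i][k]
                        Bk = B[k]
                        # Innermost loop walks one row of B and C contiguously;
                        # this is the loop a vectorized kernel maps onto SIMD lanes.
                        for j in range(jj, min(jj + tile, p)):
                            Ci[j] += a * Bk[j]
    return C
```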
ammarfaizi2 / memcpy_benchmark
Benchmark to show which is the fastest memcpy.
x86-64 memcpy avx sse avx-512 performance hpc
Language: Assembly, Stars: 10
quasilyte / avx512test
Utility that was used to generate initial Go AVX-512 encoder test suite.
golang go avx512 avx-512 xed intel-xed asm tests encoder
Language: Assembly, Stars: 9
tugrul512bit / VectorizedKernel
Running GPGPU-like kernels on CPU with auto-vectorization for SSE/AVX/AVX512 SIMD Architectures
avx avx-512 avx512 cpp cpp14 cpu simd sse vectorization header-only gcc gpgpu multithreading parallel simulation
Language: C++, Stars: 7
intel / document-level-sentiment-analysis
Document Level Sentiment Analysis is an End-to-End deep learning workflow using Hugging Face transformers API to do a "classification" task at document level, to analyze the sentiment of input document containing English sentences or paragraphs.
ai avx-512 avx-instructions bert-fine-tuning bert-model deep-learning fine-tuning huggingface huggingface-transformers inference intel intel-xeon one-api pytorch sentiment-analysis sentiment-classification tensorflow transfer-learning transformers xeon
Language: Python, Stars: 6
hubery-tao / fast_math
high-speed math functions based on AVX-512 intrinsics
avx-512 simd-intrinsics
Language: C++, Stars: 4
jonicho / simd-radix-sort
A generic and efficient SIMD implementation of MSB Radix Sort with separate key and payload datastreams that supports arbitrary key and payload data types written in C++ accompanied by a bachelor's thesis.
avx-512 avx512 cpp radix-sort simd sorting
Language: C++, Stars: 4
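
MSB (most-significant-digit-first) radix sort buckets keys by their top byte, then recurses into each bucket on the next byte. Carrying a separate payload just means moving it together with its key.

Below is a minimal Python sketch under these assumptions:

- keys are unsigned 32-bit integers,
- digits are 8 bits wide,
- small buckets fall back to Python's built-in stable sort.

It is not the SIMD implementation from the repository, and `msb_radix_sort` is a hypothetical name.

```python
def msb_radix_sort(keys, payloads, shift=24, cutoff=16):
    """Sort unsigned 32-bit `keys` and reorder `payloads` the same way.

    Returns new (keys, payloads) lists; recursion goes from the top byte down.
    """
    if len(keys) <= cutoff or shift < 0:
        # Small bucket (or all bytes consumed): a stable sort on the key finishes it.
        order = sorted(range(len(keys)), key=lambda i: keys[i])
        return [keys[i] for i in order], [payloads[i] for i in order]
    # Distribute (key, payload) pairs into 256 buckets by the current byte.
    buckets = [([], []) for _ in range(256)]
    for k, p in zip(keys, payloads):
        bk, bp = buckets[(k >> shift) & 0xFF]
        bk.append(k)
        bp.append(p)
    out_keys, out_payloads = [], []
    for bk, bp in buckets:
        if bk:
            sk, sp = msb_radix_sort(bk, bp, shift - 8, cutoff)
            out_keys += sk
            out_payloads += sp
    return out_keys, out_payloads
```

Buckets preserve input order, so equal keys keep their original payload order (the sort is stable).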
toshioendo / hoalgos
Implementation of Hierarchy Oblivious Algorithms
memory-hierarchy multicore gpu simd avx-512
Language: C++, Stars: 3
venovako / VecKog
The vectorized (AVX-512) batched singular value decomposition algorithm for matrices of order two.
kogbetliantz svd singular-value-decomposition avx-512 vectorization
Language: C, Stars: 3
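
For a real 2×2 matrix [[a, b], [c, d]] the singular values have a closed form:

- sigma_max = (p + q) / 2
- sigma_min = |p - q| / 2

where p = sqrt((a + d)^2 + (b - c)^2) and q = sqrt((a - d)^2 + (b + c)^2). This makes results in the batched order-two setting easy to check.

Below is a minimal Python sketch that computes only the singular values of each matrix in a batch. It uses this closed-form formula, not the Kogbetliantz (two-sided Jacobi) method that VecKog vectorizes, and it does not produce singular vectors. The function names are hypothetical.

```python
import math


def singular_values_2x2(a, b, c, d):
    """Singular values (s_max, s_min) of [[a, b], [c, d]] via the closed form."""
    p = math.hypot(a + d, b - c)
    q = math.hypot(a - d, b + c)
    return (p + q) / 2, abs(p - q) / 2


def batched_singular_values(batch):
    """Apply the 2x2 formula to each (a, b, c, d) tuple in `batch`.

    A SIMD version could instead load a, b, c and d for many matrices into
    separate vectors and evaluate the same arithmetic lane-wise.
    """
    return [singular_values_2x2(*m) for m in batch]
```

Two identities make quick sanity checks: s_max * s_min equals |det A| = |ad - bc|, and s_max^2 + s_min^2 equals the squared Frobenius norm.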
antoinecarme / xeon-phi-data
Data for Intel Xeon-Phi server used in PyAF tests
pyaf forecasting time-series xeon-phi asrock knights-landing x200 knl-7210 avx-512 mic-architecture airmont manycore machine-learning asrockrack x200d6hm 2u4n-f home-server reduce-noise fan-speed
Language: Python, Stars: 2
gfurtadoalmeida / study-assembly-x64
Projects and annotations used to learn x64 assembly.
assembly avx x64-assembly avx2 avx-512 study
Language: C++, Stars: 2
lemonjesus / avx512-polyline
An implementation of Google's Encoded Polyline algorithm in AVX512 because why not. Perhaps the fastest and least portable polyline encoder out there?
avx avx-512 avx512 polyline-encoder
Language: C, Stars: 2
pcineverdies / FFT-AVX-512
Fast Fourier Transform implementation using the x86 AVX-512 SIMD extension
avx avx-512 fft simd vectorization
Language: C++, Stars: 2
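
A radix-2 Cooley-Tukey FFT splits a length-n transform into the transforms of the even- and odd-indexed samples. It then combines them with twiddle factors exp(-2*pi*i*k/n). This combine step (the "butterflies") is what SIMD implementations usually vectorize.

Below is a minimal recursive Python sketch, assuming the input length is a power of two. It is a textbook decimation-in-time version, not the repository's code.

```python
import cmath


def fft(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the half-size transforms with twiddle e^(-2*pi*i*k/n).
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```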
nsomatilda / Matilda
Matilda is a library to repeatedly multiply a constant matrix with a variable vector
avx-512 avx2 low-latency matrix-vector-multiplication multithreading realtime simd gemv
Language: C++, Stars: 1
zingaburga / SIMDflate
Experimental speed-oriented DEFLATE implementation, based on AVX-512
avx-512 deflate
Language: C++, Stars: 1
DmitryYurov / BitsCount
Count set bits in an integer
avx-512 cpp low-level-programming
Language: C++, Stars: 0
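
Counting set bits (population count) without looping over every bit is commonly done with the SWAR trick:

1. Add neighbouring 1-bit fields into 2-bit sums.
2. Add those into 4-bit sums, then 8-bit sums.
3. Add up all the bytes with a single multiply.

Below is a minimal Python sketch for 64-bit values. It is not taken from the BitsCount repository, and the function name is hypothetical. On CPUs with the AVX-512 VPOPCNTDQ extension, the VPOPCNTQ instruction counts bits directly in each 64-bit lane.

```python
MASK64 = (1 << 64) - 1


def popcount64(x: int) -> int:
    """Count set bits of a 64-bit unsigned value with the SWAR method."""
    x &= MASK64
    x = x - ((x >> 1) & 0x5555555555555555)                         # 2-bit sums
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)  # 4-bit sums
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F                         # 8-bit sums
    # The multiply accumulates every byte sum into the top byte; keep 64 bits.
    return ((x * 0x0101010101010101) & MASK64) >> 56
```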
PhuNH / hpc-lab
Scientific Computing - High-Performance Computing Practical Course in WS18-19 at TUM
knl avx-512 avx slurm dgemm
Language: C++, Stars: 0
falseywinchnet / tomatofft
The Tomato Patch FFT is the fastest FFT in the world, but it is by no means efficient.
avx-512 fft rfft vectorization decimation-in-frequency
harshapathuri86 / parallel-codes
avx-512 avx512 matrix-multiplication open-mp parallel-processing parallel-programming parallelization vector-addition
Language: C++
kvr000 / zbynek-cxx-exp
Zbynek's various C and C++ experiments
matrix-multiplication cpu-benchmark linker avx512 avx-512 avx512f matrix-math-library matrix-math
Language: C++
SESAME-Synchrotron / orbit-feedback
Design of the Fast-Orbit Feedback correction for SESAME's accelerator
avx-512 c linux matrix-multiplication-parallel
Language: C
stefan-zobel / cramer
Some loose performance experiments with Agner Fog's VCL
avx2 avx-512 simd vectorization jni
Language: C++