There are 13 repositories under the bfloat16 topic.
oneAPI Deep Neural Network Library (oneDNN)
Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors, using SIMD on AVX2, AVX-512, NEON, SVE, & SVE2 📐
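As a taste of how such a library is typically called, here is a minimal, hedged sketch using the simsimd Python bindings; the exact function names and dtype coverage may differ across versions, and the vector size is arbitrary:

```python
# Minimal sketch, assuming the simsimd Python bindings expose cosine();
# function names and supported dtypes vary by version.
import numpy as np
import simsimd

a = np.random.rand(1536).astype(np.float16)
b = np.random.rand(1536).astype(np.float16)

# A single call dispatches to a SIMD kernel picked for the host CPU
# (AVX2/AVX-512 on x86, NEON/SVE/SVE2 on Arm).
print(float(simsimd.cosine(a, b)))
```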
Library for specialized dense and sparse matrix operations, and deep learning primitives.
Half-precision floating-point types f16 and bf16 for Rust.
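The crate itself is Rust, but the trade-off between the two formats is language-independent: f16 (IEEE binary16) has 5 exponent and 10 mantissa bits, while bf16 keeps float32's 8-bit exponent and truncates the mantissa to 7 bits. A minimal NumPy sketch, emulating bf16 by bit truncation (an illustration of the formats, not the crate's API):

```python
import numpy as np

def to_bf16(x):
    # bfloat16 is the upper 16 bits of an IEEE-754 float32: the same
    # 8-bit exponent (so the same range), but only 7 mantissa bits.
    bits = np.float32(x).view(np.uint32) & np.uint32(0xFFFF0000)
    return bits.view(np.float32)  # truncation; real converters may round

# Range: 70000 overflows IEEE f16 (max ~65504) but is fine in bf16.
print(np.float16(70000.0), to_bf16(70000.0))   # inf  69632.0
# Precision: f16's 10 mantissa bits beat bf16's 7.
print(np.float16(0.1), to_bf16(0.1))           # 0.1  0.09960938
```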
Floating-Point Arithmetic Library for Z80
A LLaMA2-7B chatbot with memory, running on CPU and optimized using smooth quantization (SmoothQuant), 4-bit quantization, or Intel® Extension for PyTorch with bfloat16.
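A hedged sketch of the bfloat16 path such a chatbot typically takes on CPU with Intel® Extension for PyTorch; the checkpoint name and prompt below are placeholders, not this repository's exact setup:

```python
# Sketch, assuming a Hugging Face LLaMA2 checkpoint (placeholder name)
# and intel_extension_for_pytorch installed alongside torch.
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-chat-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()
model = ipex.optimize(model, dtype=torch.bfloat16)  # bf16 weights, fused ops

with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    ids = tokenizer("Hello!", return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```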
A JAX implementation of stochastic addition.
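The idea behind stochastic addition is to accumulate in float32 and then round to bfloat16 stochastically, so that small addends survive in expectation instead of vanishing under round-to-nearest. A minimal JAX sketch of the technique (illustrative function names, not this repository's API); the PyTorch entry below implements the same idea:

```python
import jax
import jax.numpy as jnp

def stochastic_round_to_bf16(x, key):
    # bfloat16 is the top 16 bits of a float32; the low 16 bits are the
    # rounding residue. Adding uniform noise in [0, 2^16) before
    # truncating rounds up with probability proportional to the residue.
    bits = jax.lax.bitcast_convert_type(x, jnp.uint32)
    noise = jax.random.randint(key, x.shape, 0, 1 << 16).astype(jnp.uint32)
    kept = (bits + noise) & jnp.uint32(0xFFFF0000)
    return jax.lax.bitcast_convert_type(kept, jnp.float32).astype(jnp.bfloat16)

def stochastic_add(a, b, key):
    # Accumulate in float32, round the result stochastically.
    return stochastic_round_to_bf16(
        a.astype(jnp.float32) + b.astype(jnp.float32), key)

key = jax.random.PRNGKey(0)
a = jnp.ones(8, dtype=jnp.bfloat16)
b = jnp.full(8, 2.0**-10, dtype=jnp.bfloat16)  # far below bf16's ulp at 1.0
# Round-to-nearest would return 1.0 in every lane; stochastically,
# roughly 1/8 of lanes step up to the next representable value.
print(stochastic_add(a, b, key))
```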
CUDA/HIP header-only library for low-precision (16-bit, 8-bit) and vectorized GPU kernel development
A Pytorch implementation of stochastic addition.
Comparison of vector element sum using various data types.
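What such a comparison measures is accumulation error: once the running sum outgrows the dtype's resolution, further small addends vanish under round-to-nearest. A short NumPy sketch under stated assumptions (the ml_dtypes package supplies a NumPy bfloat16; the repository's own harness may differ):

```python
import numpy as np
from ml_dtypes import bfloat16  # assumption: ml_dtypes is installed

x = np.full(10_000, 0.001, dtype=np.float32)  # exact sum is 10.0
print("float64 accumulator:", x.sum(dtype=np.float64))
print("float32 accumulator:", x.sum(dtype=np.float32))

# Naive bf16 accumulation stalls early: near 0.5 the bf16 ulp is
# ~0.004, so adding 0.001 rounds back to the same value.
acc = bfloat16(0)
for v in x:
    acc = bfloat16(acc + v)
print("bfloat16 accumulator:", float(acc))
```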
Customizable floating point types, with all standard floating point operations implemented from scratch.
Comparison of the PageRank algorithm using various data types.
Hybridized On-Premise and Cloud (HOPC) Deployment Experimentation with Bfloat16