Vectorch AI's repositories
3FS
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
cutlass
CUDA Templates for Linear Algebra Subroutines
FasterTransformer
Transformer-related optimization, including BERT and GPT
flash-attention
Fast and memory-efficient exact attention
flashinfer
FlashInfer: Kernel Library for LLM Serving
flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
vcpkg
C++ Library Manager for Windows, Linux, and macOS
xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.