imaginary-person's repositories
ArchBenchSuite
Low-level kernels to benchmark peak compute, cache bandwidth at various levels, memory bandwidth, and some basic compute routines
arrow
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
charm
The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.
FasterTransformer
Transformer-related optimization, including BERT and GPT
FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
gdrcopy
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
leveldb
LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
likwid
Performance monitoring and benchmarking suite
llama2.c
Andrej Karpathy's Llama 2 inference in C
loop_tool
A thin, highly portable C++ intermediate representation for dense loop-based computation.
MonetDB
This is the official mirror of the MonetDB Mercurial repository. Please note that we do not accept pull requests on GitHub. The regression test results can be found on the MonetDB Testweb: http://monetdb.cwi.nl/testweb/web/status.php. For contributions, please see: https://www.monetdb.org/Developers
nanoGPT
Andrej Karpathy's nanoGPT
obs-studio
OBS Studio - Free and open source software for live streaming and screen recording
pytorch-1
Tensors and Dynamic neural networks in Python with strong GPU acceleration
qBittorrent
qBittorrent BitTorrent client
rocksdb
A library that provides an embeddable, persistent key-value store for fast storage.
Stanford_CS348K_readings
This is a list of readings for Stanford CS348K.
stdgpu
stdgpu: Efficient STL-like Data Structures on the GPU
stylegan3
Official PyTorch implementation of StyleGAN3
TensorRT
TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.
torch-mlir
The Torch-MLIR project aims to provide first-class support from the PyTorch ecosystem to the MLIR ecosystem.
torcharrow
A torch.Tensor-like DataFrame library supporting multiple execution runtimes and Arrow as a common memory format
torchdistx
Torch Distributed Experimental
torchdynamo
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch, TensorFlow, and JAX.
tuplex
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set.
tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators