Andrei Pokrovsky's repositories
BERT-ONNX
BERT ONNX pre- and post-optimization
ConcurrentDeque
Fast, generalized implementation of the Chase-Lev lock-free work-stealing deque for C++17
CudaSharedPtr
Shared pointer for CUDA device pointers and CUDA streams; a smart wrapper to allocate and deallocate CUDA device buffers.
DeepSpeedExamples
Example models using DeepSpeed
doroce-linux
A command-line utility to manage the configuration of a system's high-performance network interfaces for RoCE deployments
keyboard-layout-converter
A simple Python script to convert a Windows .klc keyboard layout to a Linux .xkb file
likwid
Performance monitoring and benchmarking suite
mpi_test
Examples and tests for MPI+CUDA with CMake
MuZero
An implementation of MuZero in PyTorch and Ray for Reversi
nccl-tests
NCCL Tests
necklace
Distributed deep learning framework based on PyTorch/MXNet/Numba and NCCL.
oneCCL
oneAPI Collective Communications Library (oneCCL)
onnx-opcounter
Count the number of parameters / MACs / FLOPs for ONNX models.
param
PArametrized Recommendation and Ai Model benchmark is a repository for developing numerous microbenchmarks as well as end-to-end networks for evaluating training and inference platforms.
perftest
InfiniBand Verbs Performance Tests
pytorch
Tensors and dynamic neural networks in Python with strong GPU acceleration
pytorch-extension
An example of a CUDA extension for PyTorch using CuPy which computes the Hadamard product of two tensors
pytorch-lamb
Implementation of the LAMB optimizer (https://arxiv.org/abs/1904.00962)
radiation-benchmarks
Benchmarks used for radiation tests
smem
Smem memory reporting tool for Python 3
TensorRT
TensorRT is a C++ library for high-performance inference on NVIDIA GPUs and deep learning accelerators.
tlsf
Two-Level Segregated Fit memory allocator implementation.
torch-blocksparse
Block-sparse primitives for PyTorch
torch2trt
An easy-to-use PyTorch-to-TensorRT converter
triton
Development repository for the Triton language and compiler
xingtian
xingtian is a componentized library for the development and verification of reinforcement learning algorithms