royinx's starred repositories
PyMacroRecord
Free and Open Source Macro Recorder with a modern GUI using Python
compute-sanitizer-samples
Samples demonstrating how to use the Compute Sanitizer Tools and Public API
deployment
RAPIDS Deployment Documentation
bitsandbytes
8-bit CUDA functions for PyTorch, modified to build on NVIDIA Jetson
gpt-migrate
Easily migrate your codebase from one framework or language to another.
bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
dbscan-cuda
Parallel computing assignment: DBSCAN algorithm with C++ and CUDA
Competitive-Programming
Competitive Programming problem solutions.
Nsight-Systems-Docker-Image
Nsight Systems in Docker
nvjpeg-python
nvJPEG for Python
How_to_optimize_in_GPU
This is a series of GPU optimization topics that explains in detail how to optimize CUDA kernels. I will introduce several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
cpp-cheat-sheet
C++ Syntax, Data Structures, and Algorithms Cheat Sheet
open-gpu-kernel-modules
NVIDIA Linux open GPU kernel module source
instant-ngp
Instant neural graphics primitives: lightning fast NeRF and more
kernel_tuner
Kernel Tuner
triton_ensemble_model_demo
Triton server ensemble model demo