There are 270 repositories under the cuda topic.
A high-throughput and memory-efficient inference and serving engine for LLMs
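The memory efficiency comes largely from vLLM's PagedAttention design, where the KV cache is split into fixed-size blocks allocated on demand rather than reserved up front for a sequence's maximum length. A minimal pure-Python sketch of that block-allocation idea (conceptual only; the class, names, and block size here are made up for illustration and are not vLLM's API):

```python
BLOCK_SIZE = 4  # tokens per cache block (illustrative)

class PagedKVCache:
    """Toy paged KV-cache allocator: blocks are handed out on demand
    and returned to a free list when a sequence finishes."""

    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.lengths = {}       # seq_id -> number of tokens cached

    def append_token(self, seq_id):
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:  # last block is full (or none allocated yet)
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted")
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.lengths[seq_id] = n + 1

    def free(self, seq_id):
        # Return all of a finished sequence's blocks to the free pool.
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(6):
    cache.append_token("seq-A")        # 6 tokens -> 2 blocks reserved
print(cache.block_tables["seq-A"])
cache.free("seq-A")
print(len(cache.free_blocks))          # all 8 blocks back in the pool
```

Because blocks need not be contiguous, sequences of very different lengths can share one GPU-memory pool with little fragmentation.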
SGLang is a fast serving framework for large language models and vision language models.
Build and run Docker containers leveraging NVIDIA GPUs
Instant neural graphics primitives: lightning-fast NeRF and more
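The speed of instant-ngp rests on a multiresolution hash encoding: grid corners around a query point index into a fixed-size feature table via a spatial hash. A small 2D sketch of that lookup (the XOR-of-primes hash follows the paper's scheme, but the table size, level resolutions, and function names here are illustrative, not instant-ngp's code):

```python
PRIMES = (1, 2654435761)   # per-dimension hash primes (2D case)
TABLE_SIZE = 2 ** 14       # entries per hash table level (power of two)

def hash_2d(ix, iy):
    """Map an integer grid coordinate to a slot in the feature table."""
    return (ix * PRIMES[0] ^ iy * PRIMES[1]) % TABLE_SIZE

def corner_indices(x, y, resolution):
    """Table indices of the 4 grid corners surrounding (x, y) in [0,1)^2
    at a given grid resolution."""
    ix, iy = int(x * resolution), int(y * resolution)
    return [hash_2d(ix + dx, iy + dy) for dy in (0, 1) for dx in (0, 1)]

# The same point hits different slots at each resolution level; the
# looked-up feature vectors are interpolated per level and concatenated
# before feeding a very small MLP.
print(corner_indices(0.3, 0.7, resolution=16))
print(corner_indices(0.3, 0.7, resolution=512))
```

Keeping the table size fixed while resolutions grow is what bounds memory: fine levels simply tolerate hash collisions, which the MLP learns to resolve.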
Burn is a next-generation tensor library and deep learning framework that doesn't compromise on flexibility, efficiency, or portability.
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
A fast, scalable, high-performance gradient boosting on decision trees library for ranking, classification, regression, and other machine learning tasks, with APIs for Python, R, Java, and C++. Supports computation on both CPU and GPU.
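Gradient boosting as described above works by repeatedly fitting a small tree to the current residuals and adding it, scaled by a learning rate. A toy pure-Python version with depth-1 trees (stumps) on 1D inputs, to illustrate the idea — this is not CatBoost's actual (ordered boosting) algorithm, and all names here are made up:

```python
def fit_stump(xs, residuals):
    """Best single-threshold split minimizing squared error on 1D inputs."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def boost(xs, ys, rounds=50, lr=0.1):
    """Additive model: each round fits a stump to the residuals."""
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

model = boost([0, 1, 2, 3, 4, 5], [0, 0, 0, 10, 10, 10])
print(model(1), model(4))  # predictions approach 0 and 10
```

Real libraries add deeper trees, regularization, categorical-feature handling, and GPU-parallel split search, but the residual-fitting loop is the same skeleton.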
Samples for CUDA developers demonstrating features in the CUDA Toolkit
📚LeetCUDA: modern CUDA learning notes with PyTorch for beginners🐑, covering 200+ CUDA kernels, Tensor Cores, HGEMM, and FlashAttention-2 MMA.🎉
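The HGEMM kernels mentioned above all build on tiling: the output matrix is computed tile by tile so each input tile can be reused many times from fast (shared) memory instead of being refetched. A pure-Python sketch of that loop structure (illustrative only; a real kernel maps the tile loops onto thread blocks and Tensor Core MMA instructions):

```python
TILE = 2  # tile edge length (tiny, for illustration)

def matmul_tiled(A, B):
    """C = A @ B computed tile by tile, mirroring a GEMM kernel's
    block structure: each (i0, j0) output tile accumulates over K tiles."""
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, TILE):              # tile row of C
        for j0 in range(0, m, TILE):          # tile column of C
            for k0 in range(0, k, TILE):      # accumulate over K tiles
                # On a GPU, the A and B tiles for this (i0, j0, k0) step
                # would be staged into shared memory here and reused by
                # every thread in the block.
                for i in range(i0, min(i0 + TILE, n)):
                    for j in range(j0, min(j0 + TILE, m)):
                        for kk in range(k0, min(k0 + TILE, k)):
                            C[i][j] += A[i][kk] * B[kk][j]
    return C

print(matmul_tiled([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```

The payoff is arithmetic intensity: with TILE-sized staging, each loaded element participates in TILE multiply-adds, which is what lets Tensor Core GEMMs approach peak throughput.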
A modular zero-knowledge (ZK) backend accelerated by GPUs
Go package for computer vision using OpenCV 4 and beyond. Includes support for DNN, CUDA, OpenCV Contrib, and OpenVINO.
An interactive NVIDIA GPU process viewer and beyond: a one-stop solution for GPU process management.
A PyTorch Library for Accelerating 3D Deep Learning Research
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
Lightning fast C++/CUDA neural network framework