There are 270 repositories under the cuda topic.
A high-throughput and memory-efficient inference and serving engine for LLMs
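The memory efficiency comes largely from vLLM's PagedAttention design, where the KV cache is split into fixed-size blocks allocated on demand rather than reserved up front for a sequence's maximum length. A minimal pure-Python sketch of that block-allocation idea (conceptual only; the class, names, and block size here are made up for illustration and are not vLLM's API):

```python
BLOCK_SIZE = 4  # tokens per cache block (illustrative)

class PagedKVCache:
    """Toy paged KV-cache allocator: blocks are handed out on demand
    and returned to a free list when a sequence finishes."""

    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.lengths = {}       # seq_id -> number of tokens cached

    def append_token(self, seq_id):
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:  # last block is full (or none allocated yet)
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted")
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.lengths[seq_id] = n + 1

    def free(self, seq_id):
        # Return all of a finished sequence's blocks to the free pool.
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(6):
    cache.append_token("seq-A")        # 6 tokens -> 2 blocks reserved
print(cache.block_tables["seq-A"])
cache.free("seq-A")
print(len(cache.free_blocks))          # all 8 blocks back in the pool
```

Because blocks need not be contiguous, sequences of very different lengths can share one GPU-memory pool with little fragmentation.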
SGLang is a fast serving framework for large language models and vision language models.
Build and run Docker containers leveraging NVIDIA GPUs
Instant neural graphics primitives: lightning-fast NeRF and more
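The speed of instant-ngp rests on a multiresolution hash encoding: grid corners around a query point index into a fixed-size feature table via a spatial hash. A small 2D sketch of that lookup (the XOR-of-primes hash follows the paper's scheme, but the table size, level resolutions, and function names here are illustrative, not instant-ngp's code):

```python
PRIMES = (1, 2654435761)   # per-dimension hash primes (2D case)
TABLE_SIZE = 2 ** 14       # entries per hash table level (power of two)

def hash_2d(ix, iy):
    """Map an integer grid coordinate to a slot in the feature table."""
    return (ix * PRIMES[0] ^ iy * PRIMES[1]) % TABLE_SIZE

def corner_indices(x, y, resolution):
    """Table indices of the 4 grid corners surrounding (x, y) in [0,1)^2
    at a given grid resolution."""
    ix, iy = int(x * resolution), int(y * resolution)
    return [hash_2d(ix + dx, iy + dy) for dy in (0, 1) for dx in (0, 1)]

# The same point hits different slots at each resolution level; the
# looked-up feature vectors are interpolated per level and concatenated
# before feeding a very small MLP.
print(corner_indices(0.3, 0.7, resolution=16))
print(corner_indices(0.3, 0.7, resolution=512))
```

Keeping the table size fixed while resolutions grow is what bounds memory: fine levels simply tolerate hash collisions, which the MLP learns to resolve.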
Burn is a next-generation tensor library and deep learning framework that doesn't compromise on flexibility, efficiency, or portability.
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
A fast, scalable, high-performance gradient boosting on decision trees library for ranking, classification, regression, and other machine learning tasks, with APIs for Python, R, Java, and C++. Supports computation on both CPU and GPU.
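Gradient boosting as described above works by repeatedly fitting a small tree to the current residuals and adding it, scaled by a learning rate. A toy pure-Python version with depth-1 trees (stumps) on 1D inputs, to illustrate the idea — this is not CatBoost's actual (ordered boosting) algorithm, and all names here are made up:

```python
def fit_stump(xs, residuals):
    """Best single-threshold split minimizing squared error on 1D inputs."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def boost(xs, ys, rounds=50, lr=0.1):
    """Additive model: each round fits a stump to the residuals."""
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

model = boost([0, 1, 2, 3, 4, 5], [0, 0, 0, 10, 10, 10])
print(model(1), model(4))  # predictions approach 0 and 10
```

Real libraries add deeper trees, regularization, categorical-feature handling, and GPU-parallel split search, but the residual-fitting loop is the same skeleton.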
Samples for CUDA developers demonstrating features in the CUDA Toolkit
📚LeetCUDA: modern CUDA learning notes with PyTorch for beginners🐑, covering 200+ CUDA kernels, Tensor Cores, HGEMM, and FlashAttention-2 MMA.🎉
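The HGEMM kernels mentioned above all build on tiling: the output matrix is computed tile by tile so each input tile can be reused many times from fast (shared) memory instead of being refetched. A pure-Python sketch of that loop structure (illustrative only; a real kernel maps the tile loops onto thread blocks and Tensor Core MMA instructions):

```python
TILE = 2  # tile edge length (tiny, for illustration)

def matmul_tiled(A, B):
    """C = A @ B computed tile by tile, mirroring a GEMM kernel's
    block structure: each (i0, j0) output tile accumulates over K tiles."""
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, TILE):              # tile row of C
        for j0 in range(0, m, TILE):          # tile column of C
            for k0 in range(0, k, TILE):      # accumulate over K tiles
                # On a GPU, the A and B tiles for this (i0, j0, k0) step
                # would be staged into shared memory here and reused by
                # every thread in the block.
                for i in range(i0, min(i0 + TILE, n)):
                    for j in range(j0, min(j0 + TILE, m)):
                        for kk in range(k0, min(k0 + TILE, k)):
                            C[i][j] += A[i][kk] * B[kk][j]
    return C

print(matmul_tiled([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```

The payoff is arithmetic intensity: with TILE-sized staging, each loaded element participates in TILE multiply-adds, which is what lets Tensor Core GEMMs approach peak throughput.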
A modular zero-knowledge (ZK) backend accelerated by GPUs
Go package for computer vision using OpenCV 4 and beyond. Includes support for DNN, CUDA, OpenCV Contrib, and OpenVINO.
An interactive NVIDIA GPU process viewer and beyond: a one-stop solution for GPU process management.
A PyTorch Library for Accelerating 3D Deep Learning Research
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
Lightning fast C++/CUDA neural network framework