There are 18 repositories under the cuda-programming topic.
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
Sample code for my CUDA programming book
Safe Rust wrapper around the CUDA toolkit
TinyChatEngine: On-Device LLM Inference Library
Thin, unified, C++-flavored wrappers for the CUDA APIs
LLM notes, including model inference, transformer model structure, and LLM framework code analysis notes.
A self-learning tutorial for CUDA high-performance programming.
A simple GPU hash table implemented in CUDA using lock-free techniques
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
CUDA tutorials for maths and ML with examples, covering multi-GPU, fused attention, Winograd convolution, and reinforcement learning.
μ-Cuda: cover the last mile of CUDA. Features: IntelliSense-friendly, structured launch, automatic CUDA graph generation and updating.
CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels and provides an on-the-fly view of the assembly.
Speed up image preprocessing with CUDA when handling images or running TensorRT inference
GPU Engineering for AI Systems
CUDA Guide
Install CUDA on Windows11 using WSL2
DISTWAR atomic reduction optimization on "3D Gaussian Splatting for Real-Time Radiance Field Rendering".
A high performance and friendly GPU LBVH implementation.
Artaxerxes - Adaptive High-Performance Stress Tester v.1.0. A rebuild of the older Xerxes DDoS tool. Supports GPU+io_uring, DPDK, eBPF/XDP with intelligent fallbacks. Educational tool for advanced cybersecurity labs
YOLOv9 TensorRT deployment acceleration, providing two implementations: C++ and Python 🔥🔥🔥
A complete beginner's introduction to programming with CUDA Fortran
Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python without a single line of CMake using PyBind11
Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (C++ Edition)"
Simple GPU RANSAC fitting of multiple lines on 2D/3D point clouds
Some common CUDA kernel implementations (not the fastest).
StiffMa: Fast finite element STIFFness MAtrix generation in MATLAB by using GPU computing.
A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.
CUDA code mainly targets NVIDIA hardware. This repo shows how to run CUDA C or CUDA C++ code on the Google Colab platform for free.