Jake Hemstad's repositories
two_largest
Adventure in profiling and optimization.
cuda_scalar_result
Answering "What is the fastest way to return a single scalar from a kernel to host?"
example_cuda_benchmark
Template repository for CUDA-enabled benchmarks using Google Benchmark
nvtx_wrappers
This repository is deprecated; the code has moved to the official NVIDIA NVTX GitHub repository: https://github.com/NVIDIA/NVTX
creduce-example
Examples of how to use C-Reduce to create minimal compiler bug reproducers
accelerated-computing-hub
NVIDIA-curated collection of educational resources related to general-purpose GPU programming.
cccl
CUDA C++ Core Libraries
compiler-explorer
Run compilers interactively from your web browser and interact with the assembly
cub
Cooperative primitives for CUDA C++.
cuda-api-wrappers
Thin C++-flavored wrappers for the CUDA Runtime API
cuda-quantum
C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows
cutlass
CUDA Templates for Linear Algebra Subroutines
gil_preload
Add NVTX ranges to the Python GIL
libcudacxx
The NVIDIA C++ Standard Library
llm.c
LLM training in simple, raw C/CUDA