Repositories under the ptx topic:
Playing around with "Less Slow" coding practices in C++20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking, and user-space IO
row-major matmul optimization
🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.
CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly.
A simple profiler that counts NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
Energinet's Model Testbench. Automate grid compliance studies in PSCAD and PowerFactory.
This is my 🔥 100 Days of GPU — a wild, hands-on journey through CUDA/CUTLASS kernels, Triton spells, and PTX sorcery.
CUDA kernels in any language supported by LLVM
Set of examples written for hardware acceleration via TornadoVM
FastPtx: a Python pTx pulse design tool for freely optimizing RF and gradient pulses with automatic differentiation
Visual Studio Code extension with PTX assembly syntax support
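🎉 Continuously updated: study notes on the CUDA 12.2 PTX ISA 8.2, with partial Chinese translation, personal commentary, and inline assembly examples, explaining the CUDA 12.2 PTX ISA 8.2 assembly instructions. Work in progress.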
PTX Inject and Stack PTX for Python
Unsloth Puzzle 2-16. Notes and indications of progress. Currently: 25 points
Review of the paper "A Formal Analysis of the NVIDIA PTX Memory Consistency Model"
A PTX interpreter that lets you run CUDA code on a CPU
Repository built for community contributions for upgrading Junos OS.
PTX Inject and Stack PTX
Creating an MLIR dialect that fuses Addition + ReLU, lowers to NVVM and LLVM IR, and generates PTX to run the kernel on a CUDA GPU