Tiancheng-Luo / halide-cuda-sat-perf

Summed area table performance test

Halide performance debugging

Comparison between the Halide and CUDA versions of an app that partitions a 4096x4096 image into 32x32 tiles and computes the summed area table within each tile.

Halide version: runs several kernels. Only version 0 computes the summed area table; the other kernels demonstrate the effect of different Halide update definitions on instruction count and global memory throughput.

CUDA version: source code from "GPU-Efficient Recursive Filtering and Summed-Area Tables" (SIGGRAPH 2011) [Nehab et al.]
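
For checking the output of either version, the per-tile computation described above can be sketched as a plain CPU reference. This is only an illustrative sketch and not code from this repository: the function name `tiled_sat`, the row-major `float` layout, and the two-pass (rows, then columns) formulation are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not part of this repository): computes a
// summed area table independently inside each tile x tile block of a
// row-major width x height image.
std::vector<float> tiled_sat(const std::vector<float>& in,
                             int width, int height, int tile) {
    assert(tile > 0 && width % tile == 0 && height % tile == 0);
    assert(in.size() == static_cast<std::size_t>(width) * height);

    std::vector<float> out(in);
    const std::size_t w = static_cast<std::size_t>(width);

    // Pass 1: prefix sum along x, restarting at every tile boundary.
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (x % tile != 0)
                out[y * w + x] += out[y * w + x - 1];

    // Pass 2: prefix sum along y, restarting at every tile boundary.
    for (int y = 0; y < height; ++y)
        if (y % tile != 0)
            for (int x = 0; x < width; ++x)
                out[y * w + x] += out[(y - 1) * w + x];

    return out;
}
```

With the sizes used here, the call would be `tiled_sat(image, 4096, 4096, 32)`. Because the sums restart at every tile boundary, each 32x32 tile can be processed independently, which is the structure a GPU version can map to one thread block per tile.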


  • Makefile provided for both projects
  • Edit Makefile.common to set the CUDA include path and Halide base path

Profiling files

The directory nv_profile contains profiling logs from the NVIDIA profiling tools. They can be opened with the NVIDIA Visual Profiler:

$ nvvp cuda_summed_table.nvvp
$ nvvp hl_summed_table.nvvp

Generated PTX and statement files

The directories ptx and stmt contain the generated PTX and statement files for the different Halide kernels. These can be regenerated with:

$ HL_JIT_TARGET=cuda-gpu_debug HL_DEBUG_CODEGEN=1 ./hl_summed_table 2> hl_summed_table.ptx

