zhouhaoyi/CUDAHammingMean

Fastest GPU implementation of a brute-force Hamming-weight matrix sum/mean for 512-bit binary descriptors.
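For reference, here is a plain single-threaded CPU sketch of the quantity the kernel computes. It is not the repository's code and rests on two assumptions. First, each 512-bit descriptor is stored as eight 64-bit words. Second, the "Hamming-weight matrix" is the matrix of popcount(query XOR train) over every query/train pair, and that matrix is then summed and averaged. The names Descriptor, hamming512 and hammingMatrixMean are hypothetical.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed layout (hypothetical): one 512-bit descriptor = eight 64-bit words.
using Descriptor = std::array<std::uint64_t, 8>;

// Hamming distance between two descriptors: popcount of their XOR,
// accumulated word by word.
int hamming512(const Descriptor& a, const Descriptor& b) {
    int d = 0;
    for (int w = 0; w < 8; ++w)
        d += static_cast<int>(std::bitset<64>(a[w] ^ b[w]).count());
    return d;
}

// Brute force over every (query, train) pair: writes the sum of the full
// Hamming-distance matrix to sumOut and returns the mean entry.
double hammingMatrixMean(const std::vector<Descriptor>& query,
                         const std::vector<Descriptor>& train,
                         std::uint64_t& sumOut) {
    assert(!query.empty() && !train.empty());  // the mean is undefined otherwise
    std::uint64_t sum = 0;
    for (const Descriptor& q : query)
        for (const Descriptor& t : train)
            sum += static_cast<std::uint64_t>(hamming512(q, t));
    sumOut = sum;
    return static_cast<double>(sum) /
           (static_cast<double>(query.size()) * static_cast<double>(train.size()));
}
```

Take a query set {all zeros, all ones} and a training set {all zeros, first 256 bits set}. The four pairwise distances are 0, 256, 512 and 256, so the sum is 1024 and the mean is 256. A GPU version spreads these independent pairs across threads instead of looping over them.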

Note that the DIFFERENCE in popcounts is used for thresholding, NOT the ratio. This is the CORRECT approach for binary descriptors.
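The README does not say which two popcounts are compared. Purely as an illustration, assume two Hamming distances d1 <= d2, for instance a best and a second-best candidate. The following sketch contrasts a difference test with a ratio test. It is not the repository's code, and passesDifferenceTest and passesRatioTest are hypothetical names.

```cpp
#include <cassert>

// Hypothetical difference test: d1 and d2 are Hamming distances (popcounts
// of XORed descriptors) with d1 <= d2. Passes when d2 exceeds d1 by at
// least `threshold` bits.
bool passesDifferenceTest(int d1, int d2, int threshold) {
    assert(d1 <= d2);
    return d2 - d1 >= threshold;
}

// Ratio test, shown only for contrast: passes when d1 / d2 < ratio
// (compared as d1 < ratio * d2 to avoid dividing).
bool passesRatioTest(int d1, int d2, double ratio) {
    assert(d1 <= d2 && d2 > 0);
    return static_cast<double>(d1) < ratio * static_cast<double>(d2);
}
```

With a 5-bit margin and a ratio of 0.8, the pair (1, 3) differs by only 2 bits. It fails the difference test but passes the ratio test. The pair (100, 110) differs by 10 bits. It passes the difference test but fails the ratio test. Because Hamming distances are integer bit counts, a fixed bit margin means the same thing at every distance, while a ratio does not.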

A key insight responsible for much of the performance of this insanely fast CUDA kernel is due to Christopher Parker (https://github.com/csp256), to whom I am extremely grateful.

CUDA compute capability (CC) 3.0 or higher is required.

All functionality is contained in the files CUDAHammingMean.h and CUDAHammingMean.cu. 'main.cpp' is simply a sample test harness with example usage and performance testing.
