real-salient

This is a header-only library that implements a modified version of the GrabCut algorithm. It solves two-class (foreground/background) image segmentation; in other words, it detects a salient object in an RGB-D image. It follows the logic described in the DenseCut paper by Cheng et al. [1], adapting it to the GPU.

In short, the algorithm performs the following for every frame:

1. Get the color and depth buffers from a depth camera.
2. Assuming the salient object is in front, label the image pixels based on a simple threshold and the depth buffer (i.e. just like in the librealsense-grabcuts example).
3. Fit two Gaussian Mixture Models (GMM) onto the color frame to create the color models of background and foreground.
4. Use the trained models to label the image.
5. Use a Conditional Random Field model (CRF) to refine the labels.

Steps 2-5 are performed entirely on the GPU, which allowed me to run the algorithm at a steady 30 FPS.
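To make step 2 concrete, here is a minimal CPU sketch of depth-threshold labeling. It is not the library's GPU code: the function name `label_by_depth`, the millimetre units, and the convention that a depth of 0 means "no measurement" are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of real-salient.
// depth_mm: depth per pixel in millimetres, 0 = no measurement (assumed).
// Returns 1 for "probably foreground", 0 for "probably background".
std::vector<std::uint8_t> label_by_depth(const std::vector<std::uint16_t>& depth_mm,
                                         std::uint16_t threshold_mm) {
    assert(threshold_mm > 0);
    std::vector<std::uint8_t> labels(depth_mm.size(), 0);
    for (std::size_t i = 0; i < depth_mm.size(); ++i) {
        const bool valid = depth_mm[i] != 0;
        // The salient object is assumed to be closer than the threshold.
        labels[i] = (valid && depth_mm[i] < threshold_mm) ? 1 : 0;
    }
    return labels;
}
```

These rough labels only seed the color models in step 3; pixels that are mislabeled here can still be corrected by the GMM and CRF stages.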

Gaussian Mixture Models

The gmm.cuh module is a generic CUDA implementation of GMMs. It can fit M GMMs with K components each on a single image at once, so this module alone can be used for real-time M-class image segmentation.

The module uses the standard EM algorithm for estimation and a Cholesky decomposition to compute the inverse and determinant of each covariance matrix.
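The role of the Cholesky decomposition can be illustrated with a small CPU sketch that evaluates the log-density of one 3-channel Gaussian component. The names `cholesky3` and `gaussian_log_pdf` are hypothetical and do not come from gmm.cuh; the sketch assumes a symmetric positive definite 3x3 covariance.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Lower-triangular factor L such that sigma = L * L^T.
// Assumes sigma is symmetric positive definite.
Mat3 cholesky3(const Mat3& sigma) {
    Mat3 L{};  // zero-initialised
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j <= i; ++j) {
            double s = sigma[i][j];
            for (int k = 0; k < j; ++k) s -= L[i][k] * L[j][k];
            if (i == j) {
                assert(s > 0.0);  // fails if sigma is not positive definite
                L[i][j] = std::sqrt(s);
            } else {
                L[i][j] = s / L[j][j];
            }
        }
    }
    return L;
}

// log N(x | mu, sigma) for a 3-channel colour value.
double gaussian_log_pdf(const Vec3& x, const Vec3& mu, const Mat3& sigma) {
    const Mat3 L = cholesky3(sigma);
    // Forward substitution: solve L * y = (x - mu).
    // Then |y|^2 equals the Mahalanobis distance (x - mu)^T sigma^-1 (x - mu).
    Vec3 y{};
    for (int i = 0; i < 3; ++i) {
        double s = x[i] - mu[i];
        for (int k = 0; k < i; ++k) s -= L[i][k] * y[k];
        y[i] = s / L[i][i];
    }
    const double mahalanobis = y[0] * y[0] + y[1] * y[1] + y[2] * y[2];
    // det(sigma) = prod(L_ii)^2, so log det(sigma) = 2 * sum(log L_ii).
    const double log_det =
        2.0 * (std::log(L[0][0]) + std::log(L[1][1]) + std::log(L[2][2]));
    const double log_2pi = std::log(2.0 * 3.14159265358979323846);
    return -0.5 * (3.0 * log_2pi + log_det + mahalanobis);
}
```

The factor L gives both quantities the density needs: the determinant from its diagonal and the quadratic form from one triangular solve, so the inverse never has to be formed explicitly in this sketch.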

Conditional Random Fields

The crf.cuh module is an adaptation of the GPU implementation of CRF by Jiahui Huang, who used Miguel Monteiro's implementation of fast Gaussian filtering. The theory for this implementation can be found in [2] and [3].
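For intuition, the mean-field inference from [2] can be sketched on the CPU for two labels. This is not how crf.cuh works internally: the sketch replaces the fast permutohedral-lattice filtering of [3] with a brute-force O(N^2) sum, uses a single Gaussian kernel over one scalar feature per pixel, and assumes a Potts compatibility. All names (`Q2`, `softmax_neg`, `mean_field`) are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Soft labelling of one pixel: [0] = P(background), [1] = P(foreground).
using Q2 = std::array<double, 2>;

// Normalised exp(-energy) over the two labels, shifted for numerical stability.
Q2 softmax_neg(double e0, double e1) {
    const double m = std::min(e0, e1);
    const double a = std::exp(-(e0 - m));
    const double b = std::exp(-(e1 - m));
    return {a / (a + b), b / (a + b)};
}

// Naive mean-field inference for a fully connected two-label CRF.
// unary: negative log-probabilities per pixel and label.
// feat:  one scalar feature per pixel (a stand-in for position and colour).
std::vector<Q2> mean_field(const std::vector<Q2>& unary,
                           const std::vector<double>& feat,
                           double sigma, double weight, int iterations) {
    assert(unary.size() == feat.size());
    const std::size_t n = unary.size();
    std::vector<Q2> q(n);
    for (std::size_t i = 0; i < n; ++i) q[i] = softmax_neg(unary[i][0], unary[i][1]);

    for (int it = 0; it < iterations; ++it) {
        std::vector<Q2> next(n);
        for (std::size_t i = 0; i < n; ++i) {
            // Message passing: Gaussian-weighted sum of the other pixels' beliefs.
            double msg0 = 0.0, msg1 = 0.0;
            for (std::size_t j = 0; j < n; ++j) {
                if (j == i) continue;
                const double d = (feat[i] - feat[j]) / sigma;
                const double k = std::exp(-0.5 * d * d);
                msg0 += k * q[j][0];
                msg1 += k * q[j][1];
            }
            // Potts compatibility: each label is penalised by the mass that
            // similar pixels put on the opposite label.
            next[i] = softmax_neg(unary[i][0] + weight * msg1,
                                  unary[i][1] + weight * msg0);
        }
        q.swap(next);
    }
    return q;
}
```

In this toy setting, a pixel whose unary term weakly prefers background but whose similar neighbours strongly prefer foreground is pulled to foreground after a few iterations. The inner loop over j is the part that fast Gaussian filtering makes tractable on full-resolution images.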

Examples

The examples use the core real-salient library as well as a couple of VR-related tricks. They are hardcoded to use the Intel RealSense D415 camera and its SDK to capture a color+depth video stream (examples/vr-salient/include/cameraD415.hpp).

VR bounds: The examples use OpenVR to improve the initial guess of the salient object position (step 2 in the algorithm above). I attach an extra tracker to the depth camera to locate its position in VR. This allows me to find the position of the headset and hand controllers on the image via a simple coordinate transform. This is implemented in examples/vr-salient/include/vrbounds.hpp.
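The coordinate transform mentioned above can be sketched as a rigid-pose inverse followed by a pinhole projection. This is an illustration only, not the code in vrbounds.hpp: the types `CameraPose`, `Intrinsics`, `Pixel` and the function `project_to_image` are hypothetical, the camera is assumed to look along +z, and the offset between the tracker and the camera sensor as well as any axis-convention differences between OpenVR and the camera SDK are ignored.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// All names below are hypothetical and only serve this sketch.
struct CameraPose {
    Mat3 R;  // rotation: camera axes expressed in world (VR) coordinates
    Vec3 t;  // camera position in world (VR) coordinates
};
struct Intrinsics { double fx, fy, cx, cy; };  // pinhole model, in pixels
struct Pixel { double u, v; };

// Maps a world-space point (e.g. a controller position) to pixel coordinates.
// Returns false if the point is behind the camera (assumed to look along +z).
bool project_to_image(const CameraPose& cam, const Intrinsics& k,
                      const Vec3& p_world, Pixel& out) {
    assert(k.fx > 0.0 && k.fy > 0.0);
    // p_cam = R^T * (p_world - t): the inverse of the rigid camera pose.
    Vec3 p_cam{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            p_cam[i] += cam.R[j][i] * (p_world[j] - cam.t[j]);
    if (p_cam[2] <= 0.0) return false;
    // Perspective division followed by the intrinsic scaling and offset.
    out.u = k.fx * p_cam[0] / p_cam[2] + k.cx;
    out.v = k.fy * p_cam[1] / p_cam[2] + k.cy;
    return true;
}
```

Projecting the headset and both controllers this way gives a few image points that roughly outline where the user is in the frame.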

VR depth stencil: In addition to the tracker positions, I employ a Vulkan+OpenVR combination to render the VR chaperone bounds into a temporary buffer. This allows me to cut off all objects outside the user-defined play area from the scene. This is implemented in examples/vr-salient/include/vulkanheadless.hpp.
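One way to picture the cut-off is a per-pixel comparison between the camera depth and the rendered bounds. The sketch below is an assumption about how such a stencil could be applied, not the code in vulkanheadless.hpp: `apply_depth_stencil` is a hypothetical name, both buffers are assumed to hold metres from the camera's viewpoint, and a depth of 0 is assumed to mean "no measurement".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of real-salient.
// depth_m:  camera depth per pixel in metres, 0 = no measurement (assumed).
// bounds_m: depth of the rendered play-area boundary for the same pixel.
void apply_depth_stencil(std::vector<float>& depth_m,
                         const std::vector<float>& bounds_m) {
    assert(depth_m.size() == bounds_m.size());
    for (std::size_t i = 0; i < depth_m.size(); ++i) {
        // Anything farther than the boundary lies outside the play area,
        // so it is marked invalid and cannot become foreground later.
        if (depth_m[i] > bounds_m[i]) depth_m[i] = 0.0f;
    }
}
```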

vr-salient

vr-salient is a standalone program. In addition to the tweaks above, it uses the OpenCV highgui library, but only to display the window. The VR tricks are optional in this example.

saber-salient

saber-salient is a dynamic library to be used in my BeatSaber plugin. It works the same way as vr-salient, but it does not require OpenCV and does require VR tracking.


References

[1] Cheng, M.M., Prisacariu, V.A., Zheng, S., Torr, P.H.S. and Rother, C. DenseCut: Densely Connected CRFs for Realtime GrabCut. Computer Graphics Forum, 34: 193-201. 2015.

[2] Krähenbühl, Philipp, and Vladlen Koltun. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. Advances in Neural Information Processing Systems. 2011.

[3] Adams, Andrew, Jongmin Baek, and Myers Abraham Davis. Fast high‐dimensional filtering using the permutohedral lattice. Computer Graphics Forum. Vol. 29. No. 2. Oxford, UK: Blackwell Publishing Ltd, 2010.
