miguelcastillon / fgt_threshold

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

fgt

Fast Gauss transforms.

The Gauss transform is a common operation that computes the per-point similarity between two data sets:

The Gauss transform

This a C++ library for computing the Gauss transform using the direct method as well as a few shortcuts. This code lives on Github, has some Doxygen documentation, and is tested with Travis and AppVeyor.

Travis Build Status AppVeyor Build status

Usage

There is one C++ header file, fgt.hpp, which has everything you need. Include that file and you're off to the races:

#include <fgt.hpp>

void my_great_function(const Eigen::MatrixXd& x, const Eigen::MatrixXd& y) {
    double bandwidth = 0.3;
    Eigen::VectorXd gauss_transform = fgt::direct(x, y, bandwidth);
}

The library provides a few different ways to calculate the Gauss transform:

  • fgt::direct calculates the exact Gauss transform, and is the most accurate and the slowest option.
  • fgt::direct_tree tries to do less work by only considering "close" points, where "close" is defined by the bandwidth. The direct tree method works best for very small bandwidths.
  • fgt::ifgt uses the Improved Fast Gauss Transform (pdf) to speed up the calculation. IFGT is fast for large bandwidths but can break down for smaller bandwidths.

There's also a class-based interface:

#include <fgt.hpp>

void my_great_function(const Eigen::MatrixXd& x, const Eigen::MatrixXd& y) {
    double bandwidth = 0.3;
    fgt::Direct direct(x, bandwidth);
    Eigen::VectorXd result = direct.compute(y);
}

This lets you break up your transform into a pre-compute and a compute step, which can save you some cycles if you're re-using the same source dataset in a more advanced transform (e.g. direct_tree or ifgt).

There is some benchmarking code available in the bench directory, which you can use to try to get a sense of the performance of the various modes. We found a crossover point at bandwidths of a bit more than 0.2 during local testing on a Mac laptop; YMMV.

Benchmarks conducted on a random dataset on my personal Mac laptop

Installation

fgt has no runtime dependencies, and only depends on CMake and Eigen for building.

Homebrew

If you're on a Mac and you use Homebrew, use my tap to install:

brew tap gadomski/gadomski
brew install fgt

From source

To build fgt from source on a *nix system, clone the repository and execute the traditional CMake build incantation:

git clone https://github.com/gadomski/fgt
mkdir -p fgt/build
cd fgt/build
cmake ..
make

fgt doesn't make any assumptions about whether you do or do not want shared libraries, so if you have a preference be sure to set BUILD_SHARED_LIBS.

If building with MSVC, we recommend:

  • Setting BUILD_SHARED_LIBS=OFF, at least if you want to run the test suite out-of-the-box.
  • Setting gtest_force_shared_crt=ON. This allows an out-of-the-box build without any Visual Studio config mucking.

Eigen ordering

Eigen, by default, stores matrices in column-major order, but fgt works with row-major matrices. If you want to avoid extra copies, pass in row-major matrices to fgt functions. You can use the fgt::Matrix typedef to help:

fgt::Matrix my_matrix(1000, 3); // creates an uninitialized 1000x3 row-major matrix of doubles

OpenMP

fgt comes with built-in OpenMP parallelization, which can lead to some significant speedups for large data sets. To enable OpenMP, make sure you're using an OpenMP-aware compiler (on OSX, you can get OpenMP clang via Homebrew: brew install clang-omp) and set the CMake variable WITH_OPENMP to ON, e.g.:

CC=clang-omp CXX=clang-omp++ cmake .. -DWITH_OPENMP=ON
make

This will build an OpenMP-enabled fgt library.

Tests

fgt comes with a unit-test suite. To run, simply execute make test. You probably want CMAKE_BUILD_TYPE=Release, otherwise the tests will take a while.

Using in a downstream project

When you install fgt on your system, you will also install a few CMake configuration files that make it easy to integrate this project into your downstream work. If fgt is installed to a traditional location (e.g. /usr/local), finding fgt might be as simple as including the following in your CMakeLists.txt:

find_package(Fgt REQUIRED)
target_link_libraries(my-sweet-target
    PUBLIC
    Fgt::Library-C++
    )

The provided target, Fgt::Library-C++, should have its INTERFACE_INCLUDE_DIRECTORIES and other useful properties configured, so you shouldn't have to muck with anything else.

If you've installed fgt to a non-standard location, you may need to use Fgt_DIR to find its CMake configuration files, or use CMAKE_PREFIX_PATH to point CMake at your non-standard install tree.

Versioning

We follow semantic versioning. Versions have annotated tags following a vMAJOR.MINOR.PATCH naming convention. While we'll do our best to increment the MINOR version with all breaking changes, we can't guarantee anything until MAJOR hits 1 (per usual).

Contributing

As always, we welcome bug reports, features requests, and particularly pull requests via Github.

This library was developed by Pete Gadomski, and it was inspired by the IFGT source code and figtree.

License

LGPL v2.1. See LICENSE.txt for the complete text.

About

License:GNU Lesser General Public License v2.1


Languages

Language:C++ 78.9%Language:CMake 15.0%Language:Shell 4.9%Language:R 1.3%