Floessie / clutbench

Small benchmark for HaldCLUT optimizations

clutbench - A small benchmark framework for HaldCLUT optimizations

HaldCLUT is an algorithm and CLUT storage technique devised by Eskil Steenberg. It was integrated into RawTherapee 4.2 to enable film simulation. The code was taken "as is" from Steenberg's HaldCLUT example and only slightly adapted to RawTherapee's internal ImageFloat format, which stores floats in the range 0.0 to 65535.0. Little was done to optimize the code ...
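To illustrate the storage technique, here is a minimal C++ sketch of a HaldCLUT lookup. A HaldCLUT of level L encodes a colour cube with N = L*L entries per axis in an L^3 x L^3 image, with red varying fastest, then green, then blue. The sketch assumes the CLUT is already decoded into a flat float array in the 0.0 to 65535.0 range. It uses trilinear interpolation, which is a common way to look up values between cube entries, but it is not the code from RawTherapee or from Steenberg's example. The names `HaldClut` and `makeIdentityHaldClut` are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical in-memory HaldCLUT of level L (L >= 2): N = L*L entries per axis,
// N^3 RGB entries in total, i.e. one per pixel of the L^3 x L^3 CLUT image.
struct HaldClut {
    unsigned level;          // Hald level L
    std::vector<float> rgb;  // N^3 * 3 floats in 0.0 .. 65535.0

    unsigned cubeSize() const { return level * level; }

    // Linear index of cube entry (r, g, b): red fastest, then green, then blue
    std::size_t index(unsigned r, unsigned g, unsigned b) const {
        const std::size_t n = cubeSize();
        return r + n * (g + n * b);
    }

    // Trilinear lookup; input and output components are in 0.0 .. 65535.0
    void lookup(const float in[3], float out[3]) const {
        const unsigned n = cubeSize();
        const float scale = (n - 1) / 65535.0f;
        unsigned base[3];
        float frac[3];
        for (int c = 0; c < 3; ++c) {
            // Position inside the cube, clamped to [0, n-1]
            const float pos = std::fmin(std::fmax(in[c] * scale, 0.0f), float(n - 1));
            unsigned b = static_cast<unsigned>(pos);
            if (b > n - 2) b = n - 2;  // keep b+1 inside the cube
            base[c] = b;
            frac[c] = pos - b;
        }
        out[0] = out[1] = out[2] = 0.0f;
        // Weighted sum of the 8 cube corners surrounding the input colour
        for (unsigned corner = 0; corner < 8; ++corner) {
            const unsigned dr = corner & 1, dg = (corner >> 1) & 1, db = (corner >> 2) & 1;
            const float w = (dr ? frac[0] : 1.0f - frac[0]) *
                            (dg ? frac[1] : 1.0f - frac[1]) *
                            (db ? frac[2] : 1.0f - frac[2]);
            const std::size_t p = index(base[0] + dr, base[1] + dg, base[2] + db) * 3;
            for (int c = 0; c < 3; ++c) out[c] += w * rgb[p + c];
        }
    }
};

// Builds an identity CLUT (output == input), handy for testing a lookup
HaldClut makeIdentityHaldClut(unsigned level) {
    HaldClut clut{level, {}};
    const unsigned n = clut.cubeSize();
    clut.rgb.resize(std::size_t(n) * n * n * 3);
    for (unsigned b = 0; b < n; ++b)
        for (unsigned g = 0; g < n; ++g)
            for (unsigned r = 0; r < n; ++r) {
                const std::size_t p = clut.index(r, g, b) * 3;
                clut.rgb[p] = r * 65535.0f / (n - 1);
                clut.rgb[p + 1] = g * 65535.0f / (n - 1);
                clut.rgb[p + 2] = b * 65535.0f / (n - 1);
            }
    return clut;
}
```

A level 4 CLUT, for example, is a 64 x 64 image holding a 16 x 16 x 16 cube.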

Approach

clutbench is a simple testbed for the HaldCLUT algorithm. It takes two images as P6 portable anymaps: the input image is the first argument and the CLUT image is the second. It then applies the algorithm several times (fourth argument, default 10) in each of its different implementations. Execution time is measured and compared to that of the first implementation, which is the one used in RawTherapee 4.2.

For each implementation, the resulting image is written as a 16-bit PPM whose file name starts with the prefix given as the third argument. Additionally, clutbench displays the absolute difference between the original and the current implementation, as well as the maximum difference per channel in the 0.0 to 65535.0 range.

Compile

clutbench is meant to be used with GCC under Linux. To build it, you need to have CMake installed:

clutbench$ mkdir build
clutbench$ cd build
clutbench/build$ cmake ..
clutbench/build$ make -j

Usage

You need to prepare two images: the image to apply the HaldCLUT to, and the HaldCLUT itself. Sample HaldCLUTs can be downloaded either from the HaldCLUT page or from RawTherapee (warning: big!). Convert both files to PPM with netpbm:

clutbench/build$ pngtopnm [...]/HaldCLUT/CLUTgallery/Desat_dark.HCLUT.png > clut.ppm
clutbench/build$ jpegtopnm [...]/image.jpg > image.ppm

Now run the benchmark:

clutbench/build$ ./clutbench image.ppm clut.ppm test
Method:     Original RawTherapee 4.2 (adapted implementation)
Time:       1137ms

Method:     Optimized and cleaned up code
Time:       1060ms
Speedup:    1.07264 (7.26415% faster)
Difference: 0 (Rmax 0, Gmax 0, Bmax 0)

Method:     4 * 16b integer clut storage (8B per pixel instead of 12B)
Time:       429ms
Speedup:    2.65035 (165.035% faster)
Difference: 0 (Rmax 0, Gmax 0, Bmax 0)

Method:     Integer clut storage with SSE optimization
Time:       383ms
Speedup:    2.96867 (196.867% faster)
Difference: 0 (Rmax 0, Gmax 0, Bmax 0)

The relative results will largely depend on the CPU microarchitecture, the compiler version and flags, and the memory bandwidth of your system. They depend only marginally, if at all, on the image and HaldCLUT size.
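The size of the CLUT itself is one plausible reason why memory bandwidth matters. The following C++ sketch illustrates the storage change named in the sample output above; it uses hypothetical names and is not the code of IntegerClutMethod. A float entry takes 3 x 4 = 12 bytes, while four 16-bit integers take 8 bytes, with the fourth one serving as padding. For a level 12 HaldCLUT (1728 x 1728 = 2985984 entries), that is roughly 35.8 MB versus 23.9 MB.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Float entry: 3 * 4 bytes = 12 bytes
struct FloatEntry { float r, g, b; };
// Integer entry: 4 * 2 bytes = 8 bytes; the 4th component pads to a power-of-two size
struct IntEntry { std::uint16_t r, g, b, pad; };

static_assert(sizeof(FloatEntry) == 12, "3 floats");
static_assert(sizeof(IntEntry) == 8, "4 x 16 bit");

// Converts a float CLUT (0.0 .. 65535.0) to 16-bit integer storage by rounding
std::vector<IntEntry> toIntegerClut(const std::vector<FloatEntry>& clut)
{
    auto quantize = [](float v) {
        return static_cast<std::uint16_t>(std::lround(std::min(std::max(v, 0.0f), 65535.0f)));
    };
    std::vector<IntEntry> out;
    out.reserve(clut.size());
    for (const FloatEntry& e : clut) {
        out.push_back({quantize(e.r), quantize(e.g), quantize(e.b), 0});
    }
    return out;
}
```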

Extend

To extend clutbench with your own implementation, copy an existing one (IntegerClutMethod.[hc]pp, for example), rename it to your liking, and change its setClut() and convert() methods.

Don't forget to add the new .cpp file to CMakeLists.txt and the new class to Application.cpp:

[...]
#include "OriginalClutMethod.hpp"
#include "OptimizedClutMethod.hpp"
#include "IntegerClutMethod.hpp"
#include "SseClutMethod.hpp"
// <-- Include your header here

namespace
{

    std::vector<ClutMethod*> createClutMethods()
    {
        std::vector<ClutMethod*> clut_methods;
        clut_methods.push_back(new OriginalClutMethod);
        clut_methods.push_back(new OptimizedClutMethod);
        clut_methods.push_back(new IntegerClutMethod);
        clut_methods.push_back(new SseClutMethod);
        // <-- Add your implementation here
        return clut_methods;
    }
    [...]
}
[...]
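A new implementation could look roughly like the following self-contained C++ sketch. The actual ClutMethod declaration is not reproduced here, so `ClutMethodSketch` is a hypothetical stand-in. Only the method names setClut() and convert() come from clutbench. Their parameters (a flat float CLUT plus its Hald level, and a four-element float pixel) are assumptions, and `name()` is invented. The example class `NearestClutMethod` uses nearest-neighbour lookup, chosen only because it is short.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for clutbench's ClutMethod; the real signatures may differ
class ClutMethodSketch {
public:
    virtual ~ClutMethodSketch() = default;
    virtual std::string name() const = 0;
    virtual bool setClut(const std::vector<float>& rgb, unsigned level) = 0;
    virtual void convert(float rgbx[4]) const = 0;  // in-place, R G B + unused
};

// Example method: nearest-neighbour lookup instead of interpolation
class NearestClutMethod : public ClutMethodSketch {
public:
    std::string name() const override { return "Nearest neighbour (example)"; }

    // Stores a HaldCLUT of the given level (flat RGB floats, red varying fastest)
    bool setClut(const std::vector<float>& rgb, unsigned level) override {
        const std::size_t n = std::size_t(level) * level;
        if (level < 2 || rgb.size() != n * n * n * 3) return false;
        clut_ = rgb;
        size_ = static_cast<unsigned>(n);
        return true;
    }

    // Replaces R, G, B (0.0 .. 65535.0) with the nearest cube entry
    void convert(float rgbx[4]) const override {
        const float scale = (size_ - 1) / 65535.0f;
        std::size_t idx[3];
        for (int c = 0; c < 3; ++c) {
            const float pos = std::fmin(std::fmax(rgbx[c] * scale, 0.0f), float(size_ - 1));
            idx[c] = static_cast<std::size_t>(std::lround(pos));
        }
        const std::size_t p = (idx[0] + size_ * (idx[1] + size_ * idx[2])) * 3;
        for (int c = 0; c < 3; ++c) rgbx[c] = clut_[p + c];
    }

private:
    std::vector<float> clut_;
    unsigned size_ = 0;
};
```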

Deviation from the RawTherapee 4.2 implementation

The ClutMethod interface naturally differs from the one used in RawTherapee 4.2. In RT, the pixel components are passed individually to rtengine::CLUT::getRGB() and are also returned individually, in separate variables. clutbench instead uses a single four-element float array for both input and output. This array is also aligned on a 16-byte boundary, which makes it easy to gain speed through aligned loads. Such a change would be feasible for RawTherapee, too.
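The benefit of the aligned four-element array can be illustrated with a small C++ sketch. Two CLUT entries (R, G, B plus one unused component) are blended using a single aligned SSE load per entry, so all four components are processed at once. The names `blend` and `isAligned16` are hypothetical. When SSE is not available, the code falls back to a scalar loop.

```cpp
#include <cstdint>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

// True if p lies on a 16-byte boundary, as _mm_load_ps/_mm_store_ps require
inline bool isAligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Linear blend of two 4-float entries: out = a + w * (b - a).
// a, b and out must be 16-byte aligned arrays of four floats.
inline void blend(const float* a, const float* b, float w, float* out)
{
#ifdef SKETCH_HAVE_SSE
    const __m128 va = _mm_load_ps(a);  // one aligned load for all four components
    const __m128 vb = _mm_load_ps(b);
    _mm_store_ps(out, _mm_add_ps(va, _mm_mul_ps(_mm_set1_ps(w), _mm_sub_ps(vb, va))));
#else
    for (int i = 0; i < 4; ++i) out[i] = a[i] + w * (b[i] - a[i]);  // scalar fallback
#endif
}
```

A trilinear lookup consists of seven such blends (four along one axis, two along the next, one along the last). The arrays can be declared as `alignas(16) float px[4];` to meet the alignment requirement.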

License: GNU General Public License v3.0