jhandl / hist4j

High performance, memory-limited adaptive histogram class.

Introduction

Hist4j is a simple, high-performance value aggregator that accepts large datasets with any distribution or range and provides several statistical functions, using a very small memory footprint and requiring no pre- or post-processing.

Hist4j has the following features:

  • It adapts to any data distribution, keeping a roughly constant resolution relative to the amount of data across the range by increasing the resolution where the data is denser.
  • It can process large amounts of data with a very small memory footprint.
  • It doesn't need pre- or post-processing to deliver statistics about the data seen so far.

The following statistics are currently available:

  • The value of the cumulative distribution function at a given data point.
  • The data point that splits the data set at a given percentile.
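To make the two statistics concrete, here is an exact, sort-based reference that keeps every sample in memory. It is not hist4j code and it is not how hist4j computes its answers, since hist4j approximates them within a small memory footprint. The class name ExactStats and the method name cumulativeFraction are hypothetical; valueForPercentile mirrors the getValueForPercentile call shown under Usage. The "at or below" convention and the nearest-rank percentile rule are assumptions made for this sketch.

```java
import java.util.Arrays;

/** Exact reference for the two statistics. Illustration only, not hist4j code. */
final class ExactStats {
    private ExactStats() {}

    /** Fraction of samples at or below x, in the range [0, 1] (assumed convention). */
    static double cumulativeFraction(double[] samples, double x) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long atOrBelow = Arrays.stream(samples).filter(v -> v <= x).count();
        return (double) atOrBelow / samples.length;
    }

    /** Nearest-rank percentile: the smallest sample with at least p percent of the data at or below it. */
    static double valueForPercentile(double[] samples, int percentile) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (percentile < 0 || percentile > 100) {
            throw new IllegalArgumentException("percentile must be between 0 and 100");
        }
        double[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // Integer ceiling of percentile * n / 100, avoiding floating-point rounding.
        int rank = (int) ((percentile * (long) sorted.length + 99) / 100);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

With the samples 10, 20, 30, 40, 50, three of the five values are at or below 30, so the cumulative fraction at 30 is 0.6, and the 50th percentile under the nearest-rank rule is 30. An approximating histogram trades this exactness for a bounded memory footprint.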

Installation

Running the default Ant target generates hist4j-trunk.jar.

Licence

Apache 2.0

Usage

A typical use case is measuring the response times of a service. In this case, we can create a histogram object:

AdaptiveHistogram h = new AdaptiveHistogram();

Then, for every service response, we log the time:

h.addValue(elapsedTime);

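At fixed intervals (every few minutes or every few hours, say), replace the histogram with a new one and print the statistics from the old one:

AdaptiveHistogram hOld = h;   // keep the finished histogram for reporting
h = new AdaptiveHistogram();  // start a fresh one for the next interval
System.out.println(hOld.getValueForPercentile(95));  // 95th-percentile response time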
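When responses are recorded from several threads, the swap described above can be done atomically. The following is a minimal sketch of that rotation pattern, not part of hist4j. StandInHistogram is a hypothetical placeholder that stores every value so the example compiles on its own; in real code it would be replaced by AdaptiveHistogram. RotatingRecorder, record, rotate and size are likewise hypothetical names, and the double parameter type is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for AdaptiveHistogram; it keeps every value in memory. */
final class StandInHistogram {
    private final List<Double> values = new ArrayList<>();

    synchronized void addValue(double value) {
        values.add(value);
    }

    /** Nearest-rank percentile over the stored values. */
    synchronized double getValueForPercentile(int percentile) {
        if (values.isEmpty()) {
            throw new IllegalStateException("no values recorded");
        }
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int rank = (int) ((percentile * (long) sorted.size() + 99) / 100);
        return sorted.get(Math.max(rank, 1) - 1);
    }

    synchronized int size() {
        return values.size();
    }
}

/** Records response times and hands back the finished histogram at each rotation. */
final class RotatingRecorder {
    private final AtomicReference<StandInHistogram> current =
            new AtomicReference<>(new StandInHistogram());

    /** Called once per service response. */
    void record(double elapsedTime) {
        current.get().addValue(elapsedTime);
    }

    /** Atomically installs a fresh histogram and returns the old one for reporting. */
    StandInHistogram rotate() {
        return current.getAndSet(new StandInHistogram());
    }
}
```

A periodic task, for example one scheduled with ScheduledExecutorService.scheduleAtFixedRate, would call rotate() and print getValueForPercentile(95) on the returned object. A thread that read the old reference just before the swap may still add one late value to the old histogram, which is usually acceptable for monitoring.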

Example

The file Hist4jExample.java in the project root directory shows a minimal but useful example. Run ./run_example.sh to compile and run it.
