hadjiprocopis / histocurse

A Java implementation of a multidimensional histogram backed on dense/conventional OR sparse array. Extremely efficient when number of dimensions is large and back-store is sparse array. This module depends on other projects which can be found on my repo here. See README below to see what you need to download.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

histocurse

author: andreas hadjiprocopis (andreashad2@gmail.com)

This is a module implementing a Histogram in N-dimensions. Meaning that bins do not exist in a one-dimensional space as they usually do but can be in N-dimensions.

N can be large. In the case of large N, it is better to use the back-store.

This module depends on the following modules in this repository:
containers (https://github.com/hadjiprocopis/containers) statistics (https://github.com/hadjiprocopis/statistics) cartesian (https://github.com/hadjiprocopis/cartesian) hadjiprocopis_utils (https://github.com/hadjiprocopis/hadjiprocopis_utils)

Here is an example on how to use this module:

import  java.util.Random;
import  java.util.HashMap;
import  java.util.ArrayList;
import  java.io.PrintWriter;

import  ahp.org.Histograms.*;
import  ahp.org.Containers.*;

public class TestHistograms_SparseArray {
public static void main(String args[]) throws Exception {
	// Here we create labels for each bin in the histogram.
	// We are going to create 2-dimensional histogram with
	// 2 bins in each dimension.
	// Labels to bins are optional.
	String  labels[/*dims*/][/*num bins/labels per dim*/] =
		new String[][]{
			new String[]{"a", "b"},
			new String[]{"a", "b"},
		}
	;

	// Here we create the 2-dimensional histogram
	Histogram ahist = new Histogram<SparseArray<Histobin>>(
		// histogram name
		"a histogram1",
		// labels (a string array) if any, otherwise will create default
		labels,
		// specify the number of dimensions and the number of bins in each dimension
		new int[]{2, 2}, // numbins per dim
		// bin widths for each dimension
		new double[]{1,1}, // bin widths
		// bins start from 0 in both dimensions
		new double[]{0,0}, // boundaries start
		// Some magic: specify what backing store you want
		// can also be DenseArray.class
		// from: http://stackoverflow.com/questions/37231043/how-to-keep-generic-type-of-nested-generics-with-class-tokens
		(Class<SparseArray<Histobin>> )(Class<?> )SparseArray.class
	);

	// and here we are adding data to the histogram
	// this says increment the count of the bin with this LABEL
	ahist.increment_bin_count(new String[]{"a", "a"});
	// whereas this one finds the bin given coordinates of the data
	// e.g. data (0.1,0.3) falls into the bin (0,0) (given that they
	// start from 0 and have a width of 1 (see above)
	ahist.increment_bin_count(new double[]{0.1, 0.3});
	ahist.increment_bin_count(new double[]{0.1, 0.3});
	ahist.increment_bin_count(new double[]{0.1, 0.3});
	ahist.increment_bin_count(new double[]{0.1, 0.3});
	ahist.increment_bin_count(new double[]{0.1, 0.3});
	ahist.increment_bin_count(new double[]{0.2, 0.5});
	ahist.increment_bin_count(new double[]{0.3, 1.2});
	ahist.increment_bin_count(new double[]{0.4, 1.5});
	ahist.increment_bin_count(new double[]{1.1, 0.1});
	ahist.increment_bin_count(new double[]{1.2, 1.4});

	System.out.println("*******************");
	System.out.println("Histogram with preset-labels: "+ahist);

	// here we decrement bin counts
	System.out.println("******************* DECREMENTING ");
	ahist.decrement_bin_count(new double[]{1.2, 1.4});
	ahist.decrement_bin_count(new double[]{0.2, 0.2});
	ahist.decrement_bin_count(new double[]{0.2, 0.2});
	ahist.decrement_bin_count(new double[]{0.2, 0.2});
	ahist.decrement_bin_count(new double[]{0.2, 0.2});
	System.out.println("after decrement Histogram with preset-labels: "+ahist);

	// Here we are sampling from the histogram
	// basically we are asking for the content of a bin
	// it will return a Histobin object
	Histobin abin;
	// select by a coordinate inside a bin:
	abin = ahist.get_bin(new double[]{0.2, 0.2});
	System.out.println("Selected bin: "+abin);
	// select by bin-label
	abin = ahist.get_bin(new String[]{"a", "b"});
	System.out.println("Selected bin: "+abin);
	// select by bin-coordinate (integer)
	abin = ahist.get_bin(new int[]{0, 1});
	System.out.println("Selected bin: "+abin);
	
	// This is where the magic happens
	// we are asking for bins selected by a wildcard: first dim to be 'a' and second dim any (*)
	// (courtesy of the CARTESIAN module)
	Random arng = new Random(1234);
	HistogramSampler asampler = ahist.sampler(arng);
	// get all bins whose first coordinate is 'a' and second is anything (*)
	asampler.select_bins_using_wildcard_labels(new String[]{"a", "*"});
	// and print those bins:
	System.out.println("random bin: with '0-*' : "+asampler.random_bin_from_selection());
}
}

The above example can be run using

ant clean && ant && ant TestHistograms_SparseArray

There is more magic to this module which will be documented in due time.

author: andreas hadjiprocopis (andreashad2@gmail.com)

About

A Java implementation of a multidimensional histogram backed on dense/conventional OR sparse array. Extremely efficient when number of dimensions is large and back-store is sparse array. This module depends on other projects which can be found on my repo here. See README below to see what you need to download.


Languages

Language:Java 99.9%Language:Shell 0.1%