EdwardRaff / JSAT

Java Statistical Analysis Tool, a Java library for Machine Learning

Using MeanShift with a given bandwidth?

brainbytes42 opened this issue

Maybe I'm missing something, but is it possible to use MeanShift clustering with a given (fixed) bandwidth?

As far as I can follow the implementation, even if I pass a KDE with a set bandwidth to the MeanShift constructor, that bandwidth is overwritten in MeanShift's cluster method by the call to mkde.setUsingData(dataSet, parallel):

@Override
public int[] cluster(DataSet dataSet, boolean parallel, int[] designations)
{
    // ...
    
    final KernelFunction k = mkde.getKernelFunction();
    mkde.setUsingData(dataSet, parallel);
    mkde.scaleBandwidth(scaleBandwidthFactor);
        
    // ...
}

Scaling the bandwidth is not sufficient for me, because the scaled bandwidth is still derived from the data and therefore not fixed. And since all of this happens inside the cluster step, there seems to be no way to intercept or reset the bandwidth...
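
To illustrate why scaling alone does not give a fixed bandwidth: if the bandwidth h is estimated from each data set, then scaleBandwidthFactor * h still changes whenever the data changes. The sketch below uses the common normal-reference rule of thumb, h = 1.06 * sd * n^(-1/5), purely as a stand-in. I have not checked which rule JSAT's MetricKDE actually applies, and the class name and sample data are made up for illustration.

```java
import java.util.Arrays;

public class DataDrivenBandwidth {
    /** Normal-reference rule of thumb (an assumed stand-in): h = 1.06 * sd * n^(-1/5). */
    public static double ruleOfThumb(double[] x) {
        int n = x.length;
        double mean = Arrays.stream(x).average().orElse(0.0);
        double sumSq = 0.0;
        for (double v : x) {
            sumSq += (v - mean) * (v - mean);
        }
        double sd = Math.sqrt(sumSq / (n - 1)); // sample standard deviation
        return 1.06 * sd * Math.pow(n, -0.2);
    }

    public static void main(String[] args) {
        double scale = 0.5; // hypothetical scaling factor, like scaleBandwidthFactor
        double[] runA = {0, 1, 2, 3, 4};
        double[] runB = {0, 2, 4, 6, 8}; // same shape, twice the spread
        // The same scale factor yields different effective bandwidths per run.
        System.out.println("run A: " + scale * ruleOfThumb(runA));
        System.out.println("run B: " + scale * ruleOfThumb(runB));
    }
}
```

With the same scale factor, run B ends up with twice the bandwidth of run A, which is exactly the inconsistency across runs described above.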

A simple example how I've tried to use MeanShift:

SimpleDataSet dataSet = ...
double sigma = ...
MetricKDE metricKDE = new MetricKDE(GaussKF.getInstance(), new EuclideanDistance());
metricKDE.setBandwith(sigma); // <-- gets ignored!
MeanShift meanShift = new MeanShift(metricKDE);
List<List<DataPoint>> clusters = meanShift.cluster(dataSet); // <-- cluster() triggers bandwidth estimation

Of course, it makes sense that the kernel density estimator wants to estimate the bandwidth. In my case, however, I need a consistent bandwidth across multiple runs and 'only' need the clustering step for the data.
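
For comparison, here is a minimal standalone sketch of what "only the clustering step with a fixed bandwidth" means. It is plain Java, not JSAT's MeanShift, and it does not use JSAT's API. The class name FixedBandwidthMeanShift, the Gaussian kernel with Euclidean distance, the maxIter/tol parameters and the rule that merges modes closer than half the bandwidth are all my own assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FixedBandwidthMeanShift {
    private final double bandwidth; // fixed by the caller; never re-estimated from the data
    private final int maxIter;
    private final double tol;

    public FixedBandwidthMeanShift(double bandwidth, int maxIter, double tol) {
        if (bandwidth <= 0) throw new IllegalArgumentException("bandwidth must be > 0");
        this.bandwidth = bandwidth;
        this.maxIter = maxIter;
        this.tol = tol;
    }

    private static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /** Shifts a start point to a mode using Gaussian weights exp(-||x - p||^2 / (2 h^2)). */
    private double[] seekMode(double[] start, double[][] data) {
        double[] x = start.clone();
        for (int it = 0; it < maxIter; it++) {
            double[] next = new double[x.length];
            double den = 0;
            for (double[] p : data) {
                double w = Math.exp(-sqDist(x, p) / (2 * bandwidth * bandwidth));
                for (int j = 0; j < x.length; j++) next[j] += w * p[j];
                den += w;
            }
            for (int j = 0; j < x.length; j++) next[j] /= den; // weighted mean of all points
            boolean converged = sqDist(next, x) < tol * tol;
            x = next;
            if (converged) break;
        }
        return x;
    }

    /** Returns a cluster index per point; modes closer than bandwidth / 2 are merged. */
    public int[] cluster(double[][] data) {
        List<double[]> modes = new ArrayList<>();
        int[] labels = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            double[] m = seekMode(data[i], data);
            int label = -1;
            for (int c = 0; c < modes.size(); c++) {
                if (Math.sqrt(sqDist(m, modes.get(c))) < bandwidth / 2) {
                    label = c;
                    break;
                }
            }
            if (label < 0) {
                modes.add(m);
                label = modes.size() - 1;
            }
            labels[i] = label;
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] data = {
            {0.0, 0.0}, {0.1, 0.2}, {-0.1, 0.1},
            {5.0, 5.0}, {5.2, 4.9}, {4.9, 5.1}
        };
        FixedBandwidthMeanShift ms = new FixedBandwidthMeanShift(1.0, 300, 1e-6);
        System.out.println(Arrays.toString(ms.cluster(data)));
    }
}
```

Because the bandwidth is a constructor argument that cluster() never touches, repeated runs on different data sets all use the same kernel width, which is the behaviour I am looking for in JSAT's MeanShift.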

Any help is appreciated. Thank you!