Using MeanShift with given bandwidth?
brainbytes42 opened this issue · comments
Maybe I'm missing something, but is it possible to use MeanShift clustering with a given (fixed) bandwidth?
As far as I can follow the implementation, even if I pass a KDE with a preset bandwidth to the MeanShift constructor, the bandwidth gets overwritten in MeanShift's cluster method by the call to mkde.setUsingData(dataSet, parallel):
@Override
public int[] cluster(DataSet dataSet, boolean parallel, int[] designations)
{
    // ...
    final KernelFunction k = mkde.getKernelFunction();
    mkde.setUsingData(dataSet, parallel);
    mkde.scaleBandwidth(scaleBandwidthFactor);
    // ...
}
Scaling the bandwidth is not sufficient for my case: the scale factor is applied to the freshly estimated bandwidth, so the result still depends on the data and isn't fixed. And because both calls happen inside the cluster step, there seems to be no way to intercept or reset the bandwidth.
A simple example of how I've tried to use MeanShift:
SimpleDataSet dataSet = ...
double sigma = ...
MetricKDE metricKDE = new MetricKDE(GaussKF.getInstance(), new EuclideanDistance());
metricKDE.setBandwith(sigma); // <-- gets ignored!
MeanShift meanShift = new MeanShift(metricKDE);
List<List<DataPoint>> clusters = meanShift.cluster(dataSet); // <-- cluster() triggers bandwidth estimation
It's understandable that the kernel density estimation wants to estimate the bandwidth, but in my case I need a consistent bandwidth across multiple runs and need 'only' the clustering step for the data.
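To make clear what I mean by 'only' the clustering step, here is a minimal sketch in plain Java. It does not use JSAT at all: the class name FixedBandwidthMeanShift, its methods, the Gaussian kernel form, and the merge radius of half the bandwidth are all my own assumptions for illustration, not the library's API or its actual algorithm. The point is only that the bandwidth is passed in and never re-estimated from the data.

```java
import java.util.Arrays;

// Illustrative sketch only: a hypothetical stand-alone class, not part of JSAT.
public class FixedBandwidthMeanShift {

    /** Shifts each point towards a density mode using a Gaussian kernel with a FIXED bandwidth. */
    public static double[][] findModes(double[][] data, double bandwidth,
                                       int maxIterations, double tolerance) {
        double[][] modes = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            double[] x = Arrays.copyOf(data[i], data[i].length);
            for (int iter = 0; iter < maxIterations; iter++) {
                double[] next = new double[x.length];
                double weightSum = 0.0;
                for (double[] p : data) {
                    // Gaussian weight; the bandwidth is never re-estimated from the data
                    double w = Math.exp(-squaredDistance(x, p) / (2.0 * bandwidth * bandwidth));
                    weightSum += w;
                    for (int j = 0; j < x.length; j++) {
                        next[j] += w * p[j];
                    }
                }
                for (int j = 0; j < x.length; j++) {
                    next[j] /= weightSum;
                }
                double moved = Math.sqrt(squaredDistance(x, next));
                x = next;
                if (moved < tolerance) {
                    break; // converged
                }
            }
            modes[i] = x;
        }
        return modes;
    }

    /** Gives the same label to points whose modes lie within bandwidth / 2 of each other. */
    public static int[] cluster(double[][] data, double bandwidth) {
        double[][] modes = findModes(data, bandwidth, 500, 1e-9);
        double mergeRadius = bandwidth / 2.0; // assumption: a simple heuristic threshold
        double[][] centers = new double[data.length][];
        int centerCount = 0;
        int[] labels = new int[data.length];
        for (int i = 0; i < modes.length; i++) {
            int label = -1;
            for (int c = 0; c < centerCount && label < 0; c++) {
                if (Math.sqrt(squaredDistance(modes[i], centers[c])) < mergeRadius) {
                    label = c;
                }
            }
            if (label < 0) { // no existing center is close enough: start a new cluster
                centers[centerCount] = modes[i];
                label = centerCount++;
            }
            labels[i] = label;
        }
        return labels;
    }

    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }
}
```

Calling FixedBandwidthMeanShift.cluster(data, sigma) repeatedly with the same sigma uses exactly that bandwidth on every run, which is the behaviour I'm looking for from MeanShift.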
Any help appreciated - thank you.