mlampros / ClusterR

Gaussian mixture models, k-means, mini-batch-kmeans and k-medoids clustering

Home Page: https://mlampros.github.io/ClusterR/

Parallelised Kmeans_rcpp with OpenMP

alazarolop opened this issue · comments

Hi,

After following your advice, I was able to build the package with OpenMP support. So far, Clara_Medoids and all the other functions that take a threads argument run their computations in parallel, which is awesome!

Reading your vignette, I found this statement about Kmeans_rcpp and parallelization:

It allows for multiple initializations (which can be parallelized if Openmp is available)

But I can't get it to work. If I pass the threads argument as well, I get an error saying that the argument is unused.

How can I get Kmeans_rcpp and MiniBatchKmeans to run their initializations in parallel?

Thank you in advance, and congratulations on a great package.

Hi @alazarolop,

I'm glad you find ClusterR helpful for your tasks.

When I removed the threads parameter from the Kmeans_rcpp function in version 1.0.8, I completely forgot to remove this line from the documentation.
Parallelization is available only for functions that have a threads parameter.
Note, however, that the Armadillo library, which ClusterR uses through the RcppArmadillo package, also uses OpenMP internally. Unlike the threads parameter, this requires no user setting: Armadillo automatically parallelizes its own code wherever possible, provided that the user's operating system supports OpenMP and that OpenMP is enabled (you can find more information in the Armadillo documentation).
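
Whether a given compiler setup has OpenMP enabled at all can be checked with the standard _OPENMP macro, which the OpenMP specification requires compilers to define when OpenMP is active. The following is only a minimal C++ sketch of such a check; it is not part of ClusterR or Armadillo, and the helper names openmp_enabled and max_threads are made up for illustration.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>  // only available when the compiler was invoked with OpenMP enabled
#endif

// Hypothetical helper: true if this code was compiled with OpenMP enabled.
bool openmp_enabled() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: upper bound on the threads a parallel region may use.
// Falls back to 1 when OpenMP is not enabled.
int max_threads() {
    int n = 1;
#ifdef _OPENMP
    n = omp_get_max_threads();
#endif
    assert(n >= 1);
    return n;
}
```

If openmp_enabled() returns false for code built with the same compiler and flags that R uses for packages, the code is not being built with OpenMP in that setup, so OpenMP-based parallelism in compiled code cannot take effect there.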

Thanks for making me aware of the mistake in the documentation. I'll upload an updated version of the package on Github and I'll fix it in the next version on CRAN.

Hi @mlampros, no worries, you're welcome. Thank you for the quick fix.

That's what I thought and I even tried to use the parameter threads (which just raised an error).

I don't know much about Armadillo, to be honest, but I understand what you mean. Would it be possible to force the function to run in parallel? I have the impression that my installation of Armadillo (from Homebrew) is not using OpenMP, because even with high-dimensional matrices it runs on just a single core.

Hi @alazarolop,

The Kmeans_rcpp function itself runs on a single core; there is no way to force the initializations to run in parallel. However, the Armadillo functions that Kmeans_rcpp calls internally will be parallelized if OpenMP is enabled for those internal functions.
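
For readers wondering what "parallelizing the initializations" would mean in general, here is a rough, self-contained C++ sketch. It is not ClusterR's code and does not reflect how Kmeans_rcpp is implemented: it uses 1-D data, plain random starting points, and invented names (KMeansResult, nearest, kmeans_single_run, kmeans_multi_init), purely to show independent restarts distributed over threads with an OpenMP parallel for.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

struct KMeansResult {
    std::vector<double> centroids;  // k centroids (1-D data keeps the sketch short)
    double sse = std::numeric_limits<double>::infinity();  // sum of squared errors
};

// Index of the centroid closest to value v.
std::size_t nearest(const std::vector<double>& c, double v) {
    std::size_t best = 0;
    for (std::size_t j = 1; j < c.size(); ++j)
        if ((v - c[j]) * (v - c[j]) < (v - c[best]) * (v - c[best])) best = j;
    return best;
}

// One Lloyd-style k-means run, started from k randomly chosen data points.
KMeansResult kmeans_single_run(const std::vector<double>& x, std::size_t k,
                               unsigned seed, int max_iters = 100) {
    assert(k >= 1 && k <= x.size());
    std::mt19937 rng(seed);  // private generator, so runs do not share state
    std::uniform_int_distribution<std::size_t> pick(0, x.size() - 1);
    KMeansResult res;
    res.centroids.resize(k);
    for (auto& c : res.centroids) c = x[pick(rng)];

    for (int it = 0; it < max_iters; ++it) {
        // Assignment step: accumulate each point into its nearest centroid.
        std::vector<double> sum(k, 0.0);
        std::vector<std::size_t> cnt(k, 0);
        for (double v : x) {
            std::size_t j = nearest(res.centroids, v);
            sum[j] += v;
            ++cnt[j];
        }
        // Update step: move centroids to cluster means; stop when nothing moves.
        bool changed = false;
        for (std::size_t j = 0; j < k; ++j) {
            if (cnt[j] == 0) continue;  // empty cluster keeps its centroid
            double m = sum[j] / static_cast<double>(cnt[j]);
            if (m != res.centroids[j]) changed = true;
            res.centroids[j] = m;
        }
        if (!changed) break;
    }
    res.sse = 0.0;
    for (double v : x) {
        double d = v - res.centroids[nearest(res.centroids, v)];
        res.sse += d * d;
    }
    return res;
}

// Runs num_init independent restarts and keeps the one with the lowest SSE.
KMeansResult kmeans_multi_init(const std::vector<double>& x, std::size_t k,
                               int num_init, unsigned base_seed = 1) {
    assert(num_init >= 1);
    std::vector<KMeansResult> runs(static_cast<std::size_t>(num_init));
    // Each restart writes only to its own slot, so no locking is needed.
    // Without OpenMP the pragma is ignored and the loop runs sequentially.
    #pragma omp parallel for schedule(dynamic)
    for (int r = 0; r < num_init; ++r) {
        runs[static_cast<std::size_t>(r)] =
            kmeans_single_run(x, k, base_seed + static_cast<unsigned>(r));
    }
    std::size_t best = 0;
    for (std::size_t r = 1; r < runs.size(); ++r)
        if (runs[r].sse < runs[best].sse) best = r;
    return runs[best];
}
```

Because every restart gets its own seed (base_seed + r) and its own random generator, the selected result does not depend on how many threads are used. With GCC or Clang, the pragma takes effect only when the code is compiled with OpenMP enabled (for example with -fopenmp).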

Ah ok, that was exactly my thought.

Thanks a lot for the explanation, and thank you again for your effort on the package.