peterwittek / somoclu

Massively parallel self-organizing maps: accelerate training on multicore CPUs, GPUs, and clusters

Home Page:https://peterwittek.github.io/somoclu/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Misleading `Somoclu.get_bmus()` result

giacomolanciano opened this issue · comments

Hi,

I'd like to point out a possible source of confusion in the implementation of Somoclu.get_bmus(). Here's an example output:

array([[3, 1],
       [3, 3],
       [3, 3],
       ...,
       [3, 0],
       [2, 0],
       [0, 1]])

The BMUs indices are returned as a 2D numpy array, each row indicating the BMU for the corresponding sample. Individual BMU indices are 1D numpy arrays, represented in a [column, row] format. Such format totally makes sense if you look at the map as a cartesian plan (I'm guessing this is your interpretation because of the naming convention used in the source code: X and Y over col and row, respectively).

Although, such design choice is not documented anywhere and, as in my case, it could be a source of problems if one uses the returned BMUs indices to access the corresponding map's cells (i.e., columns are treated as rows and vice-versa). If it was explicitly documented, one could have known that, for instance, a numpy.flip(..., axis=1) is needed to transform the result in advance.

Possibly, the method could be improved by adding an order parameter (in line with what numpy does) to let the user pick the desired format. The method's signature could be something like

def get_bmus(self, activation_map, order='F'):
    ...

with order accepting 'F' for column-major order (i.e., the current default) and 'C' for row-major order. In this way, referring to the example above, setting order='C' would produce the following output instead:

array([[1, 3],
       [3, 3],
       [3, 3],
       ...,
       [0, 3],
       [0, 2],
       [1, 0]])

Thanks for your consideration and for the effort put in developing this great package, I've found it very useful to produce the source code for a recent publication of mine.

Thanks. Could you check if it is the simple T or not T implemented in the above commit?

Hi @xgdgsc,

thanks for following up on this. I have slightly updated the issue description to better explain what is my expected result.

In case of order='C', you should still transpose after inverting the order of X and Y, i.e.:

if order == 'F':
    return np.vstack((X, Y)).T
elif order == 'C':
    return np.vstack((Y, X)).T

For convenience, I have also implemented the fix in PR #146.

Thanks!