Update (17 Feb. 2021)

groupPredict now uses amap::Dist to speed up the distance computation.

Install the package by navigating to the parent folder of this one and running

R CMD INSTALL SNFtool

After the installation is complete you can use the functions. Here is an example session.

First, set all the parameters:

K = 20; # number of neighbors, usually (10~30)
alpha = 0.5; # hyperparameter, usually (0.3~0.8)
T = 10; # number of iterations, usually (10~20)

Data1 is of size n x d_1, where n is the number of patients and d_1 is the number of features of the first data type (e.g., genes).

Data2 is of size n x d_2, where n is the number of patients and d_2 is the number of features of the second data type (e.g., methylation).

library(SNFtool)
data(Data1)
data(Data2)

Here, the simulated data (Data1, Data2) consist of two data types that are complementary to each other. Both data types have the same number of points. The first half of the data belongs to the first cluster; the rest belongs to the second cluster.

truelabel = c(matrix(1,100,1),matrix(2,100,1)); ##the ground truth of the simulated data;
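The same label vector can be built more compactly with rep(). The sketch below uses only base R and checks that both constructions agree; the name truelabel_rep is hypothetical and does not appear in the package.

```r
# Ground truth as written in this example: 100 ones followed by 100 twos.
truelabel <- c(matrix(1, 100, 1), matrix(2, 100, 1))

# Equivalent base-R construction (hypothetical alternative name).
truelabel_rep <- rep(c(1, 2), each = 100)

identical(truelabel, truelabel_rep)   # compares the two constructions
table(truelabel_rep)                  # number of samples per cluster
```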

Calculate the distance matrices (here we calculate the Euclidean distance; you can use other distances, e.g., correlation).

If the data are all continuous values, we recommend performing standard normalization before using SNF, though this is optional and depends on the data you want to use.

Data1 = standardNormalization(Data1);

Data2 = standardNormalization(Data2);
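To illustrate what this step does, here is a minimal base-R sketch. It assumes that standard normalization means centering each column to mean 0 and scaling it to standard deviation 1. The helper zscore_columns and the toy matrix are hypothetical and are not part of SNFtool.

```r
# Hypothetical helper (not part of SNFtool): column-wise z-score normalization.
zscore_columns <- function(x) {
  x <- as.matrix(x)
  mu <- colMeans(x)
  sigma <- apply(x, 2, sd)
  sigma[sigma == 0] <- 1   # leave constant columns unscaled to avoid dividing by zero
  sweep(sweep(x, 2, mu, "-"), 2, sigma, "/")
}

set.seed(1)
toy <- matrix(rnorm(20, mean = 5, sd = 3), nrow = 10, ncol = 2)
toy_norm <- zscore_columns(toy)

colMeans(toy_norm)        # column means, numerically zero
apply(toy_norm, 2, sd)    # column standard deviations
```

After normalization every feature contributes on the same scale, which matters because the next step computes distances across all features.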

Calculate the pairwise distances. If the data are continuous, we recommend using the function "dist2" as follows; if the data are discrete, we recommend using ""

Dist1 = dist2(as.matrix(Data1),as.matrix(Data1));
Dist2 = dist2(as.matrix(Data2),as.matrix(Data2));
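For readers who want to see the computation behind a pairwise distance matrix, the following base-R sketch computes squared Euclidean distances between the rows of two matrices and compares them with stats::dist. The helper sq_euclidean is hypothetical. Whether dist2 returns plain or squared distances is not stated here, so check the package documentation before relying on either.

```r
# Hypothetical helper: squared Euclidean distance between rows of x and rows of y,
# using ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a.b
sq_euclidean <- function(x, y) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  d <- outer(rowSums(x^2), rowSums(y^2), "+") - 2 * x %*% t(y)
  d[d < 0] <- 0   # clip tiny negative values caused by rounding
  d
}

set.seed(1)
pts <- matrix(rnorm(12), nrow = 4, ncol = 3)

d_sq  <- sq_euclidean(pts, pts)
d_ref <- unname(as.matrix(dist(pts)))^2   # reference from base R

max(abs(d_sq - d_ref))   # largest discrepancy between the two computations
```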

Next, construct the similarity graphs:

W1 = affinityMatrix(Dist1, K, alpha)
W2 = affinityMatrix(Dist2, K, alpha)
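As a rough illustration of how a distance matrix can be turned into a similarity graph, the sketch below implements a scaled exponential kernel in the spirit of the SNF paper, where the bandwidth adapts to the local neighborhood of each pair of points. It is a simplified sketch, not the package's implementation: the function name scaled_exp_affinity and the parameter mu are hypothetical, and mu is only assumed to play a role similar to alpha above.

```r
# Hypothetical helper: scaled exponential similarity kernel.
# d  : symmetric matrix of pairwise distances (zero diagonal)
# k  : number of nearest neighbours used to estimate the local scale
# mu : bandwidth hyperparameter
scaled_exp_affinity <- function(d, k = 3, mu = 0.5) {
  # mean distance from each point to its k nearest neighbours (position 1 is the point itself)
  knn_mean <- apply(d, 1, function(row) mean(sort(row)[2:(k + 1)]))
  # local scale for each pair: average of both neighbourhood scales and the pair distance
  eps <- (outer(knn_mean, knn_mean, "+") + d) / 3
  w <- exp(-(d^2) / (mu * eps))
  (w + t(w)) / 2   # enforce exact symmetry
}

set.seed(2)
x <- rbind(matrix(rnorm(20), 10, 2),             # cluster 1
           matrix(rnorm(20, mean = 5), 10, 2))   # cluster 2
d <- unname(as.matrix(dist(x)))
w <- scaled_exp_affinity(d, k = 3, mu = 0.5)

mean(w[1:10, 1:10])    # average similarity within the first cluster
mean(w[1:10, 11:20])   # average similarity between the two clusters
```

Points in the same cluster receive much larger weights than points in different clusters, which is the property the later fusion and clustering steps rely on.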

These similarity graphs have complementary information about clusters.

displayClusters(W1,truelabel);
displayClusters(W2,truelabel);

Next, we fuse all the graphs.

The overall matrix can then be computed by similarity network fusion (SNF):

W = SNF(list(W1,W2), K, T)
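To give an intuition for what the fusion step computes, here is a simplified two-view sketch of the cross-diffusion idea described in the SNF paper: each view keeps a sparse nearest-neighbor kernel and repeatedly diffuses the other view's full kernel through it. All function names are hypothetical, and the sketch leaves out details of the package's SNF function, so treat it as an illustration under these assumptions rather than a replacement.

```r
# Full kernel: off-diagonal entries of each row sum to 1/2, diagonal is 1/2.
full_kernel <- function(w) {
  diag(w) <- 0
  p <- w / (2 * rowSums(w))   # divides row i by twice its off-diagonal sum
  diag(p) <- 0.5
  p
}

# Local kernel: keep each row's k largest similarities (excluding self), then row-normalise.
local_kernel <- function(w, k) {
  diag(w) <- 0
  s <- matrix(0, nrow(w), ncol(w))
  for (i in seq_len(nrow(w))) {
    nn <- order(w[i, ], decreasing = TRUE)[seq_len(k)]
    s[i, nn] <- w[i, nn]
  }
  s / rowSums(s)
}

# Cross-diffusion between two views, followed by averaging.
fuse_two_views <- function(w1, w2, k = 3, iters = 10) {
  p1 <- full_kernel(w1); p2 <- full_kernel(w2)
  s1 <- local_kernel(w1, k); s2 <- local_kernel(w2, k)
  for (it in seq_len(iters)) {
    p1_new <- s1 %*% p2 %*% t(s1)   # view 1 absorbs information from view 2
    p2_new <- s2 %*% p1 %*% t(s2)   # view 2 absorbs information from view 1
    p1 <- full_kernel(p1_new)
    p2 <- full_kernel(p2_new)
  }
  fused <- (p1 + p2) / 2
  (fused + t(fused)) / 2            # symmetrise the result
}

set.seed(3)
n <- 12
a <- matrix(runif(n * n), n); w1 <- (a + t(a)) / 2   # toy affinity matrix, view 1
b <- matrix(runif(n * n), n); w2 <- (b + t(b)) / 2   # toy affinity matrix, view 2
fused <- fuse_two_views(w1, w2, k = 3, iters = 5)
dim(fused)
```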

With this unified graph W of size n x n, you can do either spectral clustering or Kernel NMF. If you need help with further clustering, please let us know.

For example, spectral clustering:

C = 2 # number of clusters
group = spectralClustering(W, C); # the final subtypes information
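For reference, a generic spectral clustering procedure can be sketched in base R as follows: build the normalized graph Laplacian, take the eigenvectors belonging to its smallest eigenvalues, and run k-means on the row-normalized embedding. The function spectral_cluster is hypothetical and is not claimed to match the internals of spectralClustering.

```r
# Hypothetical helper: normalised spectral clustering on an affinity matrix w.
spectral_cluster <- function(w, n_clusters, nstart = 20) {
  n <- nrow(w)
  d_inv_sqrt <- diag(1 / sqrt(rowSums(w)))
  l_sym <- diag(n) - d_inv_sqrt %*% w %*% d_inv_sqrt    # normalised Laplacian
  eig <- eigen(l_sym, symmetric = TRUE)                 # eigenvalues in decreasing order
  u <- eig$vectors[, (n - n_clusters + 1):n, drop = FALSE]  # smallest eigenvalues come last
  u <- u / sqrt(rowSums(u^2))                           # normalise each row to unit length
  kmeans(u, centers = n_clusters, nstart = nstart)$cluster
}

set.seed(4)
x <- c(rnorm(15, mean = 0), rnorm(15, mean = 10))   # two well-separated toy clusters
w <- exp(-unname(as.matrix(dist(x)))^2 / 2)         # Gaussian affinity matrix
group <- spectral_cluster(w, n_clusters = 2)
table(group, rep(1:2, each = 15))                   # cluster assignment vs. true grouping
```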

You can evaluate the quality of the obtained clustering by calculating the normalized mutual information (NMI): an NMI close to 1 indicates that the obtained clustering is very close to the "true" cluster information; an NMI close to 0 indicates that it is not similar to the "true" cluster information.

displayClusters(W, group);
SNFNMI = calNMI(group, truelabel)
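The NMI score described above can be sketched in a few lines of base R. This version divides the mutual information by the geometric mean of the two entropies, which is one common normalization; it is an assumption that calNMI uses the same one. The function name nmi is hypothetical.

```r
# Hypothetical helper: normalised mutual information between two labelings.
# Returns NaN if either labeling has only one cluster (zero entropy).
nmi <- function(x, y) {
  joint <- unclass(table(x, y)) / length(x)   # joint distribution of the two labelings
  px <- rowSums(joint)
  py <- colSums(joint)
  nz <- joint > 0                             # skip empty cells to avoid log(0)
  mi <- sum(joint[nz] * log(joint[nz] / outer(px, py)[nz]))
  hx <- -sum(px * log(px))
  hy <- -sum(py * log(py))
  mi / sqrt(hx * hy)
}

same_partition <- nmi(c(1, 1, 2, 2), c(2, 2, 1, 1))   # identical grouping, labels swapped
unrelated      <- nmi(c(1, 1, 2, 2), c(1, 2, 1, 2))   # statistically independent groupings
c(same_partition, unrelated)
```

The score depends only on how samples are grouped, not on which number is assigned to which cluster, so swapped labels still count as a perfect match.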

You can also find the concordance between each individual network and the fused network:

ConcordanceMatrix = concordanceNetworkNMI(list(W, W1, W2), C);

################################################################################

We also provide an example using label propagation to predict the labels of new data points below.

How to use SNF with multiple views

Load views into list "dataL". Either load a local copy of the data file

load("Digits.RData")

or load the data set with data()

data(Digits)

Set the other parameters

K = 20 # number of neighbours
alpha = 0.5 # hyperparameter in affinityMatrix
T = 20 # number of iterations of SNF

Normalize the features in each of the views (optional)

dataL = lapply(dataL, standardNormalization)

Calculate the distances for each view

distL = lapply(dataL, function(x) dist2(x, x))

Construct the similarity graphs

affinityL = lapply(distL, function(x) affinityMatrix(x, K, alpha))

################################################################################

An example of how to use concordanceNetworkNMI

Concordance_matrix = concordanceNetworkNMI(affinityL, 3);

The output, Concordance_matrix, shows the concordance between the fused network and each individual network.

################################################################################

Example of how to use SNF to perform subtyping

Construct the fused network

W = SNF(affinityL, K, T)

Perform clustering on the fused network.

clustering = spectralClustering(W,3);

Use NMI to measure the quality of the obtained labels.

NMI = calNMI(clustering, label);

################################################################################

Example of predicting the labels of new data points with label propagation

Load views into list "dataL" and the cluster assignment into vector "label"

data(Digits)

Create the training and test data

n = floor(0.8*length(label)) # number of training cases
trainSample = sample.int(length(label), n) # randomly chosen training indices
train = lapply(dataL, function(x) x[trainSample, ]) # Use the sampled cases for training
test = lapply(dataL, function(x) x[-trainSample, ]) # Test on the rest of the data set
groups = label[trainSample]
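The split can be tried out on toy data without the Digits data set. In the sketch below, toy_views and toy_label are hypothetical stand-ins for dataL and label, and set.seed is added so that the random split is reproducible.

```r
set.seed(42)                                # make the random split reproducible
toy_label <- rep(1:2, each = 10)            # hypothetical labels for 20 samples
toy_views <- list(                          # two hypothetical views of the same 20 samples
  matrix(rnorm(20 * 3), nrow = 20),
  matrix(rnorm(20 * 5), nrow = 20)
)

n_train   <- floor(0.8 * length(toy_label))             # number of training cases
train_idx <- sample.int(length(toy_label), n_train)     # random training indices

# drop = FALSE keeps the result a matrix even if only one row is selected
train_views  <- lapply(toy_views, function(x) x[train_idx, , drop = FALSE])
test_views   <- lapply(toy_views, function(x) x[-train_idx, , drop = FALSE])
train_groups <- toy_label[train_idx]

sapply(train_views, nrow)   # rows per training view
sapply(test_views, nrow)    # rows per test view
```

The same index vector is applied to every view, so row i of each training view still refers to the same sample.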

Set the other parameters

K = 20
alpha = 0.5
t = 20
method = TRUE

Apply the prediction function to the data

newLabel = groupPredict(train,test,groups,K,alpha,t,method)
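The idea of label propagation can be illustrated with a generic base-R sketch: label information spreads from labeled to unlabeled samples along the edges of a similarity graph, while the known labels are reset after every step. The function propagate_labels is hypothetical, and this is a textbook-style variant that is not claimed to match what groupPredict does internally.

```r
# Hypothetical helper: iterative label propagation with clamped known labels.
# w            : affinity matrix whose first length(known_labels) rows are the labeled samples
# known_labels : labels of those first samples
propagate_labels <- function(w, known_labels, iters = 100) {
  n <- nrow(w)
  n_known <- length(known_labels)
  classes <- sort(unique(known_labels))
  p <- w / rowSums(w)                        # row-stochastic transition matrix
  y0 <- matrix(0, n, length(classes))        # one indicator column per class
  y0[cbind(seq_len(n_known), match(known_labels, classes))] <- 1
  y <- y0
  for (i in seq_len(iters)) {
    y <- p %*% y                                      # spread label mass along the graph
    y[seq_len(n_known), ] <- y0[seq_len(n_known), ]   # clamp the known labels
  }
  classes[max.col(y, ties.method = "first")]          # pick the class with the largest mass
}

# Two labeled points (0 and 10) followed by four unlabeled points.
x <- c(0, 10, 0.5, 9.5, 1, 9)
w <- exp(-unname(as.matrix(dist(x)))^2)
pred <- propagate_labels(w, known_labels = c(1, 2))
pred   # labels for all six points, labeled ones first
```

The returned vector lists the labeled samples first and the unlabeled samples after them, so the predictions for new points are the trailing entries.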

Compare the prediction accuracy

accuracy = sum(label[-trainSample] == newLabel[-c(1:n)])/(length(label) - n)
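The accuracy expression above drops the first n entries of newLabel, which suggests that the returned vector lists the training samples first and the test samples after them. The toy sketch below works through that arithmetic with hypothetical vectors.

```r
n_train   <- 3
true_test <- c(1, 2, 2, 1)               # hypothetical ground truth for 4 test samples
all_pred  <- c(1, 1, 2, 1, 2, 1, 1)      # hypothetical output: 3 training labels, then 4 test predictions

test_pred <- all_pred[-seq_len(n_train)]   # drop the training part
accuracy  <- mean(test_pred == true_test)  # fraction of test samples predicted correctly
accuracy                                   # 3 of 4 correct, i.e. 0.75
```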

################################################################################

References:

B Wang, A Mezlini, F Demir, M Fiume, T Zu, M Brudno, B Haibe-Kains, A Goldenberg (2014) Similarity Network Fusion: a fast and effective method to aggregate multiple data types on a genome wide scale. Nature Methods. Online. Jan 26, 2014

Website: http://compbio.cs.toronto.edu/SNF/SNF/Software.html
