different number of immune.ics depending on runs


UrszulaCzerwinska opened this issue 7 years ago · comments

Urszula Czerwinska commented 7 years ago

Sometimes we get a different number of immune component candidates between runs (the stroma ones don't always pass the threshold). One possibility: do not take them into account for deconvolution.

Urszula Czerwinska · Answer 1 · Wed Dec 13 2017 23:51:01 GMT+0800 (China Standard Time)

This problem may be related to the threshold or to the number of genes we include in the enrichment. I have noticed differences in "interpretation" for some datasets depending on how many "top" genes from the ICs were included in the enrichment. Some kind of sensitivity study is necessary.

Urszula Czerwinska · Answer 2 · Fri Dec 15 2017 00:45:46 GMT+0800 (China Standard Time)

So I ran the enrichment for the BRCA_TCGA data, selecting genes above a quantile threshold starting at 0.98 and increasing in steps of 0.001 (0.1%), so the number of genes varies from 420 to 21 in steps of 21.
The results didn't change at all for Myeloid cells, but they definitely changed for T cells / B cells.
What I observed is that for lower thresholds (more genes) T cells are the most enriched up to quantile 0.995, and then it switches to B cells with very high probability (8/12 genes).
So what I see is that the top driving genes are B cell genes (some of them highly B cell specific).
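
The sweep described above can be sketched as follows. This is a minimal illustration in Python, not the package's R code. The total of 21000 genes is an assumption inferred from the reported counts (2% of 21000 is 420, 0.1% is 21), the hypergeometric test is an assumed stand-in because the thread does not say which enrichment test is used, and both function names are hypothetical.

```python
from math import comb

def genes_kept_per_quantile(n_genes, start_permille=980, stop_permille=999):
    """Map each quantile threshold to the number of top genes above it."""
    counts = {}
    for permille in range(start_permille, stop_permille + 1):
        # Integer arithmetic avoids float rounding in n_genes * (1 - q).
        counts[permille / 1000] = n_genes * (1000 - permille) // 1000
    return counts

def hypergeom_enrichment_p(overlap, n_universe, n_signature, n_selected):
    """P(X >= overlap) when n_selected genes are drawn without replacement
    from n_universe genes, of which n_signature belong to the signature."""
    upper = min(n_signature, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_signature, i) * comb(n_universe - n_signature, n_selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total

# 21000 is an assumed gene count, chosen to reproduce 420 ... 21 genes.
counts = genes_kept_per_quantile(21000)
```

Running the enrichment p-value for each cell-type signature at every entry of `counts` would show at which threshold the top-ranked cell type switches.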

Urszula Czerwinska · Answer 3 · Mon Dec 18 2017 18:51:59 GMT+0800 (China Standard Time)

For 3 decompositions of CIT:

4 components interpreted: T cell and Myeloid are stable, and 2 change: a) from T cells to NK and then back to T cells; b) from B cells to stroma to T cells to NK cells/T cells, depending on how many genes are used for the enrichment test.
3 components interpreted: T cell, Myeloid and Stroma; IC4 didn't pass enrichment because of the p-value correction, for any number of genes.
4 components interpreted: T cell and Myeloid are stable, plus a) T cell to NK / B cell and b) Stroma / Myeloid; the lowest one, IC5, did not pass the threshold for enrichment.

Maybe we should add some kind of stabilization, i.e. decompose several times and keep only the stable components?
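
One simple way to read "decompose several times and keep only stable ones" is sketched below in Python. It is an assumption about what such a filter could look like, not the package's method: the function names, the use of Pearson correlation, and the 0.9 cutoff are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def stable_components(runs, min_abs_corr=0.9):
    """Return indices of components of runs[0] that have a match with
    |correlation| >= min_abs_corr in every other run.

    runs: list of decompositions, each a list of component vectors
    (one weight per gene, same gene order in every run).
    """
    reference, others = runs[0], runs[1:]
    stable = []
    for idx, comp in enumerate(reference):
        # abs() because ICA components are defined only up to sign.
        if all(
            max(abs(pearson(comp, other)) for other in run) >= min_abs_corr
            for run in others
        ):
            stable.append(idx)
    return stable
```

Only the components returned by `stable_components` would then be passed on to the enrichment test.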

Urszula Czerwinska · Answer 4 · Thu Dec 21 2017 01:16:45 GMT+0800 (China Standard Time)

I increased maxit and decreased tol, which increased the reproducibility. We can also cheat and fix the seed inside run_fastica.
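
The effect of the seed, the iteration cap and the tolerance can be illustrated with a toy iterative algorithm. The Python sketch below uses power iteration as a stand-in; it is not FastICA and not run_fastica, and the function names and the matrix are made up for the illustration. The stopping rule (stop when the new and old vectors are almost collinear, or after `maxit` iterations) is only similar in spirit to the one a fixed-point ICA update uses.

```python
import random
from math import sqrt

def normalise(v):
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def iterate_from_random_start(matrix, seed, maxit, tol):
    """Power iteration from a seeded random start vector.

    Stops when 1 - |<w_new, w_old>| < tol or after maxit iterations.
    """
    rng = random.Random(seed)  # fixed seed -> identical start on every call
    w = normalise([rng.gauss(0, 1) for _ in matrix])
    for _ in range(maxit):
        w_new = normalise([sum(a * x for a, x in zip(row, w)) for row in matrix])
        change = 1 - abs(sum(a * b for a, b in zip(w_new, w)))
        w = w_new
        if change < tol:
            break
    # Resolve the sign ambiguity so results are comparable across runs.
    return w if w[0] >= 0 else [-x for x in w]

A = [[2.0, 1.0], [1.0, 3.0]]  # toy symmetric matrix, purely illustrative
# Few iterations: the result still depends on the random start.
loose = [iterate_from_random_start(A, seed, maxit=1, tol=1e-2) for seed in (1, 2)]
# Many iterations and a strict tolerance: different starts agree closely.
tight = [iterate_from_random_start(A, seed, maxit=1000, tol=1e-14) for seed in (1, 2)]
```

Fixing the seed makes a single run repeatable, while a stricter tolerance with more iterations makes runs from different starts converge to nearly the same answer.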

Urszula Czerwinska · Answer 5 · Thu Dec 21 2017 01:51:14 GMT+0800 (China Standard Time)

I now always have 4 components that pass the correlation threshold. However, the interpretation still differs slightly (less than before). I will try to decrease tol and increase maxit even more.

We can see that the lowest correlation between runs concerns the 4th component; however, even a small change has consequences in the enrichment test.

Urszula Czerwinska · Answer 6 · Fri Jan 05 2018 21:40:33 GMT+0800 (China Standard Time)

So this can mainly be settled by ICASSO stabilization.
It works efficiently in MATLAB. This is why the unofficial version of the package will offer the possibility to call the MATLAB ICA with ICASSO.
I also used Biton's MineICA::clusterFastICARuns() function; however, I had problems with the MineICA installation as it depends on too many packages, so I copied and adapted the function.
Testing now how slow it is ...
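
The general idea behind this kind of stabilization can be sketched in Python as below: pool the components of all runs, cluster them on absolute correlation with average linkage, and score each cluster. This is neither the ICASSO nor the MineICA code; all function names are hypothetical, and the stability score (mean similarity inside a cluster minus mean similarity to everything else) is an assumption about the kind of index such tools report.

```python
from math import sqrt

def abs_corr(x, y):
    """Absolute Pearson correlation (components are sign-ambiguous)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy / sqrt(sxx * syy))

def average_linkage(sim, n_clusters):
    """Merge the two clusters with the highest mean pairwise similarity
    until n_clusters remain (average linkage on 1 - similarity)."""
    clusters = [[i] for i in range(len(sim))]

    def mean_sim(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = ((p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters)))
        p, q = max(pairs, key=lambda pq: mean_sim(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters

def stability_index(sim, clusters, k):
    """Mean similarity inside cluster k minus mean similarity to the rest."""
    inside = clusters[k]
    outside = [i for idx, c in enumerate(clusters) if idx != k for i in c]
    intra = sum(sim[i][j] for i in inside for j in inside) / len(inside) ** 2
    if not outside:
        return intra
    inter = sum(sim[i][j] for i in inside for j in outside)
    return intra - inter / (len(inside) * len(outside))

def cluster_runs(runs, n_clusters):
    """Pool components from all runs, cluster them, score each cluster."""
    pooled = [comp for run in runs for comp in run]
    sim = [[abs_corr(a, b) for b in pooled] for a in pooled]
    clusters = average_linkage(sim, n_clusters)
    scores = [stability_index(sim, clusters, k) for k in range(len(clusters))]
    return clusters, scores
```

The naive merge loop is far slower than a library implementation, and the pooled similarity matrix alone grows with the square of runs times components, which is one reason many repetitions of a 100-component decomposition can get expensive.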

Urszula Czerwinska · Answer 7 · Fri Jan 05 2018 21:41:05 GMT+0800 (China Standard Time)

Opening another issue for the enrichment test

Urszula Czerwinska · Answer 8 · Sat Jan 06 2018 06:10:32 GMT+0800 (China Standard Time)

res.test.2 <- run_fastica(METABRIC.cen, optimal = TRUE, row.center = TRUE, with.names = FALSE, alg.typ = "parallel", gene.names = row.names(METABRIC.cen), method = "C", n.comp = 100, isLog = TRUE, R = TRUE, stabilize = TRUE, funClus = "hclust", methodClust = "average", nbIt = 100)

Time difference of 5.040406 hours

Urszula Czerwinska · Answer 9 · Sun Jan 07 2018 20:10:16 GMT+0800 (China Standard Time)

These stabilisation results are really different from the MATLAB ones and from what we would expect.

Urszula Czerwinska · Answer 10 · Thu Jan 11 2018 23:47:57 GMT+0800 (China Standard Time)

We can see that the results of the MATLAB and R ICASSO are not the same as far as partitions are concerned. The weird fact is that R seems to overcluster the stable components, which distorts the results...
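
To put a number on how much two partitions disagree, one option is the adjusted Rand index. The Python sketch below is an assumed way of making that comparison, not something used in the thread, and the label vectors in the usage note are invented.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items.

    1.0 means identical partitions (up to relabelling); values near 0
    mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0
    # Pairs of items placed together in both partitions, in A, and in B.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (maximum - expected)
```

For example, `adjusted_rand_index([0, 0, 0, 0, 1, 1], [0, 0, 2, 2, 1, 1])` compares a partition with one that splits its first cluster in two, which is the overclustering pattern described above.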

Urszula Czerwinska · Answer 11 · Mon Jan 22 2018 20:24:19 GMT+0800 (China Standard Time)

I tried to figure it out. It looks like when I give the distance matrix to the R code it works fine, but when I pass the distance matrix from R to MATLAB it works fine too. I also tried a different R implementation, but it didn't work well in practice either.
I call it a day; we will recommend using MATLAB or Docker.