UrszulaCzerwinska / DeconICA

Deconvolution of transcriptome through Immune Component Analysis

Home Page: https://urszulaczerwinska.github.io/DeconICA/

different number of immune.ics depending on runs

UrszulaCzerwinska opened this issue

Sometimes we get a different number of immune component candidates between runs (the stroma ones don't always pass the threshold). One possibility: do not take them into account for deconvolution.

This problem may be related to the threshold or to the number of genes we include in the enrichment. For some datasets I have noticed a difference in "interpretation" depending on how many "top" genes from the ICs were included in the enrichment. Some kind of sensitivity study is necessary.

So I ran the enrichment on the BRCA_TCGA data, selecting genes above a quantile threshold that I raised from 0.98 in steps of 0.001 (0.1%), so the number of genes went from 420 down to 21 in steps of 21.
The results didn't change at all for Myeloid cells, but they definitely changed for T cells / B cells.
For lower thresholds (more genes), T cells are the most enriched up to quantile 0.995, and then it switches to B cells with very high probability (8/12 genes).
So what I see is that the top driving genes are B cell genes (some of them highly B cell specific).
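The sweep described above can be sketched in base R. This is only an illustration under assumptions: the weights are random stand-ins for one IC's gene weights, the 21,000-gene count and the 0.999 upper threshold are inferred from the 420-to-21 gene counts, the marker set is hypothetical, and the hypergeometric test is a generic over-representation test, not necessarily the enrichment test used in DeconICA.

```r
# Stand-in for the gene weights of one independent component (assumed data).
set.seed(1)
n_genes <- 21000
weights <- setNames(rnorm(n_genes), paste0("gene", seq_len(n_genes)))

# Quantile thresholds from 0.98 upward in steps of 0.001 (0.1%).
thresholds <- seq(0.98, 0.999, by = 0.001)

# For each threshold, keep the genes whose weight exceeds that quantile.
top_genes <- lapply(thresholds, function(q) {
  names(weights)[weights > quantile(weights, q)]
})
n_top <- lengths(top_genes)  # number of top genes kept at each threshold

# Hypothetical marker signature (e.g. a cell-type gene list).
signature <- paste0("gene", sample(n_genes, 50))

# Generic hypergeometric over-representation p-value per threshold:
# P(overlap >= observed) when drawing length(g) genes out of n_genes.
enrich_p <- sapply(top_genes, function(g) {
  overlap <- sum(g %in% signature)
  phyper(overlap - 1, length(signature), n_genes - length(signature),
         length(g), lower.tail = FALSE)
})

sensitivity <- data.frame(quantile = thresholds, n_genes = n_top, p = enrich_p)
```

Looking at how `p` (or the best-scoring signature) moves along the `quantile` column is one way to see at which threshold an interpretation flips.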

For 3 decompositions of CIT:

  1. Interpretation of 4 components: T cell and Myeloid are stable, and 2 change depending on how many genes are used for the enrichment test: a) from T cells to NK and then back to T cells; b) from B cells to stroma to T cells to NK cells/T cells.
  2. Interpretation of 3 components: T cell, Myeloid and Stroma. IC4 didn't pass the enrichment for any number of genes because of the p-value correction.
  3. Interpretation of 4 components: T cell and Myeloid are stable, plus a) T cell to NK / B cell and b) Stroma / Myeloid. The lowest one, IC5, did not pass the threshold for enrichment.

Maybe we should add some kind of stabilization, i.e. decompose several times and keep only the stable components?

I increased maxit and decreased tol, which increased the reproducibility. We can also cheat and fix the seed inside run_fastica.
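A minimal sketch of both ideas, calling the fastICA package directly rather than run_fastica (whose internals are not shown here). The toy data, the helper name run_ica and the particular maxit/tol values are assumptions for illustration only.

```r
library(fastICA)

# Toy data (assumed): 3 non-Gaussian sources mixed into 6 observed variables.
set.seed(1)
n <- 500
sources <- cbind(sin(seq_len(n) / 7),
                 (seq_len(n) %% 23) / 23 - 0.5,
                 runif(n) - 0.5)
mixing <- matrix(rnorm(3 * 6), nrow = 3)
X <- sources %*% mixing

# Hypothetical helper: stricter convergence (more iterations, smaller
# tolerance) plus a fixed seed, because fastICA draws a random initial
# unmixing matrix when w.init is not supplied.
run_ica <- function(X, n.comp, seed, maxit = 1000, tol = 1e-8) {
  set.seed(seed)
  fastICA(X, n.comp = n.comp, alg.typ = "parallel", method = "C",
          maxit = maxit, tol = tol)
}

run_a <- run_ica(X, n.comp = 3, seed = 42)
run_b <- run_ica(X, n.comp = 3, seed = 42)

# With the same seed the two runs start from the same initial matrix,
# so the estimated sources should coincide.
same_result <- isTRUE(all.equal(run_a$S, run_b$S))
```

Fixing the seed only hides the run-to-run variability; it does not tell us whether a component is actually stable, which is why tightening maxit and tol is the more honest of the two options.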

I then always get 4 components that pass the correlation threshold. However, the interpretation still differs slightly (less than before). I will try to decrease tol and increase maxit even more.
[Figure: corr_immune_runs]

We can see that the lowest correlation between runs concerns the 4th component; however, even a small change has consequences for the enrichment test.
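For reference, a between-run comparison of this kind can be sketched in base R as follows. The two "runs" are simulated (the second is a permuted, sign-flipped, noisy copy of the first, with the 4th component deliberately made noisier), so all names and numbers here are assumptions, not the actual DeconICA output.

```r
set.seed(1)
n_genes <- 2000

# Simulated run 1: 4 heavy-tailed components (genes x components).
run1 <- matrix(rt(n_genes * 4, df = 3), ncol = 4,
               dimnames = list(NULL, paste0("IC", 1:4)))

# Simulated run 2: same components with added noise, arbitrary signs
# and arbitrary order, as ICA does not fix either of them.
noise_sd <- c(0.05, 0.05, 0.05, 1.5)  # 4th component is the unstable one
noisy <- run1 + matrix(rnorm(n_genes * 4), ncol = 4) %*% diag(noise_sd)
run2 <- (noisy %*% diag(c(1, -1, 1, -1)))[, c(3, 1, 4, 2)]
colnames(run2) <- paste0("run2_", 1:4)

# Absolute correlation, because the sign of an IC is arbitrary.
cross <- abs(cor(run1, run2))

# For each component of run 1: best-matching component of run 2
# and the correlation with it.
best_match <- apply(cross, 1, which.max)
best_cor <- apply(cross, 1, max)

least_stable <- names(which.min(best_cor))
```

Here `best_cor` plays the role of the per-component correlation between runs, and `least_stable` names the component whose interpretation is most at risk of changing.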

So this can mostly be settled by ICASSO stabilization.
It works efficiently in MATLAB. This is why the unofficial version of the package will offer the possibility to call MATLAB ICA with ICASSO.
I also used Biton's MineICA::clusterFastICARuns() function; however, I had problems installing MineICA because it depends on too many packages, so I copied and adapted the function.
Testing now how slow it is ...
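The general idea behind this kind of stabilization (pool the components of many runs, cluster them, score each cluster) can be sketched in base R. This is a simplified illustration in the spirit of ICASSO, not the code of MineICA::clusterFastICARuns() or of the MATLAB icasso package: the repeated runs are simulated, and the stability score below (mean similarity inside a cluster minus mean similarity to everything else) is only an approximation of the published index.

```r
set.seed(2)
n_genes <- 1000
n_comp <- 3
n_runs <- 10

# Assumed "true" components; each simulated run returns a noisy copy
# with random signs and random order, standing in for one ICA run.
truth <- matrix(rt(n_genes * n_comp, df = 3), ncol = n_comp)
runs <- lapply(seq_len(n_runs), function(i) {
  est <- truth + matrix(rnorm(n_genes * n_comp, sd = 0.1), ncol = n_comp)
  est <- est %*% diag(sample(c(-1, 1), n_comp, replace = TRUE))
  est[, sample(n_comp)]
})

# Pool all estimated components: n_genes x (n_runs * n_comp).
pooled <- do.call(cbind, runs)

# Similarity = absolute correlation; dissimilarity = 1 - similarity.
sim <- abs(cor(pooled))
tree <- hclust(as.dist(1 - sim), method = "average")
cluster <- cutree(tree, k = n_comp)

# Simplified stability score per cluster.
stability <- sapply(seq_len(n_comp), function(k) {
  idx <- which(cluster == k)
  mean(sim[idx, idx]) - mean(sim[idx, -idx])
})
```

Clusters with a score close to 1 contain near-identical estimates from every run; a cluster with a clearly lower score would be a candidate to drop before the enrichment step.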

Opening another issue for the enrichment test

res.test.2 <- run_fastica(METABRIC.cen, optimal = TRUE, row.center = TRUE, with.names = FALSE, alg.typ = "parallel", gene.names = row.names(METABRIC.cen), method = "C", n.comp = 100, isLog = TRUE, R = TRUE, stabilize = TRUE, funClus = "hclust", methodClust = "average", nbIt = 100)

Time difference of 5.040406 hours

These stabilisation results are really different from the MATLAB ones and from what we would expect.

We can see that the results of the MATLAB and R icasso are not the same as far as partitions are concerned. The weird fact is that R seems to overcluster the stable components, which distorts the results...
[Figure: parititonr]
[Figure: metabric cen_numerical txt_100_stability]

I tried to figure it out. It looks like once I give the distance matrix to the R code it works fine, but when I pass the distance matrix from R to MATLAB it works fine too. I also tried a different R implementation, but it didn't work well in practice either.
I'll call it a day; we will recommend using MATLAB or Docker.