Danko-Lab / TED

A fully Bayesian approach to deconvolving the tumor microenvironment


Seed argument for publication reproducibility

t-carroll opened this issue

Hi Tinyi,

I've had some trouble generating reproducible results while preparing code for publication: I can't seem to reproduce the same result, even when setting the seed explicitly. For instance, running the same call with the same seed on previously published data from Maag et al. gives two different results:

set.seed(42)
ted.maag = run.Ted(ref.dat = group, X = maag2, cell.type.labels = labs, cell.subtype.labels = subs, n.cores = 24, tum.key = "EAC", input.type = "scRNA")
set.seed(42)
ted.maag2 = run.Ted(ref.dat = group, X = maag2, cell.type.labels = labs, cell.subtype.labels = subs, n.cores = 24, tum.key = "EAC", input.type = "scRNA")

all.equal(ted.maag$res$final.gibbs.theta, ted.maag2$res$final.gibbs.theta)

[1] "Mean relative difference: 0.001463005"

all.equal(ted.maag$res$final.gibbs.theta[, "EAC"], ted.maag2$res$final.gibbs.theta[, "EAC"])

[1] "Mean relative difference: 0.0006727291"

So it looks like setting the seed in the global R environment is insufficient, perhaps because of some quirk of the multicore parallelism (I'm running this in RStudio on a CentOS Linux HPC, if relevant). Is there a way to set the seed inside whichever internal functions do the sampling, so that results are reproducible? If so, perhaps an optional seed argument could be passed to run.Ted(), which would in turn pass it to those internal sampling functions. Do you think that would be feasible? I'm happy to help out if so, though I'm not very familiar with the workings of these internal functions.
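For anyone hitting the same problem in their own code, here is a minimal sketch of the general pattern in base R's parallel package. It is not TED's actual fix, and noisy_task and run_parallel are hypothetical names used only for illustration. The key point is that set.seed() on the master only seeds the master's RNG; each worker process keeps its own RNG state, so the seed has to be handed to the workers explicitly, here via clusterSetRNGStream(), which gives every worker an independent L'Ecuyer-CMRG stream derived from one seed.

```r
library(parallel)

# Hypothetical stand-in for a sampling step: the result depends on
# the random numbers drawn inside the worker process.
noisy_task <- function(i) mean(rnorm(1000, mean = i))

# Hypothetical wrapper showing how a `seed` argument can be passed
# through to the workers instead of relying on set.seed() in the master.
run_parallel <- function(seed, n_workers = 2) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl))
  # Switch the workers to L'Ecuyer-CMRG and give each one an
  # independent stream derived from `seed`.
  clusterSetRNGStream(cl, iseed = seed)
  unlist(parLapply(cl, 1:8, noisy_task))
}

res1 <- run_parallel(seed = 42)
res2 <- run_parallel(seed = 42)
identical(res1, res2)  # TRUE
```

Results should only be expected to match for the same number of workers, since changing n_workers changes how tasks are split across the random-number streams.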

Hi Tinyi,

Great, thanks for the quick update! I tried it out, and all.equal(ted.maag$res$final.gibbs.theta, ted.maag2$res$final.gibbs.theta) now returns TRUE when the same seed is set in both calls. Closing now.

-Tom