Seed argument for publication reproducibility

t-carroll opened this issue 2 years ago

Hi Tinyi,

I've had some issues generating reproducible results in preparation for publishing code: I can't seem to reproduce the same result, even when setting the seed explicitly. For instance, running the same call with the same seed on some previously published data from Maag et al. gives two different results:

```r
set.seed(42)
ted.maag = run.Ted(ref.dat = group, X = maag2, cell.type.labels = labs,
                   cell.subtype.labels = subs, n.cores = 24, tum.key = "EAC",
                   input.type = "scRNA")
set.seed(42)
ted.maag2 = run.Ted(ref.dat = group, X = maag2, cell.type.labels = labs,
                    cell.subtype.labels = subs, n.cores = 24, tum.key = "EAC",
                    input.type = "scRNA")

all.equal(ted.maag$res$final.gibbs.theta, ted.maag2$res$final.gibbs.theta)
#> [1] "Mean relative difference: 0.001463005"

all.equal(ted.maag$res$final.gibbs.theta[, "EAC"], ted.maag2$res$final.gibbs.theta[, "EAC"])
#> [1] "Mean relative difference: 0.0006727291"
```

So it looks like setting the seed for the global R environment is insufficient, perhaps due to some quirk of the multicore parallelism (I'm running this in RStudio on a CentOS Linux HPC, if relevant). Is there a way to set the seed internally for whichever functions do the sampling, so that results can be reproduced? If so, perhaps an optional seed argument could be added to run.Ted(), which would in turn be passed to these internal sampling functions. Do you think that would be feasible? I'm happy to help out if so, though I'm not as familiar with how these internal functions work.
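One plausible explanation, assuming run.Ted() parallelises with forked workers such as parallel::mclapply() (the thread does not say which backend it uses): under R's default Mersenne-Twister generator, mclapply() with its default mc.set.seed = TRUE gives each forked child a different seed, so set.seed() in the parent session does not make the workers reproducible. The minimal sketch below illustrates the kind of seed argument proposed above. The helper parallel_draws() is hypothetical, a stand-in for the package's internal samplers; it switches to the "L'Ecuyer-CMRG" generator, which lets mclapply() give each worker its own reproducible random-number stream, and restores the caller's RNG kind on exit.

```r
library(parallel)

# Hypothetical helper (not part of the TED package): stands in for an internal
# sampler that draws random numbers inside forked worker processes.
parallel_draws <- function(n.tasks, n.cores = 2, seed = NULL) {
  if (!is.null(seed)) {
    # Save the caller's RNG kinds and switch back to them on exit
    # (this restores the generator kind, not the exact RNG state).
    old.kind <- RNGkind()
    on.exit(do.call(RNGkind, as.list(old.kind)), add = TRUE)
    # With L'Ecuyer-CMRG selected, mclapply() gives each forked worker
    # its own stream derived from the seed set here in the parent.
    RNGkind("L'Ecuyer-CMRG")
    set.seed(seed)
  }
  # Each task draws three standard-normal values in a forked worker.
  mclapply(seq_len(n.tasks), function(i) rnorm(3), mc.cores = n.cores)
}

a <- parallel_draws(4, n.cores = 2, seed = 42)
b <- parallel_draws(4, n.cores = 2, seed = 42)
identical(a, b)
```

Reproducibility in this sketch also depends on keeping n.cores fixed: mclapply() splits tasks across workers according to the core count, so changing n.cores changes which stream each task draws from. Forked workers are only available on Unix-alikes such as the Linux HPC mentioned above.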

Tinyi Chu · Answer 1 · Tue Apr 12 2022 05:58:11 GMT+0800 (China Standard Time)

Hi Tom, I have updated the GitHub repository to address the reproducibility issue. Simply reinstall the package and pass seed = <your seed number> as an argument to run.Ted(). Please let me know if you have any questions. Best, Tinyi

Tom Carroll · Answer 2 · Tue Apr 12 2022 23:03:30 GMT+0800 (China Standard Time)

Hi Tinyi,

Great, thanks for the quick update! I tried it out, and all.equal(ted.maag$res$final.gibbs.theta, ted.maag2$res$final.gibbs.theta) now returns TRUE when the same seed is passed to both calls. Closing now.

-Tom