Could ggpubr not set the random seed?

biobee opened this issue 5 years ago

Hi Alboukadel,

ggboxplot sets a seed even when we do not add any points or jitter. Seeds are set in:
ggmaplot.R
ggadd.R
utilities.R

For ggboxplot, the seed is set 'globally' in ggadd. Could the jitter-related settings be executed only when jitter is requested? At present the seed is set irrespective of the setting for "add".

In detail, in the ggadd function:
lines 87-91 set up jitter irrespective of the setting for "add".
Could lines 87-91 be placed after
if ("jitter" %in% add) { on line 115?
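The proposed reordering can be illustrated with a minimal R sketch. The function add_layers_sketch() and its jitter_seed argument are hypothetical stand-ins, not ggpubr's actual ggadd() code (the contents of lines 87-91 are only known here from the description above). The point is simply that the seed-setting step sits inside the if ("jitter" %in% add) branch, so the global RNG is left alone when no jitter is requested.

```r
# Hypothetical stand-in for ggadd(); not ggpubr's real implementation.
add_layers_sketch <- function(x, add = "none", jitter_seed = 123) {
  if ("jitter" %in% add) {
    # Jitter-related setup (including the seed) runs only when jitter is requested
    set.seed(jitter_seed)
    x <- x + runif(length(x), -0.2, 0.2)
  }
  x
}

# Without jitter, the caller's random stream is untouched
set.seed(1); expected <- runif(1)
set.seed(1); invisible(add_layers_sketch(1:5, add = "none"))
identical(runif(1), expected)  # TRUE
```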

For transparency, I would propose that seeds be set by the user, through an argument in the function call, rather than inside the individual functions.

Alboukadel KASSAMBARA · Answer 1 · Tue Aug 04 2020 14:41:29 GMT+0800 (China Standard Time)

Something like this could be used to set the seed specified by the user:

  if (!is.na(seed)) {
    new_seed <- sample(.Machine$integer.max, 1L)
    set.seed(seed)
    on.exit(set.seed(new_seed))
  }
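For on.exit() to take effect, the snippet has to live inside a function. Below is a minimal runnable sketch that wraps it in a hypothetical draw_with_seed() function; the function name and the NA default for seed are assumptions. Note that on exit this approach reseeds the global generator with a value drawn from the caller's stream; it does not restore .Random.seed exactly.

```r
# Hypothetical wrapper around the snippet above; not part of ggpubr.
draw_with_seed <- function(n, seed = NA) {
  if (!is.na(seed)) {
    # Draw a replacement seed from the caller's stream before overriding it
    new_seed <- sample(.Machine$integer.max, 1L)
    set.seed(seed)
    # On exit, reseed from that draw instead of leaving the global seed fixed
    on.exit(set.seed(new_seed))
  }
  runif(n)
}

# The same user-supplied seed gives reproducible draws inside the function
identical(draw_with_seed(3, seed = 42), draw_with_seed(3, seed = 42))  # TRUE

# With seed = NA the global RNG is used as normal and is not reset
set.seed(1); a <- draw_with_seed(2)
set.seed(1); b <- runif(2)
identical(a, b)  # TRUE
```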

Terry Jones · Answer 2 · Fri Nov 19 2021 23:37:08 GMT+0800 (China Standard Time)

Hi all. This would be very useful. I just spent an entire day trying to figure out why some code was producing identical samples from rnorm. In the end I narrowed it down to a single line, a call to ggboxplot, so I came looking into the source.

I'm not at all familiar with R (this is actually the first code I have written in it, a Shiny app), so the following may be wrong. But I'd suggest:

This should at least be documented (maybe it is).
The ggpubr functions could simply not call set.seed, or only do so when run under a test suite.
The ggpubr functions could save the value of .Random.seed, set their own seed, then restore the original setting.
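The third suggestion (save .Random.seed, set a private seed, restore) can be sketched in base R as follows. The helper with_preserved_rng() is hypothetical and not part of ggpubr; it relies on the fact that R keeps the generator state in .Random.seed in the global environment, and that this object does not exist until the RNG has first been used in a session.

```r
# Hypothetical helper; ggpubr does not necessarily provide this.
with_preserved_rng <- function(expr) {
  env <- globalenv()
  # .Random.seed does not exist until the RNG has been used in the session
  had_seed <- exists(".Random.seed", envir = env, inherits = FALSE)
  if (had_seed) old_seed <- get(".Random.seed", envir = env, inherits = FALSE)
  on.exit({
    if (had_seed) {
      # Put the caller's generator state back exactly as it was
      assign(".Random.seed", old_seed, envir = env)
    } else if (exists(".Random.seed", envir = env, inherits = FALSE)) {
      # The RNG was never initialised before the call, so undo the initialisation
      rm(list = ".Random.seed", envir = env)
    }
  })
  expr  # forcing the lazy argument runs the caller's code here
}

set.seed(2021)
expected <- rnorm(3)

set.seed(2021)
jittered <- with_preserved_rng({
  set.seed(123)  # stand-in for a plotting function's own fixed seed
  runif(5)
})
identical(rnorm(3), expected)  # TRUE: the outer stream continues unaffected
```

The withr package offers with_preserve_seed() and with_seed() for the same purpose, which may be preferable to a hand-written helper in package code.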

That's all I can think of for now. Thanks very much for all the efforts and the open source code! :-) I'm of course happy to help test things. Or if you think one of these suggestions is worth implementing, I could have a go and send a PR.

Terry Jones · Answer 3 · Sat Nov 20 2021 06:50:03 GMT+0800 (China Standard Time)

I just wrote and deployed a small Shiny app to illustrate the issue:
https://terrycojones.shinyapps.io/ggboxplot-random-seed-demo/

Terry Jones · Answer 4 · Mon Nov 22 2021 16:03:56 GMT+0800 (China Standard Time)

Also note that set.seed(sample(.Machine$integer.max, 1L)) does not solve the issue, because the value returned by sample depends on the state of the system RNG, which the call to ggboxplot has just reset. So calling set.seed in that way after a call to ggboxplot also results in a duplicated stream of random numbers.
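This can be checked in a few lines of R. Here reset_global_seed() is a hypothetical stand-in for any call that sets the global seed to a constant (as ggboxplot did at the time of this report); the constant 123 is an assumption chosen for illustration.

```r
# Hypothetical stand-in for a call that resets the global seed to a constant
reset_global_seed <- function() set.seed(123)

draw_after_reset <- function() {
  reset_global_seed()
  # Attempted "re-randomisation": the new seed is itself drawn from the reset stream
  set.seed(sample(.Machine$integer.max, 1L))
  rnorm(3)
}

x1 <- draw_after_reset()
x2 <- draw_after_reset()
identical(x1, x2)  # TRUE: both runs produce the same "random" numbers
```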

Terry Jones · Answer 5 · Tue Nov 23 2021 18:19:08 GMT+0800 (China Standard Time)

Sorry to send so many messages, but this is a really serious problem. I just realized that Bayesian sampling with Stan (rstan) is also affected. Its sampling method takes a seed argument that defaults to sample.int(.Machine$integer.max, 1), so any code that calls ggboxplot and then runs Bayesian sampling will simply repeat the exact same analysis. See https://mc-stan.org/rstan/reference/stanmodel-method-sampling.html

This issue affects any code that uses R's standard RNG functions. In general it is not easy to know whether you are calling such a function (or one that calls such a function). You only find out if you are lucky enough to notice identical behavior on runs that should have produced different results.

Terry Jones · Answer 6 · Sun Feb 06 2022 02:15:01 GMT+0800 (China Standard Time)

I don't understand why this hasn't received any attention. Setting the global R random number generator seed to a constant value has enormous implications for anyone doing any kind of stochastic processing. Anyone doing that who happens to be using this code will have silently invalidated results. How can this just be ignored?

Alboukadel KASSAMBARA · Answer 7 · Thu Nov 24 2022 07:50:12 GMT+0800 (China Standard Time)

TODO:

Make a seed option available
Record and restore the random state (example: https://rdrr.io/cran/DHARMa/man/getRandomState.html)
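Both TODO items can be combined in one sketch: a seed argument that defaults to NULL (meaning "do not touch the global RNG"), and, when a seed is supplied, recording the caller's state and restoring it on exit. The function jitter_positions() and its width argument are hypothetical illustrations, not ggpubr's code, and this base-R approach is not necessarily how the DHARMa example or the eventual fix works.

```r
# Hypothetical function illustrating the TODO above; not ggpubr's code.
# seed = NULL means "leave the global RNG alone and draw from the caller's stream".
jitter_positions <- function(x, width = 0.2, seed = NULL) {
  if (!is.null(seed)) {
    env <- globalenv()
    # Record the caller's RNG state (it may not exist yet in a fresh session)
    had_seed <- exists(".Random.seed", envir = env, inherits = FALSE)
    old_seed <- if (had_seed) get(".Random.seed", envir = env, inherits = FALSE)
    # Restore that state when the function exits, even on error
    on.exit({
      if (had_seed) {
        assign(".Random.seed", old_seed, envir = env)
      } else {
        rm(list = ".Random.seed", envir = env)
      }
    })
    set.seed(seed)
  }
  x + runif(length(x), -width, width)
}

# A user-supplied seed gives reproducible jitter...
identical(jitter_positions(1:3, seed = 99), jitter_positions(1:3, seed = 99))  # TRUE

# ...without disturbing the caller's random stream
set.seed(7); expected <- runif(1)
set.seed(7); invisible(jitter_positions(1:3, seed = 99))
identical(runif(1), expected)  # TRUE
```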

Alboukadel KASSAMBARA · Answer 8 · Sun Nov 27 2022 01:17:48 GMT+0800 (China Standard Time)

Fixed now, thanks.