parklab / MuSiCal

A comprehensive toolkit for mutational signature analysis

Improve parallelization of DenovoSig

Hu-JIN opened this issue

The current parallelization of DenovoSig using multiprocessing works fine. Note that it is important to set sbatch -n, not sbatch -c, to the ncpu value used in DenovoSig.

When sbatch -n is set to ncpu in DenovoSig (with -c set to 1), I checked that the running time of each job performed by a worker is almost the same as that of a serial job, so we get close to an ncpu-fold speedup.

When sbatch -c is set to ncpu in DenovoSig (with -n set to 1), each job performed by a worker runs about ncpu times slower than a serial job. As a result, we don't get any speedup.

So it is important to use sbatch -n instead of sbatch -c. This contradicts what I previously understood about sbatch, though, so we need to understand this behavior better.
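
One way to start investigating is to print what the Python process actually sees of its allocation under each sbatch setting. The snippet below is only a diagnostic sketch, not part of MuSiCal: the helper name describe_cpu_allocation is made up for illustration, and it does not assume any particular cause for the slowdown. SLURM_NTASKS and SLURM_CPUS_PER_TASK are environment variables that Slurm sets inside a job (the latter only when -c is given), and OMP_NUM_THREADS is the usual thread cap read by OpenMP-based numerical backends.

```python
import os


def describe_cpu_allocation():
    """Report what the current process can see of its CPU allocation."""
    # CPUs this process is actually allowed to run on (available on Linux);
    # fall back to the machine-wide count on other platforms.
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        usable = os.cpu_count() or 1
    return {
        "usable_cpus": usable,
        "machine_cpus": os.cpu_count(),
        # None when run outside Slurm or when the option was not given.
        "SLURM_NTASKS": os.environ.get("SLURM_NTASKS"),
        "SLURM_CPUS_PER_TASK": os.environ.get("SLURM_CPUS_PER_TASK"),
        # Thread cap for OpenMP-based libraries, if one is set.
        "OMP_NUM_THREADS": os.environ.get("OMP_NUM_THREADS"),
    }


if __name__ == "__main__":
    for key, value in describe_cpu_allocation().items():
        print(f"{key}: {value}")
```

Running this once with -n ncpu -c 1 and once with -n 1 -c ncpu, both in the parent process and inside a worker, would show whether the workers are given the same set of CPUs in the two cases.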

Note that the results stated above are the same for NMF and mvNMF.

Our parallelization scheme could potentially be improved. It is possible that some time is spent on pickling objects; see https://thelaziestprogrammer.com/python/a-multiprocessing-pool-pickle. From the results above, however, there does not seem to be much overhead in our current code.
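
To put a rough number on the pickling cost, one could time a pickle round trip on an object similar to what is sent to each worker. This is a minimal sketch under stated assumptions: measure_pickle_cost is a hypothetical helper, and the nested list is a stand-in for the real job arguments, which would need to be substituted. It only approximates what multiprocessing does when it serializes arguments and results.

```python
import pickle
import time


def measure_pickle_cost(obj, repeats=5):
    """Return (payload size in bytes, best round-trip time in seconds)."""
    payload = pickle.dumps(obj)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Serialize and deserialize once, as happens when an object
        # crosses a process boundary.
        pickle.loads(pickle.dumps(obj))
        best = min(best, time.perf_counter() - start)
    return len(payload), best


if __name__ == "__main__":
    # Hypothetical stand-in for the arguments of one job:
    # a 96 x 200 matrix of floats stored as nested lists.
    fake_job_args = [[float(i * j) for j in range(200)] for i in range(96)]
    size, seconds = measure_pickle_cost(fake_job_args)
    print(f"{size} bytes, {seconds:.6f} s per round trip")
```

If the round-trip time is negligible compared with the running time of a single job, pickling is unlikely to be the bottleneck, which would be consistent with the observation above.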