hvasbath / beat

Bayesian Earthquake Analysis Tool

some thoughts about "chunksize" in iter_parallel_chains function of beat/sampler/base.py

ranneylxr opened this issue

Hi again,
In the iter_parallel_chains function of beat/sampler/base.py:476-482:

        if chunksize is None:
            if draws < 10:
                chunksize = int(np.ceil(float(n_chains) / n_jobs))
            elif draws > 10 and tps < 0.5:
                chunksize = int(np.ceil(float(n_chains) / n_jobs))
            else:
                chunksize = n_jobs

The tps seems to depend on the hardware (I have installed libamdm), and if we set a larger n_jobs, the chunksize also becomes larger in the case where tps > 0.5, draws > 10 and stage > 0.

According to https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.map, a larger chunksize leads to a smaller number of chunks. Once n_jobs exceeds the number of chunks, increasing n_jobs further decreases the number of workers that actually run in parallel, which means the calculation takes longer.
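To make this concrete, here is a minimal sketch, not BEAT code. `default_chunksize` only mirrors the branch quoted above, and `busy_workers` is a hypothetical helper that assumes the pool cuts the chain list into chunks of exactly `chunksize` items and hands each chunk to one worker. The values for `n_chains`, `draws` and `tps` are made up for illustration.

```python
import math


def default_chunksize(draws, tps, n_chains, n_jobs):
    """Mirror of the branch quoted above (a sketch, not the BEAT function itself)."""
    if draws < 10:
        return int(math.ceil(float(n_chains) / n_jobs))
    elif draws > 10 and tps < 0.5:
        return int(math.ceil(float(n_chains) / n_jobs))
    else:
        return n_jobs


def busy_workers(n_chains, n_jobs, chunksize):
    """Number of workers that can receive at least one chunk.

    Assumes chunks of exactly `chunksize` items, one chunk per worker at a time.
    """
    n_chunks = math.ceil(n_chains / chunksize)
    return min(n_jobs, n_chunks)


n_chains = 100  # hypothetical number of chains
for n_jobs in (5, 10, 20, 50):
    # draws=100 and tps=1.0 fall into the final "else" branch: chunksize = n_jobs
    chunksize = default_chunksize(draws=100, tps=1.0, n_chains=n_chains, n_jobs=n_jobs)
    print(n_jobs, chunksize, busy_workers(n_chains, n_jobs, chunksize))
```

With 100 chains, going from 10 to 20 jobs drops the number of occupied workers from 10 to 5 under these assumptions, because the 100 chains are then packed into only 5 chunks of 20.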

Is that correct? And can I set an arbitrary chunksize manually in a script?
Thank you!

Hi again,

cool that you are still around ;) .
You are right. The intention behind that is: if your forward model takes a long time, you would rather use a small chunksize, i.e. have the work distributed in smaller chunks to more workers. Otherwise it often happens that a single worker is left with a big chunk of work, and all the other workers have to wait for it to finish before entering the next stage.
Vice versa, if you have a fast forward model you want a big chunksize, because initialising the worker then takes longer than the sampling itself.
Is that understandable? I could not completely understand what your problem with that setup is, though. For now you cannot define chunksize in the config file, but if it would help you, we can surely add that; it is not a big deal.
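The trade-off described above can be illustrated with a toy scheduling simulation. This is only a sketch under stated assumptions, not how BEAT schedules chains: `makespan` is a hypothetical helper, each chunk goes to whichever worker becomes free first, and `overhead` is an assumed fixed cost per chunk standing in for the worker start-up cost. All timings are invented.

```python
import heapq


def makespan(task_times, n_jobs, chunksize, overhead=0.0):
    """Simulated wall time for processing all tasks.

    Tasks are cut into consecutive chunks of `chunksize`; each chunk is given
    to the worker that becomes free first and costs `overhead` plus the sum
    of its task times.
    """
    chunks = [
        task_times[i:i + chunksize]
        for i in range(0, len(task_times), chunksize)
    ]
    free_at = [0.0] * n_jobs  # time at which each worker becomes idle
    heapq.heapify(free_at)
    for chunk in chunks:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + overhead + sum(chunk))
    return max(free_at)


# Slow forward model: 10 chains of 1 s each, 4 workers, no per-chunk cost.
slow = [1.0] * 10
slow_big = makespan(slow, n_jobs=4, chunksize=5)    # two workers carry 5 s each
slow_small = makespan(slow, n_jobs=4, chunksize=1)  # work spreads over all 4

# Fast forward model: 1000 chains of 1 ms each, 10 ms assumed cost per chunk.
fast = [0.001] * 1000
fast_big = makespan(fast, n_jobs=4, chunksize=250, overhead=0.01)
fast_small = makespan(fast, n_jobs=4, chunksize=1, overhead=0.01)

print(slow_big, slow_small)
print(fast_big, fast_small)
```

In this toy setup the slow model finishes sooner with small chunks, because no worker is stuck with a large remainder, while the fast model finishes sooner with large chunks, because the per-chunk cost is paid only a few times.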

Cheers!

I understand it!
Thank you for explaining.

Best regards.

Sorry for the late fix, but I apparently did not get the point correctly until I tried it myself with a larger number of chains.
It is fixed in the current dev branch here: #121 and should be released to master soon.