markovmodel / adaptivemd

A Python framework to run adaptive Markov state model (MSM) simulations on HPC resources

worker job queuing might be inefficient

thempel opened this issue · comments

It might be inefficient to allow a worker to take more than one job at once, at least if a single job takes longer than a certain time. For example, I just submitted four trajectories to four workers. One worker has grabbed two jobs, basically making another worker unemployed. Would it be more efficient to restrict each worker to one single job? @nsplattner

Correct, that was just a proof of concept and a legacy from using RP; it is common in worker schemes where connections might not be reliable. You want to make sure that you do not have dead times, so you prefetch tasks to keep running.

But I agree that the default should be 1.

I think pre-fetching of jobs is only meaningful if we assume that the workers don't have continuous access to the database, so this is not the normal case.

In the case of database downtimes pre-fetching may help to keep the workers busy, but only if they don't crash at the end of the first job in case they are unable to reach the database (in order to communicate the results of the first job). Is this currently the case?

Right. In the case of our worker it practically does not make sense. To stay compatible with RP we might need this anyway, but if you agree I will remove the option from the adaptivemdworker.

I think this is a good solution for now.

Done. After an update it should use prefetch=1.
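
For reference, here is a minimal, self-contained Python sketch of the behaviour discussed above. It is illustrative only and not the adaptivemd API: the shared queue, the `grab_tasks` helper, and the `prefetch` parameter are assumptions standing in for the worker's task-fetching step. It shows why a prefetch count above 1 can let one worker claim several long trajectories while another sits idle, and why prefetch=1 spreads the four tasks over the four workers.

```python
# Conceptual sketch only -- not the adaptivemd API. It illustrates why letting
# a worker grab several tasks at once can starve other workers when a single
# task (e.g. a trajectory) runs for a long time.
from queue import Queue, Empty


def grab_tasks(task_queue, prefetch):
    """Take up to `prefetch` tasks from the shared queue, as a worker would."""
    grabbed = []
    try:
        for _ in range(prefetch):
            grabbed.append(task_queue.get_nowait())
    except Empty:
        pass
    return grabbed


tasks = Queue()
for name in ("traj-1", "traj-2", "traj-3", "traj-4"):
    tasks.put(name)

# Four workers check the queue one after another, each prefetching two tasks:
# the first two workers take everything and the last two stay unemployed.
for worker_id in range(4):
    print(f"worker {worker_id} got {grab_tasks(tasks, prefetch=2)}")

# With prefetch=1 each of the four workers would receive exactly one trajectory.
```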