mesos / mesos-distcc

Distcc framework for Mesos.

Framework holding onto offers and blocking cluster

ianburrell opened this issue

The Mesos console shows the following resources when running mesos-distcc with "-j8" on a 16-CPU cluster (with one other task running).

         CPUs  Memory
Total    16    121.4 GB
Used     9     6.0 GB
Offered  7     115.4 GB
Idle     0     0 B

mesos-distcc is using 8 CPUs as expected, but it is holding onto the 7 offered CPUs and blocking use of the cluster by other users (including other mesos-distcc runs).

One problem is that the code path that calls declineOffer when tasks have already been started then does a "return", so any remaining offers in the list are never declined.
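To illustrate the early-return problem, here is a minimal sketch in Python. It is not the actual mesos-distcc code: `FakeDriver`, `decline_with_early_return`, and `decline_all` are hypothetical names, and offers are plain strings rather than Mesos offer objects. It only shows why a "return" inside the loop leaves the rest of the offers outstanding, and how "continue" avoids that.

```python
class FakeDriver:
    """Hypothetical stand-in for the Mesos scheduler driver.

    It only records which offers were declined.
    """

    def __init__(self):
        self.declined = []

    def declineOffer(self, offer_id):
        self.declined.append(offer_id)


def decline_with_early_return(driver, offers, tasks_started):
    # Buggy shape: the function exits after declining the first offer,
    # so the remaining offers stay held by the framework.
    for offer in offers:
        if tasks_started:
            driver.declineOffer(offer)
            return
        # ... otherwise launch tasks on this offer (omitted) ...


def decline_all(driver, offers, tasks_started):
    # Fixed shape: 'continue' moves on to the next offer,
    # so every offer in the list is declined.
    for offer in offers:
        if tasks_started:
            driver.declineOffer(offer)
            continue
        # ... otherwise launch tasks on this offer (omitted) ...
```

With three offers and tasks already started, the first function declines only one offer, while the second declines all three.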

Even with that bug fixed, the framework doesn't seem to be declining the offers. My suspicion is that starting the sub-process in statusUpdate blocks all communication with Mesos, so the declineOffer may never be sent. The framework docs mention that Scheduler callbacks should not block.

My guess is that mesos-distcc needs to either run the command in the background and catch the signal when it exits, or run the scheduler and runner in parallel and use multiprocessing.Condition to signal that the command is ready to run.
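A rough sketch of the first option, written for Python 3 and using a background thread with threading.Event rather than multiprocessing.Condition. `CommandRunner` and its attributes are hypothetical names for illustration and are not part of mesos-distcc or the Mesos API. The idea is that the callback only starts the thread and returns right away, which should leave the driver free to send declineOffer.

```python
import subprocess
import sys
import threading


class CommandRunner:
    """Hypothetical helper: runs a command off the scheduler thread."""

    def __init__(self, argv):
        self.argv = argv
        self.done = threading.Event()  # set once the command has exited
        self.returncode = None
        self._thread = None

    def start(self):
        # Intended to be called from a scheduler callback such as
        # statusUpdate. It returns immediately instead of waiting
        # for the command to finish.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Blocks only this background thread until the command exits.
        self.returncode = subprocess.call(self.argv)
        self.done.set()


if __name__ == "__main__":
    # Stand-in command; a real run would use the build command instead.
    runner = CommandRunner([sys.executable, "-c", "pass"])
    runner.start()
    # The main thread can wait here, or poll, then stop the driver.
    runner.done.wait(timeout=30)
    print("command exited with", runner.returncode)
```

Whether this is enough depends on how the driver is run; if the driver itself is started with a blocking call, that call would also need its own thread or a non-blocking start.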