wickman / pesos

pesos is a pure Python implementation of the Mesos framework API

PesosSchedulerDriver.join() prevents signal handling

anthonyrisinger opened this issue

With the current implementation, it's not possible to handle signals like SIGINT (KeyboardInterrupt) because join() calls self.lock.wait() with no timeout. The signal is caught and queued for handling, but the main thread never gets a chance to respond because it is blocked waiting for the compactor thread to call notify().

Adding a timeout wakes the main thread periodically, giving it a chance to respond.

Workaround:

class SchedulerDriver(scheduler.PesosSchedulerDriver):

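    # Reuse the parent's `locked` decorator (unwrapped via __func__);
    # wait() below requires self.lock to be held.
    @scheduler.PesosSchedulerDriver.locked.__func__
    def join(self):
        # Compare status codes by value, not identity.
        if self.status != mesos_pb2.DRIVER_RUNNING:
            return self.status

        # Wake up once a second so the main thread can handle signals.
        while self.status == mesos_pb2.DRIVER_RUNNING:
            self.lock.wait(1)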

        scheduler.log.info(
            "Scheduler driver finished with status %d",
            self.status,
            )
        assert self.status in (
            mesos_pb2.DRIVER_ABORTED,
            mesos_pb2.DRIVER_STOPPED,
            )
        return self.status
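
The same timed-wait pattern can be shown without pesos at all. This is only a minimal sketch using the standard library's threading.Condition; wait_for, state, cond and finish are hypothetical names made up for illustration and are not part of pesos or Mesos.

```python
import threading


def wait_for(condition, predicate, poll_interval=1.0):
    """Block until predicate() is true, waking every poll_interval seconds.

    Each timed wake-up returns control to the calling thread, which is
    what lets the main thread react to a pending signal such as SIGINT.
    """
    with condition:
        while not predicate():
            # Timed wait: returns after poll_interval even without notify().
            condition.wait(poll_interval)
    return True


# Hypothetical usage: a timer thread flips a flag and notifies the waiter.
state = {"done": False}
cond = threading.Condition()


def finish():
    with cond:
        state["done"] = True
        cond.notify_all()


timer = threading.Timer(0.2, finish)
timer.start()
result = wait_for(cond, lambda: state["done"], poll_interval=0.05)
timer.join()
```

The poll interval is a trade-off: a shorter value means quicker reaction to a signal, at the cost of more wake-ups.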

Good point, we should get that fixed. The snippet below is taken from a framework we're using pesos with; we don't call join() directly.

import threading
import time

# Kick off the pesos scheduler and watch the magic happen
thread = threading.Thread(target=driver.run)
thread.daemon = True  # replaces the deprecated setDaemon(True)
thread.start()

# Wait here until the tasks are done
while thread.is_alive():  # isAlive() was removed in Python 3.9
    time.sleep(0.5)

run() calls join() internally, but what you have also allows SIGINT because you are in the main thread and you passed a "timeout" to sleep(), so the interpreter gets a chance to raise KeyboardInterrupt.

Your pesos thread will not shut down cleanly though (it is marked daemon and stop() is never called)... not necessarily a problem, but something to keep in mind if you perform any state/syncing activities.
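
One way to get a cleaner shutdown is to catch the interrupt in the main thread, call stop(), and then join the worker thread. The sketch below is only an illustration: StubDriver is a hypothetical stand-in, not the pesos driver, and it assumes the real driver's run() returns once stop() has been called.

```python
import threading
import time


class StubDriver(object):
    """Hypothetical stand-in for a scheduler driver; NOT the pesos API."""

    def __init__(self):
        self._stopped = threading.Event()

    def run(self):
        # Pretend to do work until stop() is called.
        self._stopped.wait()

    def stop(self):
        self._stopped.set()


def run_until_interrupted(driver, poll_interval=0.5, join_timeout=5.0):
    """Run driver.run() in a thread; stop it cleanly on Ctrl-C.

    Returns True if the driver thread has exited by the time we return.
    """
    thread = threading.Thread(target=driver.run)
    thread.daemon = True
    thread.start()
    try:
        # Short sleeps keep the main thread responsive to SIGINT.
        while thread.is_alive():
            time.sleep(poll_interval)
    except KeyboardInterrupt:
        pass
    finally:
        if thread.is_alive():
            # Assumption: stop() makes run() return.
            driver.stop()
        thread.join(join_timeout)
    return not thread.is_alive()
```

Joining with a timeout keeps the process from hanging forever if the driver never returns; the daemon flag is then only a last resort.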

Good shout, I'll double check this. Thanks!