thecodeteam / mesos-module-dvdi

Mesos Docker Volume Driver Isolator module


isolator should invoke potentially blocking operations async from module API handlers

jdef opened this issue

Related to #88: if calls to os::shell that execute dvdcli hang or block for a significant amount of time, the task launch pipeline breaks down and tasks become stuck in STAGING. Part of the reason is that the isolator module invokes potentially blocking operations synchronously from within the Mesos module API handlers.

A better approach would be to invoke such commands asynchronously, for example by using Subprocess. The HDFS code in Mesos provides an example of this approach: https://github.com/apache/mesos/blob/4d2b1b793e07a9c90b984ca330a3d7bc9e1404cc/src/hdfs/hdfs.cpp#L53
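As a rough illustration of the difference, the sketch below runs an external command without blocking the caller and bounds how long it may take. It is written in Python with asyncio purely for brevity; the module itself is C++ and would presumably use libprocess's Subprocess instead. The function name run_command and the timeout values are hypothetical, and the Python interpreter stands in for dvdcli.

```python
import asyncio
import sys


async def run_command(argv, timeout_s):
    """Run argv without blocking the event loop.

    Returns (exit_code, stdout, stderr). If the command does not finish
    within timeout_s seconds it is killed and asyncio.TimeoutError is raised.
    """
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # A hung command must not stall later launches: kill it and reap it.
        proc.kill()
        await proc.wait()
        raise
    return proc.returncode, out.decode(), err.decode()


async def example():
    # Stand-in for a dvdcli invocation (hypothetical): a child process that
    # just prints a word.
    argv = [sys.executable, "-c", "print('mounted')"]
    return await run_command(argv, timeout_s=5.0)
```

A handler written this way hands back a pending result instead of waiting inside the handler itself, so other launch, update, and destroy requests can continue while the command runs.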

I looked at the Marathon code and I agree that this is a good idea and should be feasible. Thanks for the input.

To add to @jdef's description, this problem is pretty severe. If any operation in the dvdi module blocks, ALL subsequent container launch/update/destroy operations will be BLOCKED, irrespective of whether the container is using an external volume or not.

Fixing that might involve serializing dvdcli operations, because when you use Subprocess, the order in which dvdcli operations are executed is non-deterministic. For instance, say you want to unmount a volume first, and then a new container arrives requesting the same volume. You expect the volume to be mounted for the new container. However, because of the race, it's likely that the unmount happens after the mount.
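One way to picture the serialization described above is a per-volume lock: operations on the same volume run strictly one after another, while operations on different volumes stay independent. This is only a sketch in Python with asyncio, not the module's code; VolumeSerializer, the volume name "vol1", and the mount/unmount stand-ins are all hypothetical.

```python
import asyncio
from collections import defaultdict


class VolumeSerializer:
    """Runs operations on the same volume one at a time, in submission order."""

    def __init__(self):
        # One lock per volume name, created lazily on first use.
        self._locks = defaultdict(asyncio.Lock)

    async def run(self, volume, operation):
        # asyncio.Lock wakes waiters in the order they started waiting,
        # so a later mount cannot overtake an earlier unmount.
        async with self._locks[volume]:
            return await operation()


async def demo():
    log = []
    serializer = VolumeSerializer()

    async def unmount():
        await asyncio.sleep(0.05)  # simulate a slow unmount
        log.append("unmount")

    async def mount():
        log.append("mount")  # would finish first if run unserialized

    first = asyncio.create_task(serializer.run("vol1", unmount))
    second = asyncio.create_task(serializer.run("vol1", mount))
    await asyncio.gather(first, second)
    return log
```

Here the mount is submitted while the slow unmount is still in flight, yet it only runs once the unmount has completed. The trade-off is that a hung operation now delays later operations on that one volume, which is why a timeout on each command would still be needed.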

@jdef, just for my understanding, what happens in the case of Docker-type workloads/containers? The specific case I am thinking about: if we mount the volume asynchronously and come out of the staging state, and the application comes up before the volume data is available, the application might error out because the data is not there.

Maybe I am misunderstanding how to use subprocess and what its capabilities are.


@jieyu, how did we handle this scenario with the docker volume isolator recently added to Mesos?

On Wed, May 18, 2016 at 11:21 AM, David vonThenen wrote:

@jdef, Just for my understanding, what happens in the case for docker type workloads/containers? The specific case I am thinking about is if we mount the volume async, come out of staging state, and the application comes up without the volume data being available, the application might error out from the data not being there.

Maybe I am misunderstanding how to use subprocess and what its capabilities are.


James DeFelice