wickman / pesos

pesos is a pure python implementation of the mesos framework api

Libprocess woes from behind NAT (docker BRIDGE networking)

anthonyrisinger opened this issue

I'm not sure if this is a bug/feature-request in pesos/compactor, but since it affects how pesos can be deployed, and since docker is a standard Mesos executor, I'm documenting it here.

Our infrastructure is primarily composed of docker images running in BRIDGE mode across our Mesos cluster. BRIDGE is preferred to HOST networking for a number of reasons, including security and flexibility (otherwise many services would be limited to one instance per slave). Our custom framework is no different; it runs as a normal Marathon+docker task, which in turn starts its own tasks against the same Mesos cluster.

The problem is how libprocess/compactor works: it requires BOTH nodes to host an HTTP server, and each advertises its own return address in a Libprocess-From: name@ip:port header. However, since the framework is running within a BRIDGE container, the ip:port it advertises is INTERNAL and NON-ROUTABLE from the outside.
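To make the addressing problem concrete, here is a minimal sketch of how a name@ip:port return address can be formatted and parsed. The helper names (`Pid`, `format_pid`, `parse_pid`), the process name, and the example address are hypothetical illustrations, not compactor's actual API.

```python
from collections import namedtuple

# Hypothetical stand-in for a libprocess PID; not compactor's own class.
Pid = namedtuple("Pid", ["name", "ip", "port"])


def format_pid(pid):
    """Render a PID in the name@ip:port form used by Libprocess-From."""
    return "%s@%s:%d" % (pid.name, pid.ip, pid.port)


def parse_pid(value):
    """Split 'name@ip:port' into its three parts (IPv4 only)."""
    name, _, address = value.partition("@")
    ip, _, port = address.rpartition(":")
    if not name or not ip or not port.isdigit():
        raise ValueError("not a name@ip:port address: %r" % value)
    return Pid(name, ip, int(port))


# Inside a BRIDGE container the advertised address is the container's own
# (an example docker bridge address here), which the peer cannot reach.
internal = Pid("scheduler(1)", "172.17.0.5", 8055)
header_value = format_pid(internal)
```

Whatever the receiving side parses out of that header is where it will try to connect back, which is why an internal address breaks the round trip.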

The solution is to let compactor.context.Context initially bind to the internal ip:port, then change its properties to advertise the public side of the port forwarding set up by docker, thus making it self-referential. Mesos (or Marathon, or the docker executor?) exports the forwards as PORT* environment variables, and one can use these to learn the public-facing ip:port:

        import os
        import socket

        import compactor.context

        context = None
        if 'PORT_8055' in os.environ:
            # Construct/hack our own context to account for docker BRIDGE
            # networking (NAT). The socket is bound to the internal port
            # (8055), but we advertise something externally routable.
            context = compactor.context.Context(port=8055)
            context.port = int(os.environ['PORT_8055'])
            if 'HOST' in os.environ:
                # context.ip = socket.gethostbyname(os.environ['HOST'])
                # getaddrinfo returns (family, type, proto, canonname,
                # sockaddr) tuples; take the IPv4 address of the first.
                context.ip = socket.getaddrinfo(
                        os.environ['HOST'], 8055, socket.AF_INET,
                        socket.SOCK_STREAM, socket.IPPROTO_TCP
                        )[0][4][0]

        # pesos will use our context instead of Context.singleton()
        # driver = pesos.scheduler.SchedulerDriver(..., context=context)

Maybe this is a use case pesos/compactor would like to support directly? Or maybe Libprocess already has a solution, and compactor needs an update? The above "works" but feels like a hack.

I'd like a clean way to say "bind to this ip:port, but advertise this other ip:port".
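As a rough sketch of what that separation could look like, the lookup of the advertised address can be pulled into a small helper that is independent of compactor. The function name `advertised_endpoint` is hypothetical, and it assumes the PORT_<n> and HOST variables behave as described above.

```python
import os
import socket


def advertised_endpoint(internal_port, environ=None):
    """Return the (ip, port) to advertise, or None when no forward is set.

    Assumes PORT_<internal_port> holds the public port and HOST holds the
    public hostname, as in the snippet above.
    """
    environ = os.environ if environ is None else environ
    forwarded = environ.get("PORT_%d" % internal_port)
    if forwarded is None:
        return None  # no port forward found; keep the default context
    ip = None
    host = environ.get("HOST")
    if host is not None:
        # Resolve to an IPv4 address; sockaddr is an (address, port) pair.
        infos = socket.getaddrinfo(
            host, internal_port, socket.AF_INET, socket.SOCK_STREAM)
        ip = infos[0][4][0]
    return ip, int(forwarded)
```

The caller would still bind to the internal port and then overwrite the context's ip and port with the returned pair, exactly as the hack above does by hand.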

If I understand what you're saying correctly... TL;DR this is how LibProcess works.

You can configure the IP address that is bound by setting the LIBPROCESS_IP environment variable. With bridge networking, however, if you set it to the host interface's address you won't be able to bind to it, as you mentioned.
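A quick way to see that limitation is to check whether the configured address can be bound at all from inside the container. This is only an illustrative sketch: the `can_bind` helper is hypothetical, and the fallback to 127.0.0.1 when LIBPROCESS_IP is unset is an assumption made for the example.

```python
import os
import socket


def can_bind(ip, port=0):
    """Return True if a TCP socket can be bound to ip on this machine."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((ip, port))  # port 0 lets the OS pick any free port
        return True
    except OSError:
        # Typically "cannot assign requested address" when the IP does
        # not belong to any interface visible in this network namespace.
        return False
    finally:
        sock.close()


# Fallback value is an assumption for this example only.
candidate_ip = os.environ.get("LIBPROCESS_IP", "127.0.0.1")
bindable = can_bind(candidate_ip)
```

Inside a BRIDGE container the host's address is normally not assigned to any local interface, so a check like this would be expected to report False for it.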

There are a couple of tickets you might want to keep an eye on: