Libprocess woes from behind NAT (docker BRIDGE networking)
anthonyrisinger opened this issue
I'm not sure if this is a bug/feature-request in pesos/compactor, but since it affects how pesos can be deployed, and since docker is a standard Mesos executor, I'm documenting here.
Our infrastructure consists primarily of docker images running in BRIDGE mode across our Mesos cluster. BRIDGE is preferred to HOST networking for a number of reasons, including security and flexibility (otherwise many services would be limited to one instance per slave). Our custom framework is no different; it runs as a normal Marathon+docker task, which in turn starts its own tasks against the same Mesos cluster.
The problem is how libprocess/compactor works: it requires BOTH nodes to host an HTTP server, and each advertises its own return address in a Libprocess-From: name@ip:port header. However, since the framework is running within a BRIDGE container, the ip:port address advertised is INTERNAL and NON-ROUTABLE from the outside.
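To make the failure concrete, here is a minimal sketch of the name@ip:port return address described above. The helper names (format_from, parse_from), the process name, and the sample bridge address are hypothetical and are not part of compactor or libprocess; the sketch only illustrates that a peer replies to whatever ip:port the header carries.

```python
def format_from(name, ip, port):
    """Build a name@ip:port value (hypothetical helper, not compactor API)."""
    return "%s@%s:%d" % (name, ip, port)


def parse_from(value):
    """Split name@ip:port into (name, ip, port); IPv4 only in this sketch."""
    name, _, address = value.rpartition("@")
    ip, _, port = address.rpartition(":")
    return name, ip, int(port)


# Inside a BRIDGE container the process only sees its internal address
# (172.17.0.5 is an illustrative docker bridge address).
advertised = format_from("scheduler(1)", "172.17.0.5", 8055)

# The peer parses the value and replies to exactly that address, which
# is not routable from outside the host that runs the container.
reply_to = parse_from(advertised)
```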
The solution is to allow compactor.context.Context to initially bind to the internal ip:port, then change its properties to advertise the public side of the port forwarding set up by docker, thus making it self-referential. Mesos (or Marathon or docker executor?) exports the forwards as PORT* environment variables, and one can use these to learn the public-facing ip:port:
    import os
    import socket

    import compactor.context

    context = None
    if 'PORT_8055' in os.environ:
        # Construct/hack our own context to account for docker BRIDGE
        # networking (NAT). The socket is bound to 0.0.0.0:8055 but we
        # advertise something externally routable.
        context = compactor.context.Context(port=8055)
        context.start()
        context.port = int(os.environ['PORT_8055'])
        if 'HOST' in os.environ:
            # context.ip = socket.gethostbyname(os.environ['HOST'])
            context.ip = socket.getaddrinfo(
                os.environ['HOST'], 8055, socket.AF_INET,
                socket.SOCK_STREAM, socket.IPPROTO_TCP
            )[0][4][0]
    # pesos will use our context instead of Context.singleton()
    # driver = pesos.scheduler.SchedulerDriver(..., context=context)
Maybe this is a use case pesos/compactor would like to support directly? Or maybe Libprocess already has a solution, and compactor needs an update? The above "works" but feels like a hack.
I'd like a clean way to say "bind to this ip:port, but advertise this other ip:port".
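As a rough sketch of what the "advertise" half could look like, the lookup can be pulled into a small pure function. The name advertised_address and its environ parameter are hypothetical, not part of pesos or compactor, and it assumes the PORT_<container port> and HOST variables used in the workaround above are present in the task's environment.

```python
import os
import socket


def advertised_address(bind_port, environ=None):
    """Return the (ip, port) to advertise for a socket bound to bind_port.

    Hypothetical helper. Assumes PORT_<bind_port> holds the forwarded host
    port and HOST holds the host's name; ip is None when HOST is not set.
    """
    environ = os.environ if environ is None else environ
    # Fall back to the bound port when no forward is exported.
    port = int(environ.get("PORT_%d" % bind_port, bind_port))
    ip = None
    host = environ.get("HOST")
    if host:
        # Resolve to an IPv4 address, as in the workaround above.
        ip = socket.getaddrinfo(
            host, port, socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP
        )[0][4][0]
    return ip, port
```

The result could then be assigned to context.ip and context.port after context.start(), which keeps the environment handling separate from the Context hack and makes it easy to test on its own.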
If I understand what you're saying correctly... TL;DR this is how LibProcess works.
You can configure the IP address that's bound to by using the LIBPROCESS_IP environment variable; however, with bridge networking, if you set it to the host interface's address you're not going to be able to bind to it, as you mentioned.
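A quick way to see why this fails is to try the bind directly. This is only a sketch: can_bind is a hypothetical helper, and it assumes the value of LIBPROCESS_IP is used as the bind address, as described above.

```python
import errno
import socket


def can_bind(ip, port=0):
    """Return True if a TCP socket can be bound to ip on this machine."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((ip, port))  # port 0 lets the OS pick a free port
        return True
    except OSError as exc:
        # EADDRNOTAVAIL: the address is not assigned to any local interface,
        # which is what a BRIDGE container sees for the host's address.
        if exc.errno == errno.EADDRNOTAVAIL:
            return False
        raise
    finally:
        sock.close()
```

Run inside the container, can_bind on the host's address would be expected to return False, while the container's own address or 0.0.0.0 should succeed.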
There are a couple of tickets you might want to keep an eye on:
- https://issues.apache.org/jira/browse/MESOS-809
- https://issues.apache.org/jira/browse/MESOS-2708
- There was also a ticket about running mesos master/slave processes inside containers, which had a similar issue to what you're describing with BRIDGE mode, but I can't seem to find it.