Unable to connect to sandbox-created Unix domain socket when waiting for connection using epoll_ctl
gmwiz opened this issue · comments
Description
I'm trying to run MongoDB under gVisor such that MongoDB listens on a Unix-domain socket and a connection to it is made from outside the sandbox. This simple setup does not work on any version of MongoDB I've tried so far, but works with pretty much every other application I've tried.
This reproduces easily on the latest gVisor version (20231218.0), both when invoked under podman and when run directly using runsc with the default OCI spec emitted by runsc spec, with the addition of the following block under mounts:
{
    "destination": "/shared",
    "type": "bind",
    "source": "shared",
    "options": [
        "rbind"
    ]
}
One MongoDB-specific thing I've noticed is that the UDS is marked non-blocking and that the socket is wrapped by the ASIO library. I was able to reproduce the failure with a C program that mimics the behavior demonstrated by MongoDB, as observed via strace (see the following comment). Another thing to note is that connecting to the shared socket from within the sandbox works as expected.
Any help or guidance will be highly appreciated.
Steps to reproduce
sudo runsc install --runtime runsc-unix-debug -- \
--host-uds=all \
--debug \
--debug-log=/tmp/runsc-debug/ \
--strace \
--log-packets
docker run --runtime=runsc-unix-debug -v $(pwd)/shared:/shared --rm -it --entrypoint '' mongo:4.4.26 mongod --bind_ip=/shared/mongo.sock
sudo mongo "mongodb://shared%2fmongo.sock" --shell --verbose
Which will hang indefinitely:
connecting to: mongodb://shared%2Fmongo.sock/?compressors=disabled&gssapiServiceName=mongodb
D1 NETWORK [js] creating new connection to:shared/mongo.sock
D1 NETWORK [js] connected to server shared/mongo.sock
runsc version
runsc version release-20231218.0
spec: 1.1.0-rc.1
docker version (if using docker)
Client: Docker Engine - Community
Version: 24.0.7
API version: 1.43
Go version: go1.20.10
Git commit: afdd53b
Built: Thu Oct 26 09:08:02 2023
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 24.0.7
API version: 1.43 (minimum version 1.12)
Go version: go1.20.10
Git commit: 311b9ff
Built: Thu Oct 26 09:08:02 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.26
GitCommit: 3dd1e886e55dd695541fdcd67420c2888645a495
runc:
Version: 1.1.10
GitCommit: v1.1.10-0-g18a0cb0
docker-init:
Version: 0.19.0
GitCommit: de40ad0
uname
6.6.8-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.6.8-1 (2023-12-22) x86_64 GNU/Linux
runsc debug logs (if available)
I was able to further narrow it down and create a simple C reproducer that demonstrates the issue. It seems that MongoDB's Unix-domain sockets are marked non-blocking, and that epoll_wait is used to wait for a new client (before calling accept). It also seems that MongoDB uses epoll_ctl to track the socket, and that it does so immediately after creating the socket, before calling bind on it. When re-ordering the calls, moving the epoll_ctl call to after bind, everything works perfectly. I'm not entirely certain which component in the gVisor code base is responsible for this, but I'll try and take a look.
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/epoll.h>

int main(int argc, char *argv[]) {
    int sock_fd = -1;
    int ret = -1;
    int client_fd = -1;
    int epoll_fd = -1;
    struct epoll_event epoll_read = {
        .events = EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP|EPOLLET,
    };
    struct sockaddr_un addr = {0};
    char *sock_path = NULL;
    char *msg = NULL;

    if (argc < 3) {
        printf("<socket_path> <msg>\n");
        return 1;
    }
    sock_path = argv[1];
    msg = argv[2];

    epoll_fd = epoll_create1(EPOLL_CLOEXEC);
    if (epoll_fd < 0) {
        perror("epoll_create1");
        return 1;
    }

    printf("socket\n");
    /* Non-blocking listener, like MongoDB's. */
    sock_fd = socket(AF_UNIX, SOCK_STREAM|SOCK_NONBLOCK, 0);
    if (sock_fd < 0) {
        perror("socket");
        return 1;
    }

    /* Registering with epoll BEFORE bind() is what triggers the issue. */
    if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, sock_fd, &epoll_read) < 0) {
        perror("epoll_ctl");
        return 1;
    }

    printf("bind\n");
    unlink(sock_path);
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, sock_path, sizeof(addr.sun_path) - 1);
    if (bind(sock_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }

    printf("listen\n");
    if (listen(sock_fd, 128) < 0) {
        perror("listen");
        return 1;
    }

    do {
        struct epoll_event ev[128] = {0};
        ret = epoll_wait(epoll_fd, ev, 128, -1);
    } while (ret < 0 && errno == EINTR);
    if (ret <= 0) {
        perror("epoll_wait");
        return 1;
    }

    printf("accept\n");
    client_fd = accept(sock_fd, NULL, NULL);
    if (client_fd < 0) {
        perror("accept");
        return 1;
    }

    printf("send: %s\n", msg);
    send(client_fd, msg, strlen(msg), 0);
    printf("done!\n");
    return 0;
}
Thanks for the bug report, and for narrowing it down with the reproducer. That was really helpful.
I think I have fixed the issue in #9849. Let's see if the e2e tests concur.
Hi @ayushr2
Thank you so much for the super fast fix!!! That's amazing!
Do you happen to have an estimate of when this patch will be included in the next official release?
TIA
Hoping to have a release on Jan 3rd. cc @manninglucas
Ah, we released the Monday candidate (which doesn't contain the fix). You can expect us to make a release for 01/08 candidate next Wednesday.