Shared process state management
trungnt2910 opened this issue · comments
Background
During the last few months, blink has been greatly expanded to serve more use cases than its original purpose of distributing portable apps. With features such as dynamic object support and a virtual root filesystem, blink is becoming more and more suitable as a portable Linux compatibility layer, or in other words, a replacement for the abandoned flinux project or something of a substitute for the dying WSL1⁽¹⁾.
To serve this use case, more syscalls and some other features (for example, a complete procfs) need to be implemented. Many of these:
- Are Linux-specific features without counterparts on popular platforms.
- Require storing, managing, and retrieving cross-process data/states.
Therefore, I propose adding a "kernel server daemon" mode to blink. This is how comparable popular projects solve the problem, for example Wine's wineserver or DarlingHQ's darlingserver.
Scenarios
eventfd support (#76)
eventfds can be implemented as an anonymous device on the VFS layer, with all calls forwarded to the daemon process:
int VfsEventfd(unsigned int initval, int flags) {
  struct VfsInfo *output;
  int ret;
  if (EventfdCreate(initval, flags, &output) == -1) {
    return -1;
  }
  ret = VfsAddFd(output);
  if (ret != -1) {
    unassert(!VfsFreeInfo(output));
  }
  return ret;
}

int EventfdCreate(unsigned int initval, int flag, struct VfsInfo **output) {
  int servercookie;
  servercookie = ServerCall(EVENTFD_CREATE, initval, flag);
  if (servercookie < 0) {
    // Assumed convention: a negative reply is a negated errno value.
    errno = -servercookie;
    return -1;
  }
  if (VfsCreateInfo(output) == -1) {
    // Release the server-side object if the local info cannot be created.
    ServerCall(EVENTFD_CLOSE, servercookie);
    return -1;
  }
  // The cookie identifying the server-side object is stored in the info.
  (*output)->data = (void *)(intptr_t)servercookie;
  return 0;
}

int EventfdRead(struct VfsInfo *info, void *buf, size_t nbyte) {
  return ServerCall(EVENTFD_READ, info->data, buf, nbyte);
}
The daemon process would manage the eventfd object the way Linux manages it.
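To make "the way Linux manages it" concrete, here is a minimal sketch of the counter semantics the daemon would have to reproduce. It follows the documented behaviour of Linux eventfd (a read returns and resets the counter, or decrements it by one in semaphore mode; a write adds to it). The names SketchEventfd, SketchEventfdRead, SketchEventfdWrite and SKETCH_EFD_SEMAPHORE are hypothetical and not part of blink, and blocking is replaced by returning -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_EFD_SEMAPHORE 1  // hypothetical flag bit used only in this sketch

// Hypothetical daemon-side state for one eventfd object.
struct SketchEventfd {
  uint64_t counter;
  int flags;
};

// Returns 0 and stores the value read in *out,
// or -EAGAIN when the counter is zero (a real daemon could block instead).
int SketchEventfdRead(struct SketchEventfd *e, uint64_t *out) {
  assert(e && out);
  if (!e->counter) return -EAGAIN;
  if (e->flags & SKETCH_EFD_SEMAPHORE) {
    *out = 1;  // semaphore mode: hand out one unit at a time
    e->counter -= 1;
  } else {
    *out = e->counter;  // normal mode: return everything and reset
    e->counter = 0;
  }
  return 0;
}

// Adds `add` to the counter. The largest storable value is UINT64_MAX - 1.
int SketchEventfdWrite(struct SketchEventfd *e, uint64_t add) {
  assert(e);
  if (add == UINT64_MAX) return -EINVAL;
  if (add > UINT64_MAX - 1 - e->counter) return -EAGAIN;  // would overflow
  e->counter += add;
  return 0;
}
```

In this sketch the cookie returned by EVENTFD_CREATE would simply index a table of such objects inside the daemon.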
ptrace support (#56)
If/when ptrace is implemented using the "cooperative debugging" mentioned in the related issue, it can use the daemon process as a means of communication instead of having to open a temporary UNIX socket.
procfs support
While #88 brought some initial support for procfs, it is nowhere near enough for common UNIX tools like ps to function, because the only information this procfs implementation gives is about the current process.
A daemon process can store all the information required to implement procfs.
For example, some functions could be re-written:
// https://github.com/jart/blink/blob/0bdacfedaeb77a3c122bbd80c9f12394f17da772/blink/procfs.c#L980
int ProcfsRegisterExe(i32 pid, const char *path) {
  // The server should know the process's mount namespace
  // and should be able to traverse and resolve `path`.
  unassert(!ServerCall(EXE_REGISTER, pid, path));
  return 0;
}

// https://github.com/jart/blink/blob/0bdacfedaeb77a3c122bbd80c9f12394f17da772/blink/procfs.c#L1114
static ssize_t ProcfsPiddirExeReadlink(struct VfsInfo *info, char **buf) {
  ssize_t ret, len;
  len = PATH_MAX;
  *buf = malloc(len);
  // `pid` is the PID this /proc/<pid> directory refers to
  // (how it is looked up from `info` is elided here).
  ret = ServerCall(EXE_GET, pid, *buf, len);
  // reallocate buf until it is large enough
  return ret;
}
Running init
init would complain under blink about not running as PID 1.
This can be solved by letting the daemon process manage all blink PIDs, effectively putting all blink processes under a new emulated PID namespace.
The daemon could also emulate wait calls for the emulated PID 1 to make sure init manages blink processes whose parent has died/exited.
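As an illustration of the emulated PID namespace, the sketch below shows one possible daemon-side table: the first registered process receives PID 1, and when a process exits its children are reparented to PID 1 so that init can wait for them. All names (SketchPidTable, SketchPidAlloc, SketchPidExit, SketchPidParent) and the fixed capacity are assumptions made for this example. PID reuse and wait status delivery are left out.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_PROCS 64  // arbitrary capacity for this sketch

struct SketchProc {
  int pid;   // emulated PID; 0 marks a free slot
  int ppid;  // emulated parent PID; 0 means "no emulated parent"
};

struct SketchPidTable {
  struct SketchProc procs[SKETCH_MAX_PROCS];
  int next_pid;  // next emulated PID to hand out
};

void SketchPidTableInit(struct SketchPidTable *t) {
  assert(t);
  for (size_t i = 0; i < SKETCH_MAX_PROCS; ++i) {
    t->procs[i].pid = 0;
    t->procs[i].ppid = 0;
  }
  t->next_pid = 1;  // the first process becomes the emulated PID 1
}

// Registers a new process; returns its emulated PID, or -1 if the table is full.
int SketchPidAlloc(struct SketchPidTable *t, int ppid) {
  for (size_t i = 0; i < SKETCH_MAX_PROCS; ++i) {
    if (!t->procs[i].pid) {
      t->procs[i].pid = t->next_pid++;
      t->procs[i].ppid = ppid;
      return t->procs[i].pid;
    }
  }
  return -1;
}

// Returns the emulated parent PID of `pid`, or -1 if `pid` is unknown.
int SketchPidParent(const struct SketchPidTable *t, int pid) {
  if (pid <= 0) return -1;
  for (size_t i = 0; i < SKETCH_MAX_PROCS; ++i) {
    if (t->procs[i].pid == pid) return t->procs[i].ppid;
  }
  return -1;
}

// Removes `pid` and reparents its children to PID 1.
// Returns the number of reparented children, or -1 if `pid` is unknown.
int SketchPidExit(struct SketchPidTable *t, int pid) {
  int found = 0, orphans = 0;
  if (pid <= 0) return -1;
  for (size_t i = 0; i < SKETCH_MAX_PROCS; ++i) {
    if (t->procs[i].pid == pid) {
      t->procs[i].pid = 0;
      found = 1;
    } else if (t->procs[i].pid && t->procs[i].ppid == pid) {
      t->procs[i].ppid = 1;
      ++orphans;
    }
  }
  return found ? orphans : -1;
}
```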
Requirements
Goals
The daemon should:
- Correctly manage process information and cross-process objects.
- Properly clean up related data when a blink process exits.
- Not consume an unreasonably high amount of the host's resources.
- Not have too much overhead for commonly used syscalls.
- Not cost anything for those who don't need it.
Non-goals
The daemon should not/does not need to:
- Manage processes outside of blink. This includes native processes execve'd from a blink process.
- Manage the process state of multiple different direct blink invocations. This means that each time the user runs blink from the host shell, a different daemon process is created.
Design
Build
As this is a costly feature that is not required for the original purpose of "distributing portable apps", it should, like the VFS feature, be disabled by default.
A flag DISABLE_DAEMON should be created. To enable this feature, builders of blink should pass an additional argument to ./configure:
./configure --enable-daemon
The --disable-daemon flag should also exist to negate a previously passed --enable-daemon flag.
Runtime
When a special flag is passed to blink, for example -d, instead of directly executing the required binary, blink should fork itself.
The parent process should enter a daemon mode: it sets up the necessary subsystems and opens a UNIX socket with a fixed name located at the root of $BLINK_PREFIX.
The child process should wait for the parent to complete its setup and then continue with normal blink operations.
Every time a descendant process starts, before emulating, it should attempt to connect to the UNIX socket opened by the daemon right after initialization of the VFS subsystem.
When this special flag is not passed to blink, processes are created normally and features that require this daemon are disabled.
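The socket part of this startup sequence could look roughly like the following. This is a sketch under assumptions: the helper names SketchUnixAddr, SketchDaemonListen and SketchDaemonConnect are made up for illustration, the socket path is supplied by the caller (the fixed name under $BLINK_PREFIX is not decided here), and the fork and the parent/child synchronisation are not shown.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

// Fills `addr` for `path`; returns 0, or -1 if the path does not fit.
int SketchUnixAddr(struct sockaddr_un *addr, const char *path) {
  if (strlen(path) >= sizeof(addr->sun_path)) return -1;
  memset(addr, 0, sizeof(*addr));
  addr->sun_family = AF_UNIX;
  strcpy(addr->sun_path, path);
  return 0;
}

// Daemon side: creates the listening socket. Returns the fd or -1.
int SketchDaemonListen(const char *path) {
  struct sockaddr_un addr;
  int fd;
  if (SketchUnixAddr(&addr, path) == -1) return -1;
  if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) return -1;
  unlink(path);  // remove a stale socket left by an earlier run
  if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
      listen(fd, 16) == -1) {
    close(fd);
    return -1;
  }
  return fd;
}

// Descendant side: connects to the daemon before emulation starts.
// Returns the fd or -1, in which case daemon features stay disabled.
int SketchDaemonConnect(const char *path) {
  struct sockaddr_un addr;
  int fd;
  if (SketchUnixAddr(&addr, path) == -1) return -1;
  if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) return -1;
  if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
    close(fd);
    return -1;
  }
  return fd;
}
```

Because each direct blink invocation gets its own daemon, the real socket location would have to be unique per invocation or per prefix; that detail is outside this sketch.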
Process lifetime
- When a process/thread is initialized successfully, it should send a message to the daemon. The daemon then allocates a PID/TID as well as some other necessary resources.
- When a process/thread exits normally, it should also send a message to the daemon with the status code. The daemon should then close the file descriptor associated with the thread and clean up all related process information if there are no more threads for that PID.
- When a process execves into a native process, it should send a message to the daemon. The daemon should clean up all resources as if the process had exited.
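The three lifetime rules above can be sketched as a small state machine on the daemon side. The message names and the SketchProcState structure are hypothetical; a real implementation would also release PIDs, file descriptors and other per-process objects where this example only sets a flag.

```c
#include <assert.h>

// Hypothetical lifetime messages a blink process could send to the daemon.
enum SketchMsg {
  SKETCH_THREAD_START,  // a process/thread finished initializing
  SKETCH_THREAD_EXIT,   // a process/thread exited normally
  SKETCH_EXEC_NATIVE,   // the process execve'd into a native binary
};

// Hypothetical per-PID bookkeeping kept by the daemon.
struct SketchProcState {
  int nthreads;  // live emulated threads for this PID
  int cleaned;   // set once the per-process data has been released
  int status;    // last reported exit status
};

// Applies one lifetime message.
// Returns 1 if the process information was cleaned up, otherwise 0.
int SketchHandleLifetime(struct SketchProcState *p, enum SketchMsg msg,
                         int status) {
  assert(p);
  switch (msg) {
    case SKETCH_THREAD_START:
      p->nthreads++;
      return 0;
    case SKETCH_THREAD_EXIT:
      assert(p->nthreads > 0);
      p->status = status;
      if (--p->nthreads) return 0;  // other threads are still running
      p->cleaned = 1;               // last thread gone: release everything
      return 1;
    case SKETCH_EXEC_NATIVE:
      p->nthreads = 0;  // treated exactly like an exit of the whole process
      p->cleaned = 1;
      return 1;
  }
  return 0;
}
```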
Server calls
- A server call should be one write call to the socket followed by a read call for a 64-bit result.
- If the read call fails with EINTR, a message should be sent to the daemon that the process wants to interrupt this call. After sending this message, the process should block all signals and read until it gets a reply from the daemon.
Q & A
Why bother creating a separate process? Isn't shared memory enough?
Some problems can be solved with shared memory. However, if a process dies unexpectedly (killed by Task Manager or through kill -9 on the host), there's no way to know it died and clean up the shared memory.
Furthermore, as many features require this shared memory, a good memory allocator is required (it is not optimal to map a few pages per feature per process). The amount of shared memory would also be limited, as the shared pages have to be mapped by the first init process.
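With a socket-based daemon, by contrast, the host kernel closes a dead process's descriptors even after kill -9, so the daemon can observe the death on its end of the connection. A minimal sketch of such a check, assuming a connected stream socket per process (the function name SketchPeerGone is hypothetical):

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

// Checks, without consuming data, whether the peer of `fd` has gone away.
// Returns 1 if the peer closed the connection, 0 if it is still open,
// and -1 on an unrelated error.
int SketchPeerGone(int fd) {
  char byte;
  ssize_t rc = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
  if (rc > 0) return 0;   // a request is pending, so the peer is alive
  if (rc == 0) return 1;  // end of file: the peer closed or died
  if (errno == EAGAIN || errno == EWOULDBLOCK) return 0;  // idle but open
  if (errno == ECONNRESET) return 1;  // peer vanished with unread data
  return -1;
}
```

A real daemon would more likely learn this from its poll or epoll loop than from an explicit probe, but the cleanup trigger would be the same.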
Why bother creating this feature in the first place? No one will run it.
That is true for everyone who sticks to blink's original purposes.
Similarly, nobody would use --enable-vfs except for someone who needs his Alpine root to run out of the box, or someone who needs /proc/self/exe to work properly.
Likewise, people using blink as a compatibility layer and userspace emulator don't care much about the system emulation and 16-bit features, as QEMU and DOSBox already do the job well. But there seems to be a thriving community around this use.
Isn't this going too far? With PID management, aren't blink processes too isolated from the host now? At this point, isn't it better to just run a Linux image?
No. A lot of things are still integrated, such as viewing (and killing) blink processes from the host's Task Manager, sharing files and UNIX sockets between the host and emulated processes, and using the same networks as the host, none of which can be achieved by solutions that run a whole Linux image like WSL2.
Conclusion
This proposal was written quickly, in just over an hour, so it may contain unclear points and/or mistakes. My idea may also simply be non-optimal, and there might be better ways to share process state or to implement these Linux features without sharing process state at all.
If you support this proposal, please give it a 👍 so I can see that it's worth my time.
If you have any suggestion, please let me know through this issue or ping me on the redbean Discord for a quicker reply.
⁽¹⁾: For any potential pedantic readers, no, blink as an emulator can never replace WSL1, as WSL1 uses NT kernel magic to natively execute instructions while blink has to use a JIT. However, this doesn't mean that blink cannot catch up with WSL1 in terms of userland emulation.
@trungnt2910 : sounds like such a feature might be useful. I guess one challenge will be to make it small and lightweight.
Thank you!