maximecb / uvm

Fun, portable, minimalistic virtual machine.

Home Page: http://uvmplatform.org

We need an API to read/write to files/streams

maximecb opened this issue

UVM needs an API to read/write to files/streams. Ideally, the API should be simple and easy to use, and usable for standard input/output, files, and network sockets if possible. I'm opening this issue to solicit feedback.

It might make sense to have both a synchronous API and an asynchronous API with callbacks. UVM has no threads, so it makes more sense for it to process network traffic asynchronously rather than by polling. However, when it comes to reading files, it might be fine to read all the data synchronously, as this can be very fast and is simpler to work with.

We might need different syscalls for opening sockets vs opening files, but the functions for knowing how much data is available to read and to read the actual data could be the same.

Input/feedback/help welcome.

We might need different syscalls for opening sockets vs opening files, but the functions for knowing how much data is available to read and to read the actual data could be the same.

Yes, this sounds right. Also, IMO file ops should be async too, given that compute is far faster than disk I/O. That would help use resources efficiently.

I'd like to have both blocking and async operations for file I/O if possible, just because synchronous is easier to deal with for some things, but for networking, async seems really necessary.

Hello @maximecb, hello @hd-COO7,

I think the most general file/stream I/O functions in the POSIX standards (e.g. POSIX.1-2017) will probably be the read (fd, buf, nbyte) and write (fd, buf, nbyte) functions, and maybe close (fd). On POSIX systems, read, write, and close are equally applicable to regular files, device files, pipes, network sockets, etc.

So if you ask me, a UVM interface that has some notion of "open resource handle" (similar to, or even identical to, POSIX file descriptors) and has read, write, and close syscalls is probably not a bad place to start. 🙂

For regular files — and perhaps file-like things such as disk partitions (Linux's /dev/sda etc.) and video frame buffers (/dev/fb0, /dev/vcsa0, etc.) — you probably need a way to adjust the read/write pointer (a.k.a. file offset) into an open file. This probably means UVM should have some sort of lseek syscall.
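For reference, here is a minimal sketch of the POSIX handle model described above: one descriptor, with write, lseek, read, and close applied to it in turn. The helper name `roundtrip_at_offset` and the scratch file path are made up for illustration; nothing here is a UVM API.

```c
#define _POSIX_C_SOURCE 200809L  /* expose mkstemp() under strict -std modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a UVM API): writes `msg` to a scratch file,
 * moves the file offset to byte `offset`, and reads up to `cap - 1` bytes
 * into `out`, NUL-terminating the result.
 * Returns the number of bytes read, or -1 on error. */
ssize_t roundtrip_at_offset(const char *msg, off_t offset, char *out, size_t cap)
{
    assert(msg != NULL && out != NULL && cap > 0);

    char path[] = "/tmp/uvm_io_sketch_XXXXXX";
    int fd = mkstemp(path);              /* "open": yields a file descriptor */
    if (fd < 0)
        return -1;
    unlink(path);                        /* file goes away once fd is closed */

    size_t len = strlen(msg);
    if (write(fd, msg, len) != (ssize_t)len) {      /* advances the offset */
        close(fd);
        return -1;
    }

    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) { /* reposition the offset */
        close(fd);
        return -1;
    }

    ssize_t n = read(fd, out, cap - 1);  /* reads starting at the new offset */
    if (n >= 0)
        out[n] = '\0';

    close(fd);
    return n;
}
```

Calling `roundtrip_at_offset("hello world", 6, out, sizeof out)` with a 16-byte `out` reads back the five bytes starting at offset 6. The same read, write, and close calls would work on a pipe or socket descriptor, but lseek would fail there with ESPIPE, which is one place where a unified file/stream abstraction shows a seam.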

(Re asynchronous I/O: I do not think current OSes allow user programs to ask "how much data is available to read" on a particular handle — except for file-like things. But they do have a way to query whether there is any data to read.)
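A minimal sketch of the "is there any data to read" query on POSIX, using poll with a zero timeout so that it never blocks. The helper name `fd_is_readable` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a read on `fd` would not block right
 * now (data is available, or the peer has hung up), 0 otherwise,
 * and -1 if poll() itself fails. */
int fd_is_readable(int fd)
{
    assert(fd >= 0);  /* poll() silently ignores negative descriptors */

    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, 0);           /* timeout 0: return immediately */
    if (rc < 0)
        return -1;
    return (rc > 0 && (pfd.revents & (POLLIN | POLLHUP))) ? 1 : 0;
}
```

On an empty pipe this returns 0, and after one byte is written to the other end it returns 1. Note that it reports only whether data is available, not how much.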

Thank you!

My current thinking is that it's maybe not necessary or that useful to try to have a stream API that covers both files and network sockets. It might be an abstraction that breaks in a few places, or on some platforms.

For file I/O, for the most part, we'll probably want pretty simple synchronous I/O, and that's going to be similar to the POSIX APIs, with file handles like you said.

For network I/O, I think we probably want mostly (or even only) async, and that's going to need its own kind of system with callbacks. I'd like to simplify things as much as possible. One syscall to open a listening TCP socket and register a callback for new connections. Then another syscall to register a callback to get incoming data from the new socket, etc.

Here is my current sketch for a simple async networking API:

// Create a TCP listening socket to accept incoming connections
u64 socket_id = net_listen_tcp(
  u16 port_no,
  ip_space, // IPV4 / IPV6
  const char* net_iface, // Network interface address?
  callback, // Called on new incoming connection
  // Should this take any other parameters/flags?
)
// To accept a new connection, you define a callback of the form:
void new_connection(u64 socket_id, client_addr)

Should the new-connection callback return a boolean so that we can accept or reject the connection?

To read and write data, we have functions such as:

void net_read(u64 socket_id, void* buffer, u64 buf_len, callback);
void net_write(u64 socket_id, void* buffer, u64 buf_len, callback);

The POSIX network read function blocks by default until data is available, so net_read definitely needs a callback for when data arrives. One awkward thing here is that we probably have to preallocate a buffer before calling net_read, and that buffer is then reused when the callback is invoked. Since UVM doesn't have threads, these buffers can essentially be global variables. That seems a bit unusual to me, but in the absence of threads it should be safe. It definitely seems better than having the callback allocate memory that we then have to remember to free.
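To make the preallocated-buffer idea concrete, here is a small mock in plain C. It is only a sketch under several assumptions that the thread does not settle: the callback signature `net_read_cb`, the one-shot behaviour (the program re-arms the read from inside the callback), and the `host_deliver` function, which stands in for the host's event loop and is not a proposed syscall.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t u64;

/* Assumed callback shape; the sketch in the thread leaves it unspecified. */
typedef void (*net_read_cb)(u64 socket_id, void *buffer, u64 num_bytes);

/* The single pending read request. A real host would track one per socket. */
static struct {
    u64 socket_id;
    void *buffer;
    u64 buf_len;
    net_read_cb callback;
    int armed;
} pending_read;

/* Mock of the proposed net_read: records the request and returns at once. */
void net_read(u64 socket_id, void *buffer, u64 buf_len, net_read_cb callback)
{
    assert(buffer != NULL && callback != NULL);
    pending_read.socket_id = socket_id;
    pending_read.buffer = buffer;
    pending_read.buf_len = buf_len;
    pending_read.callback = callback;
    pending_read.armed = 1;
}

/* Stand-in for the host event loop (hypothetical): copies incoming bytes
 * into the program's buffer, truncating to buf_len, then fires the callback.
 * Returns the number of bytes delivered, or 0 if no read is pending. */
u64 host_deliver(const void *data, u64 len)
{
    if (!pending_read.armed)
        return 0;
    u64 n = len < pending_read.buf_len ? len : pending_read.buf_len;
    memcpy(pending_read.buffer, data, n);
    pending_read.armed = 0;  /* one-shot: the program must call net_read again */
    pending_read.callback(pending_read.socket_id, pending_read.buffer, n);
    return n;
}

/* Program side: the receive buffer is a global, owned by the program. */
char rx_buf[64];
u64 rx_total;

void on_data(u64 socket_id, void *buffer, u64 num_bytes)
{
    (void)buffer;            /* same memory as rx_buf in this sketch */
    rx_total += num_bytes;
    net_read(socket_id, rx_buf, sizeof rx_buf, on_data);  /* re-arm */
}
```

The program starts the cycle with `net_read(7, rx_buf, sizeof rx_buf, on_data)`. Each simulated delivery then fills `rx_buf` and invokes `on_data`. Because the program owns the buffer, the host never allocates or frees memory on its behalf.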

The POSIX function to send data can block if the send buffer is full. I'm unsure whether we want that complexity. I think we could probably make net_write synchronous, since we can maintain our own buffer?

And lastly a function to close open sockets:

net_close(socket_id)

Hello @maximecb,

It might be an abstraction that breaks in a few places, or on some platforms.

I believe Windows will be one of these (it has been some time since I last programmed for Windows though).

The POSIX network read function blocks until data is available, so it definitely needs a callback for when data becomes available. One awkward thing here is that we probably have to preallocate a buffer before net_read is called, which will be reused when calling into the callback.

A callback-oriented interface does not look too awkward either (IMO). Instead of having a single global buffer, you could arrange for the new_connection callback to create and maintain a separate dedicated buffer for each incoming network connection.

Alternatively it is possible to make a network socket non-blocking (using something like

  • fcntl (fd, F_SETFL, ... | O_NONBLOCK)
  • or possibly ioctl (fd, FIONBIO, ...)

on POSIX systems, or

  • ioctlsocket (fd, FIONBIO, ...)

on Windows systems with Winsock). I believe Curl uses non-blocking sockets together with poll (or select) to implement some sort of network event loop. So I suppose another possibility is for UVM to include syscalls to support such an event-loop-driven model of coding.
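A minimal POSIX sketch of the non-blocking-plus-poll approach: the descriptor is switched to non-blocking mode with fcntl (F_GETFL, then F_SETFL), and poll decides when a read is worth attempting. The helper names `set_nonblocking` and `read_when_ready` are hypothetical, and this illustrates the general pattern rather than how any particular library implements it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put `fd` into non-blocking mode, preserving its other status flags.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: waits up to `timeout_ms` for `fd` to become readable,
 * then reads at most `cap` bytes into `buf`.
 * Returns the number of bytes read, 0 on timeout or end-of-stream,
 * or -1 on error. */
ssize_t read_when_ready(int fd, void *buf, size_t cap, int timeout_ms)
{
    assert(buf != NULL);

    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;                        /* timed out: nothing arrived */

    ssize_t n = read(fd, buf, cap);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;                        /* woke up, but no data after all */
    return n;
}
```

An event loop would call poll on all open sockets at once and dispatch on each entry's revents. This sketch handles a single descriptor to keep the pattern visible. On a non-blocking socket with no data queued, a plain read fails immediately with EAGAIN (or EWOULDBLOCK) instead of blocking.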

Thank you!

Instead of having a single global buffer, you could arrange for the new_connection callback to create and maintain a separate dedicated buffer for each incoming network connection.

The thing is, the host VM can't really allocate memory on behalf of the running program.

Closing in favor of #28

We can come back to the issue of a file I/O API later.