cirruslabs / orchard

Orchestrator for running Tart Virtual Machines on a cluster of Apple Silicon devices

Make (bridged) VM IP available in the controller API

roblabla opened this issue

It'd be nice for custom tooling to have the IP address of the VM available in the API. Currently, the only way to interact with the VM for tooling is through the ssh port-forwarding, which is clunky to use.

Hello Robin,

I'm not yet sure if exposing the VM's IP in the API would be the right way to proceed because we cannot guarantee that this IP will be reachable to every client that can already access the API.

Do you mind elaborating a bit more on the clunkiness of the current SSH port-forwarding implementation? Perhaps there is something we can improve there instead.

Sure! Sorry, that issue was a bit succinct and lacking context - it was made in a bit of a hurry 😓

At $WORK, we have a custom gitlab-runner based on KubeVirt to run various VM-based workloads, and we are looking to integrate a similar Orchard-based runner for macOS arm64 VMs. To this end, I'm writing an abstraction layer that allows interacting with KubeVirt and Orchard through a common Python API.

Our runner is written in Python, and the way custom runners work in GitLab is that you give GitLab different scripts to run for the various stages in a job's lifecycle. It usually follows three stages: prepare -> run (multiple times) -> cleanup. So in a simple case, our program ends up being launched at least three times (a rough sketch of such an entry point follows the commands):

orchard-runner prepare
orchard-runner run path/to/script/to/run.sh
orchard-runner cleanup
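
For context, this is roughly the shape of the custom-executor entry point (a minimal sketch; the VM name, the orchard CLI invocations, and the error handling are simplified placeholders, not our actual runner):

```python
#!/usr/bin/env python3
# Rough sketch of a GitLab custom-executor entry point.
# VM naming and orchard CLI invocations are illustrative only.
import subprocess
import sys

VM_NAME = "gitlab-job-vm"  # in practice derived from the GitLab job ID


def prepare() -> None:
    # Create the VM that will host this job (image and resource flags omitted).
    subprocess.run(["orchard", "create", "vm", VM_NAME], check=True)


def run(script_path: str) -> None:
    # Execute one stage script inside the VM; this is where the
    # SSH/port-forward question discussed below comes in.
    raise NotImplementedError


def cleanup() -> None:
    # Tear the VM down regardless of how the job ended.
    subprocess.run(["orchard", "delete", "vm", VM_NAME], check=True)


if __name__ == "__main__":
    stage = sys.argv[1]
    if stage == "prepare":
        prepare()
    elif stage == "run":
        run(sys.argv[2])
    elif stage == "cleanup":
        cleanup()
```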

The bulk of the runner relies on spawning ssh to connect to the VM and run the script. Connecting directly via IP:port (instead of going through orchard ssh) is desirable for multiple reasons (see the sketch after this list):

  • We need to do some SFTP file transfers, which isn't possible via orchard ssh
  • We have some workloads that go through paramiko, an alternative SSH client written in pure Python.
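
Concretely, with a reachable VM IP the run stage could boil down to plain paramiko, including the SFTP upload (a minimal sketch; the IP, credentials, and paths are placeholders):

```python
# Sketch of the direct-IP workflow we'd like; IP, credentials and paths
# are placeholders and would come from the controller API / job config.
import paramiko

vm_ip = "192.168.64.10"  # ideally fetched from the orchard controller API

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(vm_ip, username="admin", password="admin")

# Upload the stage script over SFTP, something `orchard ssh` doesn't cover.
sftp = client.open_sftp()
sftp.put("path/to/script/to/run.sh", "/tmp/run.sh")
sftp.close()

# Run the script and relay its output back to GitLab.
_, stdout, stderr = client.exec_command("bash /tmp/run.sh")
print(stdout.read().decode(), end="")
print(stderr.read().decode(), end="")

client.close()
```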

Because of this, I essentially have two choices:

  • Set up a port-forward
  • Bridge the VM network and recover its IP

There are essentially two ways to handle the port-forward approach in our runner (in our experience with KubeVirt); a sketch of the first one follows the list:

  • Do it once at the prepare stage, and kill the port-forward process in the cleanup stage. This works, but it requires tracking the pid of the proxy so we can kill it in the cleanup stage.
  • Do it separately in each run. This has several disadvantages: if the Python process crashes for whatever reason, it will leave behind a proxy process that it didn't clean up. Over time, this can add up to a bunch of processes being left behind.
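
For illustration, the first option ends up looking roughly like this (the orchard port-forward invocation and the PID-file location are assumptions on my side):

```python
# Sketch of option 1: start the port-forward in `prepare`, remember its PID,
# and kill it in `cleanup`. The exact port-forward command is an assumption;
# the point is the extra bookkeeping this forces on the runner.
import os
import signal
import subprocess

PID_FILE = "/tmp/orchard-runner-port-forward.pid"


def start_port_forward(vm_name: str, local_port: int) -> None:
    proc = subprocess.Popen(
        ["orchard", "port-forward", "vm", vm_name, f"{local_port}:22"]
    )
    with open(PID_FILE, "w") as f:
        f.write(str(proc.pid))


def stop_port_forward() -> None:
    # If the runner crashed between stages, the PID may be stale or the file
    # missing entirely, which is exactly the fragility described above.
    try:
        with open(PID_FILE) as f:
            os.kill(int(f.read()), signal.SIGTERM)
    except (FileNotFoundError, ProcessLookupError, ValueError):
        pass
```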

Each comes with its own set of upsides and downsides, but the gist is that it's a lot more work and more things that can go wrong. Furthermore, while I don't know whether orchard suffers from this, I've seen some interesting behavior with long-running ssh connections going through the Kubernetes port-forward proxy, where the connection would occasionally time out.

A direct connection to a bridged network involves fewer moving parts and results in a more reliable system that's easier to debug when it goes wrong.


we cannot guarantee that this IP will be reachable to every client that can already access the API.

In my network, I know that if I bridge the address, it will be reachable by all the clients that use the API. While I understand that this isn't necessarily the case for everyone, I still think it'd be helpful to return the IP. KubeVirt returns the IP, and it's in a similar situation where the IP may not be publicly reachable. I think it should be up to the network operator to set things up if they want the IP returned by the Orchard cluster to be reachable.


FWIW I started work on doing this since I need it in the short-term (see this branch).

I've been using this patch in my PoC cluster and it works nicely. The only downside is that if the IP of the VM ever changes, orchard will keep reporting the old one; it has no mechanism to detect a new IP. That's fine for my use case (I don't expect my VMs to ever change IP), but it may not work for other people. I don't think it's easily fixable, though - AFAICT, the only way to reliably get notified when the IP changes is to have an agent running in the VM.

This feature is still good to have. I put some details in #176 (comment)

Please check out the new 0.22.0 release: it exposes a new GET /v1/<VM name>/ip endpoint that will resolve the actual VM's IP on the worker.

Both the controller and workers need to be updated for this to work.
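
For reference, here's a minimal sketch of querying the new endpoint from Python (the controller URL, port, service-account credentials, and response handling are placeholders; adapt them to your deployment):

```python
# Minimal sketch of calling the new IP-resolution endpoint.
# Controller URL, port and credentials below are placeholders.
import requests

CONTROLLER = "https://orchard-controller.example.com:6120"
VM_NAME = "sonoma-runner"  # hypothetical VM name

resp = requests.get(
    f"{CONTROLLER}/v1/{VM_NAME}/ip",  # path as announced above
    auth=("service-account", "service-account-token"),
    timeout=30,
)
resp.raise_for_status()
print(resp.text.strip())  # the VM's IP as resolved on the worker
```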

Verified, it works as expected! Thanks!