oxidecomputer / omicron

Omicron: Oxide control plane

Provide a mechanism for changing individual port settings

rcgoodfellow opened this issue

The problem described in #4405 is not entirely solved by group settings composition. In many cases operators will just want to change individual properties of a port's settings. Having to pull an entire port settings object, make a small change like adding or modifying an IP address, and then pushing the whole thing back again is ridiculous.

Using JSON patch seems like a reasonable way to accomplish this. At the end of the day, we want operators to be able to run commands like

oxide system networking addr add 198.51.100.1/30 --port qsfp19
oxide system networking route del 10.0.0.0/16 nexthop 10.0.10.1
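
As a rough illustration of what the commands above might translate to, here is a minimal sketch of a JSON Patch (RFC 6902) document applied to a port settings object. The settings shape (`addresses`, `routes`, `dst`, `nexthop`) is hypothetical and not taken from the Omicron API, and `apply_patch` is a hand-rolled helper covering only the `add`, `remove`, and `test` operations, not a full implementation of the RFC.

```python
import copy


def apply_patch(doc, patch):
    """Apply a subset of RFC 6902 (add, remove, test) to a copy of doc."""
    doc = copy.deepcopy(doc)
    for op in patch:
        # Split a JSON Pointer such as "/addresses/-" into unescaped tokens.
        tokens = [
            t.replace("~1", "/").replace("~0", "~")
            for t in op["path"].split("/")[1:]
        ]
        # Walk to the container that holds the final token.
        parent = doc
        for t in tokens[:-1]:
            parent = parent[int(t)] if isinstance(parent, list) else parent[t]
        key = tokens[-1]

        if op["op"] == "add":
            if isinstance(parent, list):
                if key == "-":  # "-" means append to the end of the array
                    parent.append(op["value"])
                else:
                    parent.insert(int(key), op["value"])
            else:
                parent[key] = op["value"]
        elif op["op"] == "remove":
            if isinstance(parent, list):
                del parent[int(key)]
            else:
                del parent[key]
        elif op["op"] == "test":
            current = parent[int(key)] if isinstance(parent, list) else parent[key]
            if current != op["value"]:
                raise ValueError(f"test failed at {op['path']}")
        else:
            raise NotImplementedError(op["op"])
    return doc


# Hypothetical port settings object; the field names are assumptions.
settings = {
    "port": "qsfp19",
    "addresses": ["198.51.100.5/30"],
    "routes": [{"dst": "10.0.0.0/16", "nexthop": "10.0.10.1"}],
}

patch = [
    # addr add 198.51.100.1/30 --port qsfp19
    {"op": "add", "path": "/addresses/-", "value": "198.51.100.1/30"},
    # route del 10.0.0.0/16 nexthop 10.0.10.1 (guarded so we remove the right entry)
    {"op": "test", "path": "/routes/0",
     "value": {"dst": "10.0.0.0/16", "nexthop": "10.0.10.1"}},
    {"op": "remove", "path": "/routes/0"},
]

updated = apply_patch(settings, patch)
```

The `test` operation ahead of the `remove` is one way a patch can refuse to apply when the array has shifted underneath it, which touches on the concurrent modification concern raised later in the thread.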

We have also discussed changing the firewall rules API for the same reason. IAM policy updates also work this way. In both cases, in the web console we fudge it (firewall, policy) so that for the end user it works more intuitively, like you describe.

Now that we have a bunch of endpoints like this, I think it is a lot easier to look at it holistically and see what it would look like to use JSON patch or similar.

FWIW, I see this as a false dichotomy. It is possible to create that CLI experience, but we don't because of our direction there as well. While I get why we claim this is ridiculous, there's also the problem of concurrent modification. I understand that what we've done makes the easy path harder than it should be, but please know it wasn't just because we hate people that we were trying to go down this direction.

I do think it's worth mentioning the possibility of adding this kind of functionality at the CLI level like we've done in the console. We worry about maintenance burden in the CLI, but a) it hasn't been a problem on web (though we have many more person-hours focused on web), and b) that has to be weighed against the cost of making this change in the API. On the other hand, when we only build this into some clients, other clients do not get the benefit.

To be clear, my complaint about this being ridiculous is about the experience of operating the rack, and not a critique of API architecture. I think implementing this client side is fine. If the CLI is to remain strictly generated from the API, that is also fine. We can create an interactive network management shell similar to what most network operating systems have. That shell can leverage the fact that we have an API that guards against the dueling administrators problem and provides commands for staging incremental changes and then committing them (or rolling them back), where commits may fail due to collisions. This is similar to what Junos, VyOS and Cumulus NCLU do.
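
To make the stage, commit, and rollback idea concrete, here is a minimal sketch of how such a shell session could behave. Everything in it is hypothetical: `SettingsStore` stands in for an API that rejects writes based on a stale read (a plain version counter is assumed here), and `Session` stands in for the shell's candidate configuration. None of these names come from the Oxide API or CLI.

```python
import copy


class Conflict(Exception):
    """Raised when the stored settings changed since they were read."""


class SettingsStore:
    """Stand-in for the API: settings plus a version used as a write precondition."""

    def __init__(self, settings):
        self._settings = copy.deepcopy(settings)
        self.version = 1

    def get(self):
        return copy.deepcopy(self._settings), self.version

    def put(self, settings, expected_version):
        # Reject the write if someone else committed after our read.
        if expected_version != self.version:
            raise Conflict(
                f"expected version {expected_version}, found {self.version}"
            )
        self._settings = copy.deepcopy(settings)
        self.version += 1


class Session:
    """Stand-in for the shell: edits are staged locally until commit."""

    def __init__(self, store):
        self.store = store
        self.rollback()

    def rollback(self):
        # Discard staged changes and start again from the current state.
        self.candidate, self.base_version = self.store.get()

    def addr_add(self, addr):
        self.candidate["addresses"].append(addr)

    def commit(self):
        self.store.put(self.candidate, self.base_version)
        self.rollback()


store = SettingsStore({"port": "qsfp19", "addresses": []})
alice, bob = Session(store), Session(store)

alice.addr_add("198.51.100.1/30")
bob.addr_add("198.51.100.5/30")

alice.commit()  # succeeds: nothing changed since alice's read
try:
    bob.commit()  # fails: alice's commit bumped the version
    collided = False
except Conflict:
    collided = True
    bob.rollback()  # bob now sees alice's address and can re-stage
```

The point of the sketch is that the incremental commands only ever touch the local candidate, so the API itself still sees whole-object writes guarded by a precondition.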

I've reversed my position a bit from oxidecomputer/oxide.rs#516 after some more consideration on this topic.

Goals of the CLI include minimizing the manual work required when updating the API and reducing opportunities for incompatibilities / failures due to subtle mismatches between the CLI interface and the API. We certainly can (and do!) provide hand-written subcommands as well as subcommands with some manual intervention (i.e. those for which the generation isn't particularly ergonomic--at least at present).

In this case (and, I think, others) the goal is to make it hard for users to inadvertently modify the system in a way that creates an invalid or unsafe state while racing with another user. In general, we would like to present a user with the state of the system so they can make their change in context. In the console, for example, I'd expect to see the state, make a change, and get notified if things changed between when I fetched the state and applied my change (etags). If my change was non-conflicting I might see "hey, things changed, you still want to do this?". If it was conflicting, it's a merge conflict that requires manual intervention.
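
One way a client could tell those two cases apart, sketched under assumptions: compare the state the user originally fetched (`base`), the user's edited copy (`mine`), and the state currently on the server (`theirs`). The function names, the settings fields, and the choice to compare at the granularity of top-level fields are all illustrative, not something the thread specifies.

```python
def changed_keys(base, other):
    """Top-level fields whose values differ between two settings objects."""
    return {k for k in base.keys() | other.keys() if base.get(k) != other.get(k)}


def classify(base, mine, theirs):
    """Classify a pending write against a concurrent change.

    Returns "clean" if the server state is unchanged, "confirm" if someone
    else changed fields we did not touch, and "conflict" if both sides
    changed the same field to different values.
    """
    their_changes = changed_keys(base, theirs)
    if not their_changes:
        return "clean"
    overlap = changed_keys(base, mine) & their_changes
    if any(mine.get(k) != theirs.get(k) for k in overlap):
        return "conflict"
    return "confirm"


# Hypothetical settings objects; the field names are assumptions.
base = {"addresses": ["198.51.100.1/30"], "routes": []}
mine = {"addresses": ["198.51.100.1/30", "198.51.100.5/30"], "routes": []}

# Someone else added a route while we were editing addresses.
theirs_routes = {
    "addresses": ["198.51.100.1/30"],
    "routes": [{"dst": "10.0.0.0/16", "nexthop": "10.0.10.1"}],
}
# Someone else removed the address we were building on.
theirs_addrs = {"addresses": [], "routes": []}

no_change = classify(base, mine, base)
other_field = classify(base, mine, theirs_routes)
same_field = classify(base, mine, theirs_addrs)
```

In this sketch `no_change` is `"clean"`, `other_field` is `"confirm"` (the "you still want to do this?" prompt), and `same_field` is `"conflict"` (manual intervention).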

So for us, for this case I think I want the console doing the read, modify, write... and it's going to be even better when we have etags so that dueling administrators don't overwrite each other. But at least they'd overwrite each other in a way that's consistent and (if they've reviewed the data on screen properly) unlikely to be wrong or dangerous. More generally, it feels like the responsibility of the API client to get the human's eyes on the state. And it might be reasonable for the CLI to say basically "I hope you know what you're doing" because there's no simple way to validate what the user is doing.

In short: I think we should do this in the CLI rather than making the less-safe operation first class in the API.