sensors IPC API should use reply-fault.

cbiffle opened this issue 4 months ago

Currently the operations in the sensors IPC API can generally fail for two reasons not related to the sensor device itself:

1. If an invalid sensor ID is passed in.
2. If there is no data.

(1) is not expected to happen in correct programs except in the case where a tool allows a user to enter an arbitrary sensor ID. (We want to support that, and also gracefully handle cases where e.g. MGS queries the fleet for a sensor that's maybe only implemented on a subset.) So I'd argue this could use reply-fault at the IPC interface, as long as we provide callers a way to ensure they are meeting the "valid SensorId" precondition.

(2) is absolutely expected to happen in correct programs but, I would argue, does not belong in an error type, and should probably use Option (the normal way of indicating missing data).

Here's an alternative proposed IPC API, written as Rust functions because Idol syntax is woooordy.

// SensorId now only represents _valid_ sensor IDs.
// Use SensorId::try_from if you're worried yours might be invalid.
// (Firmware in general can't really produce invalid IDs.)

impl SensorService {
    fn get(id: SensorId) -> Option<f32>;  // where None = "no data"
    fn get_reading(id: SensorId) -> Option<Reading>;

    // ...and so forth
}

A client constructing a SensorId from an arbitrary integer would use try_from to check it. In the case of MGS, this would happen in control-plane-agent, most likely.

A client passing an invalid SensorId into the IPC interface would be violating a precondition and would receive reply-fault.
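To make the precondition concrete, here is a minimal host-side sketch of the proposal, not the actual task code. `NUM_SENSORS`, `InvalidSensor`, `SensorTable`, `post`, `handle_get`, and the `ReplyFault` marker type are all hypothetical names invented for illustration. In real firmware the server would fault the caller rather than return a value, so the `Err(ReplyFault)` here only stands in for that outcome.

```rust
use std::convert::TryFrom;

/// Hypothetical sensor count. In the real system this would come from the
/// sensor task's config; 4 is an arbitrary value for this sketch.
pub const NUM_SENSORS: u32 = 4;

/// Only ever holds an index below NUM_SENSORS (enforced by `try_from`).
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct SensorId(u32);

/// Hypothetical error for a raw integer that does not name a sensor.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct InvalidSensor;

impl TryFrom<u32> for SensorId {
    type Error = InvalidSensor;

    fn try_from(raw: u32) -> Result<Self, Self::Error> {
        if raw < NUM_SENSORS {
            Ok(SensorId(raw))
        } else {
            Err(InvalidSensor)
        }
    }
}

/// Stand-in for "the server faults the caller". This is only a marker so the
/// sketch can run on a host; it is not a real kernel API.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct ReplyFault;

/// Hypothetical stand-in for the sensor task's storage.
pub struct SensorTable {
    values: [Option<f32>; NUM_SENSORS as usize],
}

impl SensorTable {
    pub fn new() -> Self {
        Self { values: [None; NUM_SENSORS as usize] }
    }

    /// Records a value for a sensor.
    pub fn post(&mut self, id: SensorId, value: f32) {
        self.values[id.0 as usize] = Some(value);
    }

    /// Typed API: the ID is valid by construction, so the only
    /// non-value outcome is None, meaning "no data".
    pub fn get(&self, id: SensorId) -> Option<f32> {
        self.values[id.0 as usize]
    }

    /// Server-side view: the ID arrives as a raw integer over IPC, so it is
    /// re-validated, and a bad one is a precondition violation rather than
    /// an ordinary error.
    pub fn handle_get(&self, raw_id: u32) -> Result<Option<f32>, ReplyFault> {
        let id = SensorId::try_from(raw_id).map_err(|_| ReplyFault)?;
        Ok(self.get(id))
    }
}
```

A caller holding an untrusted integer (the MGS case) would call `SensorId::try_from` first and handle `InvalidSensor` itself; a caller using IDs generated at build time would never see either failure path.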

This change looks like it could remove a lot of code in various places (much of which is wrapped up in reusable functions, but is inlined in flash so it still counts). There are a couple of operations in the interface that would require careful thought, like get_raw_reading, which is capable of returning both NoData and also (NoData, timestamp), presumably to indicate "sensor has never checked in" vs "sensor has checked in and has no reading".
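One possible shape for the get_raw_reading case, purely as a sketch: if the two "no data" outcomes really do mean what is presumed above, they could be spelled out as distinct variants instead of being folded into a single Option. The `RawReading` type, its variant names, and the `u64` timestamp are assumptions made for this example, not the existing API.

```rust
/// Hypothetical result type separating the three presumed states.
#[derive(Copy, Clone, Debug, PartialEq)]
pub enum RawReading {
    /// The sensor has never checked in, so there is no timestamp either.
    NeverReported,
    /// The sensor checked in at `timestamp` but had no reading.
    NoData { timestamp: u64 },
    /// The sensor checked in at `timestamp` with a value.
    Data { value: f32, timestamp: u64 },
}

impl RawReading {
    /// The value, if there is one.
    pub fn value(&self) -> Option<f32> {
        match *self {
            RawReading::Data { value, .. } => Some(value),
            _ => None,
        }
    }

    /// The time of the last check-in, if the sensor has ever checked in.
    pub fn timestamp(&self) -> Option<u64> {
        match *self {
            RawReading::NeverReported => None,
            RawReading::NoData { timestamp } => Some(timestamp),
            RawReading::Data { timestamp, .. } => Some(timestamp),
        }
    }
}
```

Whether an enum like this is worth its encoded size over a nested Option is exactly the kind of thing that would need the careful thought mentioned above.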

Eliza Weisman · Answer 1 · Tue Mar 26 2024 04:10:52 GMT+0800 (China Standard Time)

I'm happy to take a pass at this if you're not planning to right away?

Eliza Weisman · Answer 2 · Wed Mar 27 2024 00:23:11 GMT+0800 (China Standard Time)

Hmm, so...one wrinkle with doing this is that the validation of SensorIds is based on the length of the array of sensors in the sensor task, which is determined based on the task's config. This length is known in the task-sensor-api crate (which includes the config), but the SensorId type is defined in task-sensor-types, which doesn't know the length of the array. I'll see if we can just move the type...