tock / libtock-rs

Rust userland library for Tock

[RFC] Reorganize libtock-rs for testability

jrvanwhy opened this issue

libtock-rs currently has very limited testability. The system calls are globals and are disabled on non-embedded platforms. As a result, its tests are unable to make system calls, which limits what can be tested.

Instead, I propose that we split libtock-rs into 3 (or more) crates:

  1. libtock-portable: This will contain a trait representing Tock's system calls, as well as the drivers. The drivers will be generic over the syscall trait implementation, and as such will be architecture-independent (see the sketch after this list). This crate should be #![no_std] and platform-independent.
  2. libtock-new: This will depend on libtock-portable, and will implement the Rust runtime on Tock as well as the "real" Tock system call trait implementation.
  3. libtock-hosttest: This will contain an implementation of Tock's system call trait that runs on Linux and can depend on std, as well as any other testing functionality we find convenient. It will be a dev dependency of libtock-portable, and as such will be used by libtock-portable's unit tests. Its system call trait will be a sufficient approximation of Tock's kernel to allow libtock-portable to be thoroughly tested.
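
As a minimal sketch (assuming hypothetical names; the actual trait, drivers, and syscall signatures are still to be designed), the split in item 1 could look roughly like this:

```rust
/// Hypothetical trait representing Tock's system calls. The real trait
/// would also cover subscribe, allow, memop, etc. with Tock's actual
/// argument and return types.
pub trait Syscalls {
    fn command(&self, driver: usize, command: usize, arg1: usize, arg2: usize) -> isize;
    fn yieldk(&self);
}

/// A driver in libtock-portable: generic over the syscall implementation,
/// and therefore architecture-independent and unit-testable on the host.
pub struct ButtonsDriver<S: Syscalls> {
    syscalls: S,
}

impl<S: Syscalls> ButtonsDriver<S> {
    pub fn new(syscalls: S) -> Self {
        ButtonsDriver { syscalls }
    }

    /// Driver and command numbers are placeholders for illustration only.
    pub fn count(&self) -> isize {
        self.syscalls.command(0x3, 0, 0, 0)
    }
}
```

libtock-new would then provide the implementation of this trait that issues real system calls, while libtock-hosttest would provide a std-based fake for tests.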

For each major component of libtock-new or libtock-portable (e.g. each driver or major subsystem), there should be at least one example in libtock-portable's examples/ directory that evaluates that component's size (code size and .data size). This will require building code size analysis tooling.

Note that this issue represents a large refactoring, and I don't expect to get it right on the first try. I intend to keep the existing libtock crate in place until libtock-new is ready to replace libtock. We can then replace libtock with libtock-new.

Can we make sure that we don't end up with a single PR that changes everything? It would be great to land this one small step at a time.

As far as I have followed the discussion (which initially was less about testability and more about making the library easier to audit), I think two goals are being pursued here:

  • restructure libtock-rs to improve the architecture (and, as a result, get better tests)
  • test methods which make syscalls

I think an RFC should describe the desired result rather than a particular solution, because otherwise we narrow the discussion down to one solution and don't take other solutions to the same problem into account. I think we should create two RFCs, "improve the architecture" and "make syscalls testable", and write down the desired result for each.
For the latter, there are solutions we should consider other than splitting the crate into parts just for the sake of testability. One solution (proposed by @Woyten, as far as I remember) is to record syscalls in thread-locals (like std::thread::LocalKey) and make assertions on them when running tests. One (very advanced, just as inspiration) way similar things are tested outside the Rust world is rxjs marble tests.
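
To make the thread-local idea concrete, a rough sketch (hypothetical, not existing libtock-rs code) of recording syscalls for later assertions might look like this:

```rust
use std::cell::RefCell;

/// Hypothetical record of one issued system call.
#[derive(Debug, PartialEq)]
pub struct SyscallRecord {
    pub driver: usize,
    pub command: usize,
    pub arg1: usize,
    pub arg2: usize,
}

thread_local! {
    // Each test thread gets its own log, so tests can run in parallel.
    static SYSCALL_LOG: RefCell<Vec<SyscallRecord>> = RefCell::new(Vec::new());
}

/// In a host test build, `command` appends to the log instead of trapping
/// into the kernel.
pub fn command(driver: usize, command: usize, arg1: usize, arg2: usize) -> isize {
    SYSCALL_LOG.with(|log| {
        log.borrow_mut().push(SyscallRecord { driver, command, arg1, arg2 });
    });
    0 // fake "success" return value
}

#[test]
fn records_the_issued_command() {
    command(0x3, 0, 0, 0);
    SYSCALL_LOG.with(|log| {
        assert_eq!(log.borrow().len(), 1);
        assert_eq!(log.borrow()[0].driver, 0x3);
    });
}
```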

Can we make sure that we don't end up with a single PR that changes everything? It would be great to land this one small step at a time.

The plan I had was to build the new library piece-by-piece to enable quality code review. However, I was planning to remove the current libtock crate and replace it with the new libtock in a single PR.

I think an RFC should describe the desired result rather than a particular solution, because otherwise we narrow the discussion down to one solution and don't take other solutions to the same problem into account. I think we should create two RFCs, "improve the architecture" and "make syscalls testable", and write down the desired result for each.

This particular RFC is about splitting libtock-rs up for improved testability. Design changes to improve code size are an independent issue; that's currently waiting on tock/design-explorations#2.

For the latter, there are solutions we should consider other than splitting the crate into parts just for the sake of testability. One solution (proposed by @Woyten, as far as I remember) is to record syscalls in thread-locals (like std::thread::LocalKey) and make assertions on them when running tests. One (very advanced, just as inspiration) way similar things are tested outside the Rust world is rxjs marble tests.

I thought of that as well, but it does not seem to be nearly as nice of a solution as abstracting syscalls behind a trait. I'd rather have separate crates than a single crate full of #[cfg(...)] statements.

I thought of that as well, but it does not seem to be nearly as nice of a solution as abstracting syscalls behind a trait. I'd rather have separate crates than a single crate full of #[cfg(...)] statements.

I would also consider using a separate module instead of a crate and just putting all platform-dependent code there. The amount of cfg statements would then be rather small (all conjunctions of the desired test/arch combinations). One disadvantage of having different crates is that you have to maintain the versioning.

I have a bad feeling about deciding up front how to refactor the library to get better tests. My personal preference would be to write tests in the first place (we currently have little to no tests) and integrate them into the CI pipeline. This also helps us evaluate which tests are actually helpful, which is hard to anticipate (at least in my experience), especially in the embedded world, where I have the impression that most applications don't have tests at all.

When we face obstacles, we can remove them. This incremental approach has the advantage that we only spend our time on things which actually help us.

Also, it somewhat breaks the TDD cycle (which is of course not the only possible approach) of write a test, make it green, refactor, because it starts with refactoring and ends with what is probably a manual verification.

Maybe we should create an issue: "write unit tests which actually test syscalls".

Don't get me wrong: I'm not challenging the conclusion, but the way we get there.

I would also consider using a separate module instead of a crate and just putting all platform-dependent code there. The amount of cfg statements would then be rather small (all conjunctions of the desired test/arch combinations).

The amount of #[cfg] statements would grow over time. Note this discussion from a tock/tock PR: tock/tock#1393 (review).

I'm trying to avoid using #[cfg] to switch between real and fake implementations of syscalls. I don't think #[cfg] is the right way to switch between real and fake implementations of APIs; I think dependency injection through generic args is. As such, I believe they belong in different crates.
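
As a sketch of the contrast (all names here are made up for illustration, and the real Tock syscall signatures are simplified):

```rust
// (a) #[cfg]-based switching: the choice between the real and the fake
// implementation is baked into the crate at compile time and has to be
// repeated wherever platform-dependent code lives.
#[cfg(target_arch = "arm")]
pub mod platform {
    pub fn command(_driver: usize, _cmd: usize) -> isize {
        0 // would issue the real svc-based system call here
    }
}

#[cfg(not(target_arch = "arm"))]
pub mod platform {
    pub fn command(_driver: usize, _cmd: usize) -> isize {
        0 // host-side fake for tests
    }
}

// (b) Dependency injection through a generic argument: the caller decides
// which implementation to use, and the real and fake implementations can
// live in separate crates.
pub trait Syscalls {
    fn command(&self, driver: usize, cmd: usize) -> isize;
}

pub fn count_buttons<S: Syscalls>(syscalls: &S) -> isize {
    syscalls.command(0x3, 0)
}
```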

One disadvantage of having different crates is that you have to maintain the versioning.

The version number can be kept in sync.

The amount of #[cfg] statements would grow over time. Note this discussion from a tock/tock PR: tock/tock#1393 (review).

If we mock a complete platform module (as we do now), the number would stay fixed (and I think it's feasible). In contrast to a kernel (which has a lot of architecture-specific code), we only have syscalls, and I don't expect their number to grow.

The version number can be kept in sync.

How do you push these dependent crates to crates.io? If you have >3 crates in one repository (which I expect is the plan), you will need build scripts for publishing the crates (otherwise it will be a lot of unpleasant manual work), whereas publishing a single crate is easy. The alternative would be splitting the repository, which would slow down development considerably. Moreover, publishing really working artifacts to crates.io will be difficult (and it still seems to be our goal to do this). We would have to download and check them to be sure we didn't push crates which don't work together.

The version number can be kept in sync.

How do you push these dependent crates to crates.io? If you have >3 crates in one repository (which I expect is the plan), you will need build scripts for publishing the crates (otherwise it will be a lot of unpleasant manual work), whereas publishing a single crate is easy. The alternative would be splitting the repository, which would slow down development considerably. Moreover, publishing really working artifacts to crates.io will be difficult (and it still seems to be our goal to do this). We would have to download and check them to be sure we didn't push crates which don't work together.

I'm not seeing the issues here. If we keep the version numbers in sync between the crates, and push them at the same time, then the three crates pushed to crates.io will be usable together.

I'm not seeing the issues here. If we keep the version numbers in sync between the crates, and push them at the same time, then the three crates pushed to crates.io will be usable together.

I see some work which has to be done. It's of course doable, but you have to take this into account:

  • there must be a way to release each crate individually (to create patch releases). Otherwise, every breaking change somewhere in some crate in the repository causes a major version change in one of the crates, which leads to different major versions with identical code (and contradicts the way semantic versioning should work). If we don't release semantically, our users will miss important security patches because they think major version updates take effort (and are probably not even necessary right now).
  • releasing the library has to be automated, because with several crates in the repository, releasing gets error-prone. Moreover, to be strict, we should release the crates in the right order, and doing this manually will become a pain; otherwise we will release libraries with dependencies which cannot be resolved.

I think that when we split our library into several crates, we should solve all of these problems beforehand, because otherwise it will become an obstacle to releasing the library (which is, in my opinion, at least as important as the refactoring for better testability).

2. `libtock-new`: This will depend on `libtock-portable`, and will implement the Rust runtime on Tock as well as the "real" Tock system call trait implementation.

Why do we need a new library? The libtock-rs crate isn't even published on crates.io, so the "main" library could keep its original name.

Instead, I propose that we split libtock-rs into 3 (or more) crates:

My worry about the crate split-up is that it might slow down development. This, of course, depends on how the crates are going to be managed in detail. An important requirement, in my opinion, is that all crates reside in the same repository and can be updated, reviewed, and merged simultaneously. This would require some automation when publishing to crates.io, but it seems feasible.

I'm trying to avoid using #[cfg] to switch between real and fake implementations of syscalls. I don't think #[cfg] is the right way to switch between real and fake implementations of APIs; I think dependency injection through generic args is. As such, I believe they belong in different crates.

I partially agree with this point. On the one hand, the target flag should not be used to distinguish between the real and a test implementation. On the other hand, the platform-awareness should not be visible in the drivers' API. The result would be that almost every object would carry a generic type parameter, e.g. ButtonsDriver<P: Platform>, CallbackSubscription<'a, P: Platform>. This, however, is not how other platform-aware libraries (e.g. std) work.

Besides that, I have some other questions:

  • How is testability connected to the crate split-up? Currently, the code is not unit-testable because the mock implementation is trivial. It would be quite easy to write some SyscallsRecorder or the like which you could use to fake responses or make assertions.
  • Is it a good idea to do the split-up very soon? Following a more agile approach, you would postpone the split-up until it's very clear why and where to do the split.
  • What about code optimizations? Do they work across multiple crates as well as they do in a single crate?

2. `libtock-new`: This will depend on `libtock-portable`, and will implement the Rust runtime on Tock as well as the "real" Tock system call trait implementation.

Why do we need a new library? The libtock-rs crate isn't even published on crates.io, so the "main" library could keep its original name.

Because the new libraries would be developed while the current library is in use, we need a naming convention that makes it clear whether we're referring to the existing libtock-rs or the new crate.

Instead, I propose that we split libtock-rs into 3 (or more) crates:

My worry about the crate split-up is that it might slow down development. This, of course, depends on how the crates are going to be managed in detail. An important requirement, in my opinion, is that all crates reside in the same repository and can be updated, reviewed, and merged simultaneously. This would require some automation when publishing to crates.io, but it seems feasible.

I agree, they should stay in one repository.

I'm trying to avoid using #[cfg] to switch between real and fake implementations of syscalls. I don't think #[cfg] is the right way to switch between real and fake implementations of APIs; I think dependency injection through generic args is. As such, I believe they belong in different crates.

I partially agree with this point. On the one hand, the target flag should not be used to distinguish between the real and a test implementation. On the other hand, the platform-awareness should not be visible in the drivers' API. The result would be that almost every object would carry a generic type parameter, e.g. ButtonsDriver<P: Platform>, CallbackSubscription<'a, P: Platform>. This, however, is not how other platform-aware libraries (e.g. std) work.

libtock-new can re-export libtock-portable's types with the platform implementation erased, i.e.:

pub type ButtonsDriver = libtock_portable::ButtonsDriver<RealSyscalls>;
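
Expanded into a self-contained sketch (crate, type, and method names are illustrative; real Tock syscall signatures are simplified), the pattern would keep the generic parameter out of application code:

```rust
// Stand-in for the proposed libtock-portable crate: generic drivers.
mod libtock_portable {
    pub trait Syscalls {
        fn command(&self, driver: usize, cmd: usize) -> isize;
    }

    pub struct ButtonsDriver<S: Syscalls> {
        pub syscalls: S,
    }

    impl<S: Syscalls> ButtonsDriver<S> {
        pub fn count(&self) -> isize {
            self.syscalls.command(0x3, 0)
        }
    }
}

// Stand-in for the proposed libtock-new crate: the real syscall
// implementation plus re-exports with the platform erased.
mod libtock_new {
    use super::libtock_portable;

    pub struct RealSyscalls;

    impl libtock_portable::Syscalls for RealSyscalls {
        fn command(&self, _driver: usize, _cmd: usize) -> isize {
            0 // would issue the real Tock `command` syscall here
        }
    }

    // The platform is erased at the crate boundary.
    pub type ButtonsDriver = libtock_portable::ButtonsDriver<RealSyscalls>;
}

fn main() {
    // No generic parameter is visible at the call site.
    let buttons = libtock_new::ButtonsDriver { syscalls: libtock_new::RealSyscalls };
    println!("buttons: {}", buttons.count());
}
```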

Besides that, I have some other questions:

  • How is testability connected to the crate split-up? Currently, the code is not unit-testable because the mock implementation is trivial. It would be quite easy to write some SyscallsRecorder or the like which you could use to fake responses or make assertions.
  • Is it a good idea to do the split-up very soon? Following a more agile approach, you would postpone the split-up until it's very clear why and where to do the split.

We need to build the fake implementations that rely on std for tests, and not include them in the real library. Splitting the crate into 3 allows us to do that cleanly; #[cfg] has messy results (see tock/tock#1393).
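
As a rough sketch of such a std-based fake (hypothetical names, not an actual libtock-hosttest API): it implements the same syscall trait as the real crate, records calls, and replays canned return values:

```rust
use std::cell::RefCell;
use std::collections::VecDeque;

/// Same hypothetical trait the portable drivers would be generic over.
pub trait Syscalls {
    fn command(&self, driver: usize, cmd: usize, arg1: usize, arg2: usize) -> isize;
}

/// std-based fake: records every call and replays queued return values.
#[derive(Default)]
pub struct FakeSyscalls {
    pub calls: RefCell<Vec<(usize, usize, usize, usize)>>,
    pub replies: RefCell<VecDeque<isize>>,
}

impl Syscalls for FakeSyscalls {
    fn command(&self, driver: usize, cmd: usize, arg1: usize, arg2: usize) -> isize {
        self.calls.borrow_mut().push((driver, cmd, arg1, arg2));
        // Return the next canned reply, or 0 (success) if none was queued.
        self.replies.borrow_mut().pop_front().unwrap_or(0)
    }
}
```

A driver under test would then be instantiated with FakeSyscalls, and the test would assert on the recorded calls afterwards.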

  • What about code optimizations? Do they work across multiple crates as well as they do in a single crate?

Link-time optimization is not as effective as local optimization. However, these types will all be monomorphized in either libtock-new or in the application, so the situation is similar to a single library.

This was discussed at the Tock core call for about 20 minutes today, in part because of the interactions between libtock-rs and the kernel and a sense that it would be good to have at least similar software engineering principles.

The general consensus was that splitting into three crates is the right approach. Tock itself has taken this approach internally. When we first split the OS into multiple crates, there was pushback (mostly from me), and in retrospect it was the right decision. We discussed @torfmaster's concerns (#132 (comment)) a good deal, and concluded the benefits outweighed the costs. In particular, lock-stepping releases of the three crates is effectively the same as releasing a larger crate, given that dependencies change. Concerns about how users will react seemed hypothetical and representative of only a subset of users.

This separation will also fit well with supporting other targets (e.g., hardware emulation) in the future, and force a clear abstraction boundary.

We thought that refactoring for the purpose of testing is an excellent goal. The concern that we should have tests first is well-taken, but in my opinion there's sufficient testing experience already (e.g., @jrvanwhy's testing for tock-on-titan, and tests generally) to inform the design at least at this level.

The group did not discuss the actual names of the crates, as this seemed like a minor detail that would be easy to get right once the exact boundaries between them are drawn.

Thank you for clarifying this, @phil-levis. I am sure we are sufficiently aware of how splitting one library into several could slow down development (e.g. publishing), so we can take appropriate countermeasures (i.e. automation). I decided to start working in the direction of hardware emulation (if you want to call it that) in #136, to see what the actual requirements are; feel free to comment on that PR.

Update: This was resolved a while ago. The rewrite described in this issue evolved over time and was eventually combined with Tock 2.0 support for libtock-rs.