Building a process binary for two different platforms with the same target triple causes a race condition.
jrvanwhy opened this issue
Question: If you try to build process binaries with two different layouts concurrently, does it result in a race condition? I'm primarily concerned with the layout file selection in `libtock_runtime`'s `build.rs`.

If the answer is yes, then we should investigate how to fix the race. This question becomes more important if we add functionality to `build.rs` to support building apps for multiple slots.
It's very hard to understand what `cargo` does by reading its code, so I cloned it and added print statements to the file lock and unlock functions. I've determined that:

- When building multiple crates, `cargo check`, `cargo build`, and `cargo run` lock the build directory before they start building the first crate, and do not unlock it until they are finished with the last crate. This means that compiling crates is not racy.
- `cargo run` unlocks the build directory before it executes the binary (or invokes the runner if one is provided), so `cargo run` invocations can race (see the sketch after this list).
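For example, here is a hypothetical reproduction of that window (the platform names and target triple are illustrative; both invocations share the same `target/thumbv7em-none-eabi/release` output path):

```sh
# Two concurrent cargo run invocations for different platforms but the
# same target triple. The builds themselves serialize on cargo's lock,
# but the platform-B build can overwrite the shared ELF after platform
# A's build finishes and before platform A's runner reads it.
LIBTOCK_PLATFORM=hail cargo run --release --target=thumbv7em-none-eabi &
LIBTOCK_PLATFORM=imix cargo run --release --target=thumbv7em-none-eabi &
wait
```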
For the record, I used `cargo` commit `07e9d46e3` and the following diff:
```diff
diff --git a/src/cargo/util/flock.rs b/src/cargo/util/flock.rs
index 755bcdcd9..bb099e6ff 100644
--- a/src/cargo/util/flock.rs
+++ b/src/cargo/util/flock.rs
@@ -349,23 +349,38 @@ mod sys {
     use std::os::unix::io::AsRawFd;

     pub(super) fn lock_shared(file: &File) -> Result<()> {
-        flock(file, libc::LOCK_SH)
+        println!("START: Instance {:?} lock_shared {:?}", std::env::var("INSTANCE"), file);
+        let out = flock(file, libc::LOCK_SH);
+        println!("END: Instance {:?} lock_shared {:?}", std::env::var("INSTANCE"), file);
+        out
     }

     pub(super) fn lock_exclusive(file: &File) -> Result<()> {
-        flock(file, libc::LOCK_EX)
+        println!("START: Instance {:?} lock_exclusive {:?}", std::env::var("INSTANCE"), file);
+        let out = flock(file, libc::LOCK_EX);
+        println!("END: Instance {:?} lock_exclusive {:?}", std::env::var("INSTANCE"), file);
+        out
     }

     pub(super) fn try_lock_shared(file: &File) -> Result<()> {
-        flock(file, libc::LOCK_SH | libc::LOCK_NB)
+        println!("START: Instance {:?} try_lock_shared {:?}", std::env::var("INSTANCE"), file);
+        let out = flock(file, libc::LOCK_SH | libc::LOCK_NB);
+        println!("END: Instance {:?} try_lock_shared {:?}", std::env::var("INSTANCE"), file);
+        out
     }

     pub(super) fn try_lock_exclusive(file: &File) -> Result<()> {
-        flock(file, libc::LOCK_EX | libc::LOCK_NB)
+        println!("START: Instance {:?} try_lock_exclusive {:?}", std::env::var("INSTANCE"), file);
+        let out = flock(file, libc::LOCK_EX | libc::LOCK_NB);
+        println!("END: Instance {:?} try_lock_exclusive {:?}", std::env::var("INSTANCE"), file);
+        out
     }

     pub(super) fn unlock(file: &File) -> Result<()> {
-        flock(file, libc::LOCK_UN)
+        println!("START: Instance {:?} unlock {:?}", std::env::var("INSTANCE"), file);
+        let out = flock(file, libc::LOCK_UN);
+        println!("END: Instance {:?} unlock {:?}", std::env::var("INSTANCE"), file);
+        out
     }

     pub(super) fn error_contended(err: &Error) -> bool {
```
So the current status is that building for different platforms with the same target triple causes races.
I'm trying to think through whether it is possible to solve this race condition, and if so, how.
First off: we only have a race if `cargo run` is executed concurrently with another `cargo` command that builds the same binary for the same `--target`. `cargo build` invocations cannot race with each other because they always hold locks through the entire operation.

If `cargo run` is invoked concurrently with `cargo run` for the same binary and `--target`, the following interleaving can happen:
1. `LIBTOCK_PLATFORM=A cargo run` acquires the lock, builds the ELF for platform A, then releases the lock.
2. `LIBTOCK_PLATFORM=B cargo run` acquires the lock, builds the ELF for platform B, then releases the lock.
3. `LIBTOCK_PLATFORM=A cargo run` invokes the `runner` binary, which invokes `elf2tab` on the ELF file.
This results in the `LIBTOCK_PLATFORM=A cargo run` invocation creating a TAB file built for platform B, which is incorrect. I do not see how we can defend against this, except to avoid running `cargo run` concurrently for different platforms.

So at the minimum, we should document that `cargo run` invocations for different platforms can race, and tell users not to do that.
Additionally, there are a couple of ways we could prevent this race for `libtock-rs`'s `Makefile` (sketched below):

1. Use a mutex (e.g. via GNU Parallel's `sem` binary) to prevent concurrent `cargo run` invocations.
2. Pass a different `--target-dir=` value for each platform.

Of course, option 3 (document + ignore) exists as well.
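A minimal sketch of options 1 and 2 as shell commands (the semaphore name, platform name, and target triple are all illustrative):

```sh
# Option 1: serialize cargo run invocations behind a named semaphore.
# sem is GNU Parallel's semaphore frontend; --fg waits for completion.
sem --id libtock-cargo-run --fg \
    env LIBTOCK_PLATFORM=hail cargo run --release --target=thumbv7em-none-eabi

# Option 2: give each platform its own target directory, so concurrent
# invocations never share an ELF output path.
LIBTOCK_PLATFORM=hail cargo run --release --target=thumbv7em-none-eabi \
    --target-dir=target/hail
```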
> Pass a different `--target-dir=` value for each platform.

This is effectively what the libtock-c build system does. There the requisite granularity is core type (i.e. separate builds for cortex-m0, -m3, -m4, -m7, and each flavor of RISC-V). Most of the compilation for any app is the libtock library, so the ecosystem sees an appreciable benefit from sharing per-platform build directories and re-using libtock library objects.

Reproducible builds are the other important part of the puzzle there, as it doesn't really matter if parallel build processes compile the same library file for a given platform in a racy way: they'll both produce the same thing.*

I very intentionally set up CI to stress test this: the `build_all.sh` script will build as many examples in parallel as there are cores on the machine. It seems to be working pretty well.

*Caveated on built object files being written/updated atomically, such that another build process won't try to read a partially written file, which compilers seem sensible about in practice.
Note that in the context of #482, this means passing a different `--target-dir` value for each location (flash address and RAM address) you build the app for.
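For instance (a hypothetical pair of invocations; the `LIBTOCK_LINKER_*` variable names are assumed from libtock-rs's linker-configuration convention, and the addresses are made up):

```sh
# Build the same app for two slots, keying the target directory on the
# flash/RAM addresses so the per-location ELFs never collide.
LIBTOCK_LINKER_FLASH=0x00040000 LIBTOCK_LINKER_RAM=0x20004000 \
    cargo build --release --target=thumbv7em-none-eabi \
    --target-dir=target/0x00040000.0x20004000
LIBTOCK_LINKER_FLASH=0x00060000 LIBTOCK_LINKER_RAM=0x20008000 \
    cargo build --release --target=thumbv7em-none-eabi \
    --target-dir=target/0x00060000.0x20008000
```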
Sigh, yep, that's how we end up with this list of "targets" https://github.com/tock/libtock-c/blob/432746cd7df195a6122d0d3f0e88a16720d5376d/Configuration.mk#L80-L94
I (optimistically?) view the spurious builds per-location as a transient hack that can be removed when relocations land.
ePIC makes everything better for sure.
sigh
> `cargo build` ...locks the build directory before it starts building the first crate, and does not unlock it until it is finished with the last crate

Which directory? `/target` or something more nested?
> > `cargo build` ...locks the build directory before it starts building the first crate, and does not unlock it until it is finished with the last crate
>
> Which directory? `/target` or something more nested?

I don't recall, and I don't have that `cargo` dev setup around anymore, so it's not trivial to check. I suspect it is per-architecture and per-profile, based on:
```console
jrvanwhy@jrvanwhy:~/libtock-rs/target$ find . -iname '*lock*'
./release/.cargo-lock
./riscv32imc-unknown-none-elf/release/.cargo-lock
./thumbv6m-none-eabi/release/.cargo-lock
./thumbv7em-none-eabi/release/.cargo-lock
```
If it's worth putting more effort into answering your question, let me know and I will recreate my test setup and give you a confident answer.
Well, is it possible that cargo could build two ELFs at once, but for different target triples?
> Well, is it possible that cargo could build two ELFs at once, but for different target triples?

A quick test shows that it is possible to build two ELFs at once if one uses the release profile and one uses the debug profile (whether or not the target triples differ). So far, I haven't been able to get it to build two release ELFs simultaneously, even with different target triples.
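An illustrative version of that quick test (not the exact commands from the issue):

```sh
# A debug build and a release build can proceed concurrently, while two
# release builds serialize -- consistent with the per-profile
# .cargo-lock files listed above, suggesting the lock is keyed on the
# profile rather than on the target triple.
cargo build --target=thumbv7em-none-eabi &
cargo build --release --target=thumbv7em-none-eabi &
wait
```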
At this point, I believe we've fixed all race conditions that are worth fixing. The remaining race conditions I'm aware of are:
1. Invoking multiple `make` commands in parallel can race.
2. Invoking `make` concurrently with `cargo run` can race.
3. `rustup` can potentially race with itself at any time.

Issues 1 and 2 are pretty much baked into the designs of `make` and `cargo run`, while issue 3 is tracked upstream in rust-lang/rustup#988.