evmar / n2

n2 ("into"), a ninja compatible build system

Native support for remote-execution / caching

gkousik opened this issue

I'm opening this issue as a discussion to get opinions on whether there is interest in natively supporting remote-execution / remote-caching (through an open source protocol like https://github.com/bazelbuild/remote-apis).

With Ninja, the way I've seen most projects use remote-execution is by adding a wrapper to the command line that hijacks the actual action execution and then does the remote-execution. While this model works, it is cumbersome and has some problems:

  1. A binary has to wrap all action command lines (e.g., ./<wrapper-binary> <wrapper-binary-args> -- clang++ ....). This makes locally reproducing an action failure more complicated (you have to remove this wrapper binary and its arguments to run the action locally). It can also be confusing at times to see the wrapper binary in the command line, especially for new users.

  2. Skipping intermediate action results of the graph execution becomes very tricky. E.g., when executing a graph that contains a set of compile actions that feed a link action (all of which can be run and cached on a remote compiler farm), some users (or CI systems) would want to skip downloading the intermediate object files, since they aren't read locally, and download only the output of the link action. This becomes complicated to do in a remote-execution system implemented with wrappers, since the individual wrappers don't have sufficient knowledge of the overall build graph to effectively skip downloading the output of some actions.
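To make the second point concrete, here is a minimal sketch of the decision a build tool could make once it sees the whole graph. It is an illustration only, not n2 or remote-apis code: the `Action` type, the `remote` flag, and `outputs_to_download` are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Action:
    # Hypothetical, simplified build action; not an n2 data structure.
    name: str
    inputs: list[str]
    outputs: list[str]
    remote: bool = True  # assumption: True means the action runs on the remote farm


def outputs_to_download(actions, requested):
    """Return the generated files that actually have to exist on local disk."""
    # Anything the user explicitly asked for must be downloaded.
    needed = set(requested)
    for action in actions:
        if not action.remote:
            # An action that runs locally needs its inputs on disk.
            needed.update(action.inputs)
    # Only generated files can be downloaded; source files are already local.
    produced = {out for action in actions for out in action.outputs}
    return needed & produced


actions = [
    Action("compile_a", ["a.cc"], ["a.o"]),
    Action("compile_b", ["b.cc"], ["b.o"]),
    Action("link", ["a.o", "b.o"], ["app"]),
]
to_fetch = outputs_to_download(actions, requested=["app"])
print(sorted(to_fetch))
```

With every action remote, only `app` is fetched and the object files stay in the remote cache. A per-command wrapper cannot make this call, because it sees one command at a time and does not know whether anything local will read its outputs.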

So the question is: is there interest in natively supporting the remote-apis protocol in N2?

Sounds cool! I don't have a use for it myself but I'd be happy to review or advise on any appropriate changes if they aren't too invasive.

One thing to be aware of is that n2 currently wants to read the mtimes of intermediate files to determine whether they're out of date. This might be more complex if you intend to leave the intermediate objects cached remotely. It's all just code of course, and fixable.

We plan to implement this in the android fork of n2 as well.

We'd like to implement action sandboxing (ideally using nsjail) before remote execution, as that will allow us to easily find all dependency issues that would be a blocker for RE. And if an action worked in a sandbox, there would be no extra work for it to also work in RE.

Note that sandboxing and RE are incompatible with depfiles: with a depfile, the full set of inputs is only known after the action has run, so you wouldn't know what files to upload.

We'd also like to try switching to file-hash-based manifests, initially just to cut down on unnecessary rebuilds locally, but eventually also to integrate with a hash-based remote filesystem (ABFS).
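As a rough illustration of why hashes cut down on unnecessary rebuilds, the sketch below compares a content-hash check against a file whose mtime changes but whose contents do not. The function names and the manifest layout (a dict from path to SHA-256 digest) are assumptions made for this example, not the format used by n2 or its Android fork.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def record(inputs):
    """Build a manifest (hypothetical format): path -> content digest."""
    return {path: file_digest(path) for path in inputs}


def is_dirty(inputs, manifest):
    """An action is dirty if any input's digest differs from the manifest."""
    return any(manifest.get(path) != file_digest(path) for path in inputs)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "a.cc")
        with open(src, "w") as f:
            f.write("int main() { return 0; }\n")
        manifest = record([src])

        # Bump the mtime without changing the contents.
        os.utime(src, (0, 0))
        dirty_after_touch = is_dirty([src], manifest)

        # Now actually change the contents.
        with open(src, "a") as f:
            f.write("// edited\n")
        dirty_after_edit = is_dirty([src], manifest)
        return dirty_after_touch, dirty_after_edit


print(demo())
```

An mtime-based check would treat the touched file as changed, while the hash-based check only reports the action dirty after the real edit. The cost is that every input has to be read and hashed, which a real implementation would likely need to cache.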

Note that skipping intermediate actions may not be trivial, I don't think the bazel remote execution APIs are set up well for that, hence why bazel doesn't support it either. Though we are interested in that as well.

Edit: Oh just realized it's Kousik :) In that case, you'd probably just want to add this functionality to the android fork of n2, as the multithreaded parsing change has stirred up the internal datastructures a bit.

Thanks - it's good to know that there's interest!

> One thing to be aware of is that n2 currently wants to read the mtimes of intermediate files to determine whether they're out of date. This might be more complex if you intend to leave the intermediate objects cached remotely. It's all just code of course, and fixable.

This is a challenge, yep. I've seen some attempts at this create a fake file on disk to satisfy the mtime check, while others maintain this information in a custom log file.
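A minimal sketch of the log-file variant, under stated assumptions: `remote_log` is a hypothetical dict mapping an output path to the timestamp recorded when the remote action finished, and the staleness rule is a simplified "any input newer than the output" comparison rather than n2's actual check.

```python
import os


def effective_mtime(path, remote_log):
    """Prefer the on-disk mtime; fall back to the timestamp in the log."""
    try:
        return os.stat(path).st_mtime
    except FileNotFoundError:
        # The file was left in the remote cache; use the recorded timestamp.
        # Returns None if the log has no entry for this path either.
        return remote_log.get(path)


def is_out_of_date(output, input_mtimes, remote_log):
    """Simplified mtime rule: stale if missing or older than any input."""
    out_mtime = effective_mtime(output, remote_log)
    if out_mtime is None:
        return True
    return any(mtime > out_mtime for mtime in input_mtimes)


# Hypothetical log entry for an object file that was never downloaded.
remote_log = {"n2-sketch-missing-dir/a.o": 100.0}
print(is_out_of_date("n2-sketch-missing-dir/a.o", [50.0], remote_log))
```

The fake-file approach gets the same effect by writing a placeholder to disk, so the existing mtime check needs no changes, at the price of having files on disk whose contents are not the real outputs.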

> Note that skipping intermediate actions may not be trivial, I don't think the bazel remote execution APIs are set up well for that, hence why bazel doesn't support it either. Though we are interested in that as well.

I think Bazel does support Build without the bytes? https://blog.bazel.build/2023/10/06/bwob-in-bazel-7.html

> In that case, you'd probably just want to add this functionality to the android fork of n2, as the multithreaded parsing change has stirred up the internal datastructures a bit.

Ah, interesting. I will check out Android's N2 fork!