bazelbuild / rules_nodejs

NodeJS toolchain for Bazel.

Home Page: https://bazelbuild.github.io/rules_nodejs/

[Bug]: `npm_install` modifying the source tree during the build is an antipattern

joeleba opened this issue · comments

What happened?

tl;dr: I think modifying the source dir during the build is an antipattern.

Background

I ran into an issue while trying to build bazelbuild/buildtools with --experimental_merged_skyframe_analysis_and_execution (context: Project Skymeld bazelbuild/bazel#14057). Some actions were reporting missing inputs from the node_modules directory.

Without going into too much detail, I managed to figure out the issue: compared to regular Bazel, Skymeld clears the syscall cache at a different point in time.

Take the following example:

WORKSPACE
BUILD
foo/
  a.in
  b.in

where WORKSPACE contains an instance of npm_install.

Assume that there are actions that refer to node_modules. When doing bazel build //foo/... with bazel at HEAD, this is what happens (chronological order):

| Phase | What happens | State of syscall cache (readdir) |
| --- | --- | --- |
| Loading | npm_install in the WORKSPACE file is evaluated and node_modules is generated | { foo -> [a.in, b.in] } |
| Analysis | Prepare the actions | Cleared at the end of the analysis phase to save memory |
| Execution | Execute the actions. Will refer to the syscall cache to get the file states. | { foo -> [a.in, b.in, node_modules] } |

The syscall cache is extra-Skyframe and generally assumes that the source doesn't change during the build (code). This node_modules directory is generated in the source dir and hence isn't considered "external" either.

So what happens if we don't clear the syscall cache after the analysis phase? One would expect a memory regression, but in this case, it's worse: we'll start seeing missing input errors:

| Phase | What happens | State of syscall cache (readdir) |
| --- | --- | --- |
| Loading | npm_install in the WORKSPACE file is evaluated and node_modules is generated | { foo -> [a.in, b.in] } |
| Analysis | Prepare the actions | No cache clearance |
| Execution | Execute the actions. Will refer to the syscall cache to get the file states and reports that there's no node_modules directory | { foo -> [a.in, b.in] } |

Disabling the cache clearance: bazelbuild/bazel@54b4c39
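To make the two timelines above concrete, here is a minimal Python sketch of a directory-listing cache that goes stale. It is a toy model, not Bazel code: `ReaddirCache`, `run_build`, and the in-memory `tree` are hypothetical names introduced only for illustration, and the three phases are reduced to single statements.

```python
from typing import Callable, Dict, List


class ReaddirCache:
    """Toy stand-in for a per-build directory-listing cache (hypothetical, not Bazel code)."""

    def __init__(self, real_readdir: Callable[[str], List[str]]) -> None:
        self._real_readdir = real_readdir
        self._entries: Dict[str, List[str]] = {}

    def readdir(self, path: str) -> List[str]:
        # Only consult the "filesystem" on a cache miss; store a snapshot copy.
        if path not in self._entries:
            self._entries[path] = list(self._real_readdir(path))
        return self._entries[path]

    def clear(self) -> None:
        self._entries.clear()


def run_build(clear_after_analysis: bool) -> List[str]:
    # Fake source tree: directory -> entries (assumption: mirrors the example layout).
    tree = {"foo": ["a.in", "b.in"]}
    cache = ReaddirCache(lambda path: tree[path])

    # Loading: the directory is listed (and cached) before node_modules exists,
    # then the repository rule writes node_modules into the source tree.
    cache.readdir("foo")
    tree["foo"].append("node_modules")

    # Analysis: optionally drop the cache at the end of the phase.
    if clear_after_analysis:
        cache.clear()

    # Execution: actions consult the cache for the directory contents.
    return cache.readdir("foo")


print(run_build(clear_after_analysis=True))   # ['a.in', 'b.in', 'node_modules']
print(run_build(clear_after_analysis=False))  # ['a.in', 'b.in']
```

In this model the second call never sees node_modules, which corresponds to the missing-input errors: the cache is only correct by accident when it happens to be cleared after the source tree was modified.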

Proposal

The syscall cache was there only for performance reasons, but it accidentally tolerated this behavior. This will not be the case when we switch over to Skymeld mode, because then the syscall cache clearance happens at a different time.

IMO we have 2 options to move forward:

  1. Clear the syscall cache once after the Loading phase, or
  2. Reaffirm that bazel rules shouldn't modify the source dir during the build. This essentially means that we revert #63.

I believe that (1) is just once more sweeping this problem under the rug and isn't a desirable option, not to mention the potential regression from clearing the syscall cache before the analysis work.

WDYT?

Version

Development (host) and target OS/architectures: Linux

Output of bazel --version: development version

Version of rules_nodejs, or other relevant rules from your
WORKSPACE or MODULE.bazel file: 4.0.0

Language(s) and/or frameworks involved:

How to reproduce

Build bazel from this branch: https://github.com/joeleba/bazel/tree/npm_install_repro


$ cd /tmp
$ git clone git@github.com:joeleba/bazel.git
$ cd bazel
$ bazel build //src:bazel && cp bazel-bin/src/bazel /tmp/bazel -f

Clone bazelbuild/buildtools and build ...:

$ cd /tmp
$ git clone git@github.com:bazelbuild/buildtools.git
$ cd buildtools
$ /tmp/bazel build //...


Any other information?

No response
commented

Writing to the source tree during the build (even from repository modules) is definitely an antipattern. However, that does not answer the question as to what else to do; my understanding is that the node_modules directory is pretty much required by the Node ecosystem. There might be some leeway as to where one puts the WORKSPACE.bazel file, but I wouldn't count on it.

Historically, the answer to this was managed_directories(), which was used to tell Bazel which parts of the source tree it should not consider as "sources", but it was deprecated because supporting it required a lot of complexity (see bazelbuild/bazel#15463 and the linked design doc).

@joeleba, how exactly does this cause trouble? Disregarding the inherent ugliness of this, this would only be a problem if the build touched the node_modules/ directory or called readdir() on the top-level directory of the workspace.

@alexeagle, can you offer some guidance here? My ideas (I'm not sure of my order of preference):

  1. Use the little-known .bazelignore functionality to ignore node_modules/ (this is similar to managed directories, but more limited). If it's really, truly necessary, I'd sign off on adding bazelignore-like functionality to the WORKSPACE.bazel / MODULE.bazel file (essentially resurrecting managed directories, but in a much more limited way).
  2. Make rules_nodejs error out when node_modules does not exist, so that at least the problem with the top-level readdir() is avoided.
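For readers unfamiliar with option 1: a .bazelignore file lists directory paths relative to the workspace root, one per line, and Bazel skips those directories when looking for packages. The Python sketch below is only a toy model of that prefix matching, not Bazel's implementation; `parse_ignore_file` and `is_ignored` are hypothetical helper names, and whether ignoring node_modules this way would also avoid the stale syscall cache entry is not established in this thread.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def parse_ignore_file(text: str) -> List[PurePosixPath]:
    """Return the directory prefixes listed in .bazelignore-style text."""
    prefixes = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            prefixes.append(PurePosixPath(line))
    return prefixes


def is_ignored(path: str, prefixes: Iterable[PurePosixPath]) -> bool:
    """True if path is one of the listed directories or lies underneath one."""
    candidate = PurePosixPath(path)
    return any(candidate == p or p in candidate.parents for p in prefixes)


# Assumed file content: a single line naming the top-level node_modules directory.
prefixes = parse_ignore_file("node_modules\n")
print(is_ignored("node_modules/typescript/package.json", prefixes))  # True
print(is_ignored("foo/a.in", prefixes))                              # False
```

Matching is by path prefix from the workspace root, so in this model a nested directory such as foo/node_modules would need its own entry.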

> @joeleba, how exactly does this cause trouble? Disregarding the inherent ugliness of this, this would only be a problem if the build touched the node_modules/ directory or called readdir() on the top-level directory of the workspace.

In bazelbuild/buildtools, there's a tsc target that contains a middleman action that refers to files under node_modules. In Skymeld mode, or when we don't clear the syscall cache after analysis, the syscall cache would not know of the existence of node_modules because of a stale cache entry. Here's a snippet of the failure:

...
ERROR: /usr/local/google/home/leba/.cache/bazel/_bazel_leba/2d48da7e351519dedcec3731f196c474/external/buildozer_npm_deps/typescript/bin/BUILD.bazel:9:14: Middleman _middlemen/external_Sbuildozer_Unpm_Udeps_Stypescript_Sbin_Stsc.sh-runfiles failed: missing input file '@buildozer_npm_deps//:node_modules/typescript/loc/lcl/TRK/TypeScriptLanguageService/Microsoft.CodeAnalysis.TypeScript.EditorFeatures.dll.lcl'
ERROR: /usr/local/google/home/leba/.cache/bazel/_bazel_leba/2d48da7e351519dedcec3731f196c474/external/buildozer_npm_deps/typescript/bin/BUILD.bazel:9:14: Middleman _middlemen/external_Sbuildozer_Unpm_Udeps_Stypescript_Sbin_Stsc.sh-runfiles failed: missing input file '@buildozer_npm_deps//:node_modules/typescript/loc/lcl/TRK/TypeScriptTasks/TypeScript.Tasks.dll.lcl'
ERROR: /usr/local/google/home/leba/.cache/bazel/_bazel_leba/2d48da7e351519dedcec3731f196c474/external/buildozer_npm_deps/typescript/bin/BUILD.bazel:9:14: Middleman _middlemen/external_Sbuildozer_Unpm_Udeps_Stypescript_Sbin_Stsc.sh-runfiles failed: missing input file '@buildozer_npm_deps//:node_modules/typescript/package.json'
ERROR: /usr/local/google/home/leba/.cache/bazel/_bazel_leba/2d48da7e351519dedcec3731f196c474/external/buildozer_npm_deps/typescript/bin/BUILD.bazel:9:14: Middleman _middlemen/external_Sbuildozer_Unpm_Udeps_Stypescript_Sbin_Stsc.sh-runfiles failed: 169 input file(s) do not exist
Use --verbose_failures to see the command lines of failed build steps.
ERROR: /usr/local/google/home/leba/.cache/bazel/_bazel_leba/2d48da7e351519dedcec3731f196c474/external/buildozer_npm_deps/typescript/bin/BUILD.bazel:9:14 Middleman _middlemen/external_Sbuildozer_Unpm_Udeps_Stypescript_Sbin_Stsc.sh-runfiles failed: 169 input file(s) do not exist
INFO: Elapsed time: 15.659s, Critical Path: 0.78s
INFO: 132 processes: 115 internal, 17 local.
ERROR: Build did NOT complete successfully

Since you mentioned the possibility of adding stuff to MODULE.bazel: cc @meteorcloudy

> Since you mentioned the possibility of adding stuff to MODULE.bazel

I think I mentioned that managed_directories isn't available in Bzlmod; I don't think Bzlmod can actually help with this issue.

commented

managed_directories isn't a feature of Bazel anymore :)

But if we want to add a .bazelignore analogue, one would have to add that extra symbol.

> Since you mentioned the possibility of adding stuff to MODULE.bazel: cc @meteorcloudy

FWIW, "you" here meant @lberki.

From reading the doc, we don't seem to have this problem with rules_js because the node_modules dir isn't under the source tree.

@alexeagle could you please confirm?

This issue has been automatically marked as stale because it has had no recent activity. It will be closed if no further activity occurs in 30 days. Note as of rules_nodejs v6 the rules_nodejs repository contains only the core nodejs toolchain and @bazel/runfiles package. All rulesets are removed and unmaintained. For alternate rulesets suggestions include https://github.com/aspect-build/rules_js, https://github.com/aspect-build/rules_ts Collaborators can add a "cleanup" or "need: discussion" label to keep it open indefinitely. Thanks for your contributions to rules_nodejs!

Hey, sorry we never replied, @joeleba. That's correct: rules_js writes only to the output tree. rules_nodejs 6.0.0 fixed this in the sense that the package manager code no longer exists; the scope is now only to provide a Node.js toolchain.