moonrepo / moon

A build system and monorepo management tool for the web ecosystem, written in Rust.

Home Page: https://moonrepo.dev/moon

[bug] Started getting `Too many open files (os error 24)` on `fs:read` after upgrading 1.20.2 -> 1.23.4 on MacOS

iyefrat opened this issue

Describe the bug

Started getting Too many open files (os error 24) after upgrading 1.20.2 -> 1.23.4 on MacOS, on a pnpm monorepo (can't share it here, sorry).

From bisecting the issue, it seems to not occur on 1.23.0 and start occurring on 1.23.1

Steps to reproduce

Not sure. Have a lot of files in your moon cache maybe?

Expected behavior

not getting the error

Screenshots

example of the error:

Error: fs::read
× Failed to read path ~/monorepo/.moon/cache/states/project/task.
╰─▶ Too many open files (os error 24)

Environment


  System:
    OS: macOS 14.2.1
    CPU: (8) arm64 Apple M1 Pro
    Memory: 77.09 MB / 16.00 GB
    Shell: 5.9 - /bin/zsh
  Binaries:
    Node: 20.10.0 - ~/.proto/bin/node
    npm: 10.2.3 - /opt/homebrew/opt/node@18/bin/npm
    bun: 1.1.0 - /opt/homebrew/bin/bun
  Managers:
    Homebrew: 4.2.16 - /opt/homebrew/bin/brew
    pip3: 23.3.1 - /opt/homebrew/bin/pip3
    RubyGems: 3.0.3.1 - /usr/bin/gem
  Utilities:
    CMake: 3.28.3 - /opt/homebrew/bin/cmake
    Make: 3.81 - /usr/bin/make
    GCC: 15.0.0 - /usr/bin/gcc
    Git: 2.43.1 - /opt/homebrew/bin/git
    Clang: 15.0.0 - /usr/bin/clang
    Curl: 8.4.0 - /usr/bin/curl
  Servers:
    Apache: 2.4.56 - /usr/sbin/apachectl
  Virtualization:
    Docker: 25.0.3 - /usr/local/bin/docker
    Docker Compose: 2.24.6 - /usr/local/bin/docker-compose
  IDEs:
    Emacs: 29.2 - /opt/homebrew/bin/emacs
    VSCode: 1.84.0 - /opt/homebrew/bin/code
    Vim: 9.0 - /usr/bin/vim
    WebStorm: 2022.2
    Xcode: /undefined - /usr/bin/xcodebuild
  Languages:
    Bash: 3.2.57 - /bin/bash
    Java: 18.0.2 - /usr/bin/javac
    Perl: 5.30.3 - /usr/bin/perl
    Python3: 3.11.7 - /opt/homebrew/bin/python3
    Ruby: 2.6.10 - /usr/bin/ruby
  Databases:
    SQLite: 3.43.2 - /usr/bin/sqlite3
  Browsers:
    Chrome: 123.0.6312.107
    Safari: 17.2.1

Additional context
This can be worked around by raising the default macOS ulimit, but since it's a new error, I'd hope there's a way to get it to work without that.
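For readers who want to try the workaround mentioned above, here is a minimal sketch of inspecting and raising the open-file soft limit from Python's standard `resource` module (Unix only). The helper name `raise_nofile_soft_limit` and the target value are illustrative assumptions, not anything moon provides. The change applies only to the current process and the children it spawns; in a shell, the equivalent is the `ulimit -n` builtin.

```python
import resource


def raise_nofile_soft_limit(target: int) -> tuple:
    """Raise the soft open-file limit toward `target` (hypothetical helper).

    Never exceeds the hard limit and never lowers an existing limit.
    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # An unlimited soft limit cannot be improved on.
    if soft == resource.RLIM_INFINITY:
        return soft, hard

    # Clamp the request to the hard limit when the hard limit is finite.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)

    if new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            # Some systems reject large values even when the hard limit
            # looks unlimited; keep the existing limit in that case.
            pass

    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    # 4096 is an arbitrary example value, not a recommendation.
    print(raise_nofile_soft_limit(4096))
```

This only raises the ceiling. It does not reduce how many files a tool holds open at once, which is why a fix in the tool itself is still preferable.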

@iyefrat Does this happen with all commands or just running a task?

Edit: After looking at the commits, we did fix the auto-clean mechanism. So this may be the clean step trying to read the metadata of many files? If you run moon clean manually, does it error? And if you delete .moon/cache, does the error go away?

  1. it doesn't happen on all commands, or all tasks, just some tasks. it's not entirely deterministic
  2. i do get the error when running moon clean, consistently
  3. when deleting .moon/cache i can get moon check --all to run for a while, but it eventually fails with the error, after which other tasks seem to fail more consistently. it seems that this error is more likely to happen the more cache files you have.

running du in .moon/cache at this point leads to:

6136	./hashes
(... states subdirectories ...)
7536	./states
8416	./outputs
22104	.

Ok that's helpful, then it definitely seems like the cleaning. Let me rework it a bit so that it doesn't read metadata of these files.

This is actually a bit tricky. I may have to remove this functionality, or wrap it in a setting or something.
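To illustrate the kind of rework being discussed, here is a minimal sketch of a stale-file sweep that keeps at most one directory handle open at a time. It is not moon's implementation (moon is written in Rust and its cleaning logic is not shown in this thread); the function name `remove_stale_files` and the age-based policy are assumptions made for the example.

```python
import os
import time
from typing import Optional


def remove_stale_files(root: str, max_age_seconds: float,
                       now: Optional[float] = None) -> int:
    """Delete regular files under `root` older than `max_age_seconds`.

    Directories are visited one at a time, so only a single directory
    handle is open at any moment. Returns the number of files removed.
    """
    now = time.time() if now is None else now
    removed = 0
    pending = [root]  # directories still to visit

    while pending:
        current = pending.pop()
        stale = []

        # The context manager closes the directory handle before we move on.
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    # stat() reads metadata without keeping a descriptor open.
                    mtime = entry.stat(follow_symlinks=False).st_mtime
                    if now - mtime > max_age_seconds:
                        stale.append(entry.path)

        # Delete only after the directory iteration has finished.
        for path in stale:
            os.remove(path)
            removed += 1

    return removed
```

The design point is that the work is sequential and every handle is released promptly. A sweep that fans out across many files concurrently can hit the per-process descriptor limit even when each individual operation is cheap.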

Are these large cache files by chance? Or just a ton of small ones?

I've made a few changes that will reduce the number of syscalls, but I'm not 100% sure this will fix the problem. I'll pull these into a patch and look into a bigger fix for the next release:

Ok, I landed those in 1.24. I also added a new setting to control this, so you can turn it off if it's still an issue.

> Are these large cache files by chance? Or just a ton of small ones?

I get this with a du . of 11960 and fd . | wc -l of 1050 (on 1.23.4).
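As a rough way to put the numbers above in context, the sketch below counts the entries under a directory and reports the process's soft open-file limit next to it. The function names are hypothetical, and the count will not match `fd . | wc -l` exactly, since `fd` skips hidden and ignored files by default. A default macOS shell often has a soft limit of 256, so around a thousand entries is enough to exceed it if they are all touched concurrently; the entry count alone does not cause the error.

```python
import os
import resource


def count_entries(root: str) -> int:
    """Count files and directories below `root` (not including `root`)."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def entries_vs_soft_limit(root: str) -> tuple:
    """Return (entry count, soft RLIMIT_NOFILE) for a quick comparison."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return count_entries(root), soft


if __name__ == "__main__":
    # `.moon/cache` is the cache directory named in this thread.
    print(entries_vs_soft_limit(".moon/cache"))
```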

Updating to 1.24.1 has fixed the problem. Thanks!