[bug] Started getting `Too many open files (os error 24)` on `fs::read` after upgrading 1.20.2 -> 1.23.4 on macOS
iyefrat opened this issue
Describe the bug
Started getting `Too many open files (os error 24)` after upgrading 1.20.2 -> 1.23.4 on macOS, on a pnpm monorepo (can't share it here, sorry).
From bisecting, the issue does not occur on 1.23.0 and starts occurring on 1.23.1.
Steps to reproduce
Not sure. Have a lot of files in your moon cache maybe?
Expected behavior
not getting the error
Screenshots
example of the error:
Error: fs::read
× Failed to read path ~/monorepo/.moon/cache/states/project/task.
╰─▶ Too many open files (os error 24)
Environment
System:
OS: macOS 14.2.1
CPU: (8) arm64 Apple M1 Pro
Memory: 77.09 MB / 16.00 GB
Shell: 5.9 - /bin/zsh
Binaries:
Node: 20.10.0 - ~/.proto/bin/node
npm: 10.2.3 - /opt/homebrew/opt/node@18/bin/npm
bun: 1.1.0 - /opt/homebrew/bin/bun
Managers:
Homebrew: 4.2.16 - /opt/homebrew/bin/brew
pip3: 23.3.1 - /opt/homebrew/bin/pip3
RubyGems: 3.0.3.1 - /usr/bin/gem
Utilities:
CMake: 3.28.3 - /opt/homebrew/bin/cmake
Make: 3.81 - /usr/bin/make
GCC: 15.0.0 - /usr/bin/gcc
Git: 2.43.1 - /opt/homebrew/bin/git
Clang: 15.0.0 - /usr/bin/clang
Curl: 8.4.0 - /usr/bin/curl
Servers:
Apache: 2.4.56 - /usr/sbin/apachectl
Virtualization:
Docker: 25.0.3 - /usr/local/bin/docker
Docker Compose: 2.24.6 - /usr/local/bin/docker-compose
IDEs:
Emacs: 29.2 - /opt/homebrew/bin/emacs
VSCode: 1.84.0 - /opt/homebrew/bin/code
Vim: 9.0 - /usr/bin/vim
WebStorm: 2022.2
Xcode: /undefined - /usr/bin/xcodebuild
Languages:
Bash: 3.2.57 - /bin/bash
Java: 18.0.2 - /usr/bin/javac
Perl: 5.30.3 - /usr/bin/perl
Python3: 3.11.7 - /opt/homebrew/bin/python3
Ruby: 2.6.10 - /usr/bin/ruby
Databases:
SQLite: 3.43.2 - /usr/bin/sqlite3
Browsers:
Chrome: 123.0.6312.107
Safari: 17.2.1
Additional context
This can be solved by raising the default macOS ulimit, but since it's a new error I'd hope there's a way to get it to work without that.
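For reference, the workaround mentioned above is normally applied with `ulimit -n <number>` in the shell that launches moon. The same limit can also be inspected and raised from inside a process. Below is a minimal Python sketch of that, assuming a Unix-like system; `raise_open_file_limit` and the 4096 target are hypothetical names and values chosen for illustration, not anything moon provides.

```python
import resource


def raise_open_file_limit(target: int = 4096) -> tuple[int, int]:
    """Raise this process's soft open-file limit toward `target`.

    Returns the (soft, hard) limits in effect afterwards.
    `target` is an illustrative value, not a recommendation.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # The soft limit can never exceed the hard limit.
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)

    # Nothing to do if the soft limit is unlimited or already high enough.
    if soft == resource.RLIM_INFINITY or soft >= wanted:
        return soft, hard

    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    except (ValueError, OSError):
        # Some systems reject values the reported hard limit seems to
        # allow; in that case the previous limit stays in effect.
        pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print(raise_open_file_limit())
```

The limit applies to the calling process and the children it spawns, so it only helps if it is set in whatever process starts the failing command.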
@iyefrat Does this happen with all commands or just running a task?
Edit: After looking at the commits, we did fix the auto-clean mechanism. So this may be clean trying to read the metadata of many files? If you run `moon clean` manually does it error? And if you delete `.moon/cache` does the error go away?
- it doesn't happen on all commands, or all tasks, just some tasks. it's not entirely deterministic
- i do get the error when running `moon clean`, consistently
- when deleting `.moon/cache` i can get `moon check --all` to run for a while but it eventually fails on the error, after which other tasks seem to fail more consistently. seems that this error is more likely to happen the more cache files you have.
running `du` in `.moon/cache` at this point leads to:
6136 ./hashes
(... states subdirectories ...)
7536 ./states
8416 ./outputs
22104 .
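The `du` figures above report disk usage, while the error seems to track the number of files. A rough Python sketch for getting both numbers per cache subdirectory follows; `count_files` is a hypothetical helper, and the subdirectory names are simply the ones from the output above.

```python
import os


def count_files(root: str) -> tuple[int, int]:
    """Return (file_count, total_bytes) for non-directory entries under `root`.

    Symlinks are counted but not followed. A missing `root` yields (0, 0).
    """
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat reads metadata without opening the file itself.
                total += os.lstat(path).st_size
            except OSError:
                continue  # the file disappeared while walking
            count += 1
    return count, total


if __name__ == "__main__":
    for sub in ("hashes", "states", "outputs"):
        print(sub, count_files(os.path.join(".moon", "cache", sub)))
```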
Ok that's helpful, then it definitely seems like the cleaning. Let me rework it a bit so that it doesn't read metadata of these files.
This is actually a bit tricky. I may have to remove this functionality, or wrap it in a setting or something.
Are these large cache files by chance? Or just a ton of small ones?
I've made a few changes that will reduce the number of syscalls, but I'm not 100% sure this will fix the problem. I'll pull these into a patch and look into a bigger fix for the next release.
Ok I landed those in 1.24. I also added a new setting to control this so you can turn it off if it's still an issue.
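The thread does not show what the actual changes were. As a general illustration only, here is one way a cleanup pass can check file ages without opening the files and while holding a single directory handle at a time. It is a Python sketch, not moon's implementation; `remove_stale_files` and its parameters are hypothetical.

```python
import os
import time
from typing import Optional


def remove_stale_files(root: str, max_age_seconds: float,
                       now: Optional[float] = None) -> int:
    """Delete files under `root` whose mtime is older than `max_age_seconds`.

    Returns the number of files removed.
    """
    now = time.time() if now is None else now
    removed = 0
    pending = [root]
    while pending:
        current = pending.pop()
        subdirs = []
        # The `with` block closes this directory handle before the next
        # directory is scanned, so only one is open at any moment.
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.path)
                    continue
                try:
                    # stat() reads metadata only; the file is never opened.
                    age = now - entry.stat(follow_symlinks=False).st_mtime
                    if age > max_age_seconds:
                        os.unlink(entry.path)
                        removed += 1
                except OSError:
                    continue  # file vanished or is not removable; skip it
        pending.extend(subdirs)
    return removed
```

Subdirectories are queued and scanned only after the current handle is closed, which keeps descriptor usage flat no matter how many files the cache holds.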
Are these large cache files by chance? Or just a ton of small ones?
I get this with a `du .` of 11960 and `fd . | wc -l` of 1050 (on 1.23.4).
Updating to 1.24.1 has fixed the problem. Thanks!