New version 1.3.0 leads to "too many open files" while scanning bigger images
willem-delbare opened this issue
What happened:
Images that scanned successfully with Syft versions below 1.3.0 can no longer be scanned with Syft 1.3.0:
failed to run task: unable to catalog file digests: failed to process file "": digests-cataloger unable to observe contents of *****: open /tmp/stereoscope-/oci-registry-image-/sha256:.tar: too many open files
What you expected to happen:
Scan finishes without crash
Steps to reproduce the issue:
Scan any larger ML/AI-related image (e.g., https://hub.docker.com/r/pytorch/pytorch/tags).
Environment:
- Output of `syft version`: 1.3.0
- OS (e.g. `cat /etc/os-release` or similar): Linux
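Until a fixed release is available, a common general workaround for "too many open files" errors (not something confirmed by the maintainers for this specific issue) is to raise the per-process file descriptor limit in the shell that runs the scan:

```shell
# Inspect the current soft limit on open file descriptors
ulimit -n

# Raise the soft limit for this shell session (it cannot exceed the
# hard limit, shown by `ulimit -Hn`), then rerun the scan:
ulimit -n 4096
# syft pytorch/pytorch:latest
```

The change only applies to the current shell session and its children; a permanent limit would be set via `/etc/security/limits.conf` or the service manager.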
Thanks very much for the report @willem-delbare!
Pull request #2823 should fix the issue: on my machine, it reduces the number of simultaneous file descriptors Syft needs to scan pytorch/pytorch:latest by roughly 50%.
I'm also working to put better linting in place so that we're less likely to leak file handles in the future: #2825 is the start of this, and #2826 tracks adding further lint checks so that this class of leak doesn't recur.