anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems

New version 1.3.0 leads to "too many open files" while scanning bigger images

willem-delbare opened this issue

What happened:
Images that scanned successfully with versions before 1.3.0 can no longer be scanned with Syft 1.3.0. The scan fails with:

failed to run task: unable to catalog file digests: failed to process file "": digests-cataloger unable to observe contents of *****: open /tmp/stereoscope-/oci-registry-image-/sha256:.tar: too many open files

What you expected to happen:
The scan finishes without crashing.

Steps to reproduce the issue:
Scan any larger ML/AI image (e.g. one of the tags at https://hub.docker.com/r/pytorch/pytorch/tags).

Environment:

  • Output of syft version: 1.3.0
  • OS (e.g: cat /etc/os-release or similar): Linux

Thanks very much for the report @willem-delbare!

Pull request #2823 should fix the issue. On my machine, it reduces the number of file descriptors syft needs open simultaneously to scan pytorch/pytorch:latest by ~50%.

I'm also working on better linting so that we're less likely to leak file handles in the future. #2825 is the start of this, and issue #2826 tracks the additional linting still to be added.