Netflix / metaflow

:rocket: Build and manage real-life ML, AI, and data science projects with ease!

Home Page: https://metaflow.org

Running jobs with conda gives OSError: Invalid data stream

raybellwaves opened this issue:

I tried to run a Metaflow job today using @conda_base(libraries=dependencies, python="3.10.13") and saw:

2023-12-21 14:35:12.171 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx] Downloading code package...
2023-12-21 14:35:12.945 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx] Code package downloaded.
2023-12-21 14:35:12.966 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx] Task is starting.
2023-12-21 14:35:13.265 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx] Bootstrapping virtual environment...
2023-12-21 14:35:16.607 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx] Bootstrap failed while executing: set -e;
2023-12-21 14:35:16.607 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx]         if ! command -v micromamba >/dev/null 2>&1; then
2023-12-21 14:35:16.607 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx]             mkdir micromamba;
2023-12-21 14:35:16.607 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx]             python -c "import requests, bz2, sys; data = requests.get('https://micro.mamba.pm/api/micromamba/linux-64/latest').content; sys.stdout.buffer.write(bz2.decompress(data))" | tar -xv -C $(pwd)/micromamba bin/micromamba --strip-components 1;
2023-12-21 14:35:16.607 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx]             export PATH=$PATH:$(pwd)/micromamba;
2023-12-21 14:35:16.607 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx]             if ! command -v micromamba >/dev/null 2>&1; then
2023-12-21 14:35:16.607 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx]                 echo "Failed to install Micromamba!";
2023-12-21 14:35:16.607 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx]                 exit 1;
2023-12-21 14:35:16.607 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx]             fi;
2023-12-21 14:35:16.607 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx]         fi
2023-12-21 14:35:16.607 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx] Stdout:
2023-12-21 14:35:16.607 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx] Stderr: Traceback (most recent call last):
2023-12-21 14:35:16.607 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx]   File "<string>", line 1, in <module>
2023-12-21 14:35:16.607 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx]   File "/usr/local/lib/python3.10/bz2.py", line 333, in decompress
2023-12-21 14:35:16.607 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx]     res = decomp.decompress(data)
2023-12-21 14:35:16.607 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx] OSError: Invalid data stream
2023-12-21 14:35:16.607 [XXXXX/flow/XXXXXX (pid 82514)] [pod xxx] tar: This does not look like a tar archive
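The traceback shows bz2.decompress being handed bytes that are not a bzip2 stream. The bootstrap one-liner never checks what the server actually returned, so an error page from micro.mamba.pm goes straight into the decompressor and surfaces as the opaque "OSError: Invalid data stream". A minimal sketch of a more defensive check, assuming you control a similar download step yourself (decompress_micromamba_archive is a hypothetical helper, not part of Metaflow):

```python
import bz2

# Every bzip2 stream starts with the magic bytes "BZh" (followed by a block-size digit).
BZ2_MAGIC = b"BZh"


def decompress_micromamba_archive(data: bytes) -> bytes:
    """Hypothetical helper: validate and decompress a .tar.bz2 payload.

    Raises ValueError with a readable preview of the response instead of the
    opaque "OSError: Invalid data stream" when the server returned something
    other than a bzip2 archive (for example, a proxy error page).
    """
    if not data.startswith(BZ2_MAGIC):
        preview = data[:120].decode("utf-8", errors="replace")
        raise ValueError(f"Download is not a bzip2 archive; server returned: {preview!r}")
    return bz2.decompress(data)
```

In a one-liner like the one in the log, this would sit between requests.get(...) (ideally followed by raise_for_status()) and sys.stdout.buffer.write(...), so a proxy error shows up as readable text rather than a bz2 failure.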

I have the same problem. I'm using @pypi to manage my libraries.

It doesn't seem to be related to the code: it started happening on flows that were already tested and working.

curl https://micro.mamba.pm/api/micromamba/linux-64/latest
upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: delayed connect error: 111
(/opt/userenvs/X.X/test_env) X.X@X:~$ curl -L https://micro.mamba.pm/api/micromamba/linux-64/latest
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
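The curl output points at a server-side or proxy problem rather than the flow: error 111 is ECONNREFUSED on Linux, and the same URL worked again later. (requests.get follows redirects by default, so curl -L is the closer reproduction of what the bootstrap does.) Since the outage was transient, one option is to retry the download with exponential backoff. A sketch, assuming a hypothetical fetch_with_retry helper that takes the download and the validity check as callables:

```python
import time
from typing import Callable, Optional


def fetch_with_retry(
    fetch: Callable[[], bytes],
    is_valid: Callable[[bytes], bool],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Hypothetical helper: call `fetch` until `is_valid` accepts the result.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts
    (exponential backoff) and raises RuntimeError, chained to the last
    problem seen, if every attempt fails.
    """
    last_error: Optional[BaseException] = None
    for attempt in range(attempts):
        try:
            data = fetch()
            if is_valid(data):
                return data
            last_error = ValueError(f"invalid payload on attempt {attempt + 1}")
        except OSError as exc:  # urllib and requests network errors subclass OSError
            last_error = exc
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"giving up after {attempts} attempts") from last_error
```

Here fetch could be lambda: requests.get(url, timeout=30).content and is_valid a check for the b"BZh" prefix; passing sleep in keeps the helper testable without real waits.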

f"curl -Ls https://micro.mamba.pm/api/micromamba/{platform}/latest | tar -xvj -C {installation_location} bin/micromamba",
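The f-string above pipes the download straight into tar -xvj, letting tar handle the bzip2 layer, and extracts only bin/micromamba; unlike the bootstrap command in the log, it has no --strip-components, so the binary lands under {installation_location}/bin/. The same extraction can be sketched in Python with the standard tarfile module, here mirroring the --strip-components 1 variant from the log (extract_micromamba is a hypothetical name):

```python
import io
import os
import shutil
import tarfile


def extract_micromamba(archive: bytes, dest_dir: str) -> str:
    """Hypothetical helper: pull bin/micromamba out of a .tar.bz2 payload.

    Mirrors `tar -xvj -C dest_dir bin/micromamba --strip-components 1`:
    only that member is extracted, and the leading "bin/" is dropped.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Mode "r:bz2" makes tarfile decompress the bzip2 layer itself, like tar's -j flag;
    # a non-bzip2 payload raises tarfile.ReadError here.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:bz2") as tar:
        member = tar.getmember("bin/micromamba")  # KeyError if the archive lacks it
        source = tar.extractfile(member)
        if source is None:
            raise ValueError("bin/micromamba is not a regular file in the archive")
        target = os.path.join(dest_dir, "micromamba")
        with source, open(target, "wb") as out:
            shutil.copyfileobj(source, out)
    os.chmod(target, member.mode | 0o111)  # make sure the binary is executable
    return target
```

A non-bzip2 payload such as the proxy error page raises tarfile.ReadError when the archive is opened, before anything is written to dest_dir.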

curl -L https://micro.mamba.pm/api/micromamba/linux-64/latest -v works now. Closing.