Netflix / metaflow

:rocket: Build and manage real-life ML, AI, and data science projects with ease!

Home Page: https://metaflow.org

Conda environment being treated as disabled, and not appending environment to PATH.

saikonen opened this issue

I encountered this issue with Conda packages that install binaries: the binary is installed successfully, but it is not found during step execution. The underlying reason is that in some cases conda_decorator does not append the environment to PATH.

Example flows for reproducing the issue:

test case 1: FAILS

from metaflow import FlowSpec, step, conda_base
import sys
import os

libraries = {
    "dbt-postgres": ">=1.7.0",
}
@conda_base(python="3.11", packages=libraries)
class CondaPathTest(FlowSpec):

    @step
    def start(self):
        print(f"path is: {os.environ.get('PATH')}")
        import dbt  # test that the library imports successfully
        print(dbt)
        from metaflow.util import which

        # Test that the binary is also found.
        # Only true if the environment/bin was added to PATH.
        if not which("dbt"):
            raise Exception("did not find dbt")

        print(f"python version: {sys.version_info}")
        self.next(self.end)

    @step
    def end(self):
        print(f"python version: {sys.version_info}")

if __name__ == "__main__":
    CondaPathTest()

test case 2: SUCCEEDS

from metaflow import FlowSpec, step, conda
import sys
import os

libraries = {
    "dbt-postgres": ">=1.7.0",
}
class CondaPathTest(FlowSpec):

    @conda(python="3.11", packages=libraries)
    @step
    def start(self):
        print(f"path is: {os.environ.get('PATH')}")
        import dbt  # test that the library imports successfully
        print(dbt)
        from metaflow.util import which

        # Test that the binary is also found.
        # Only true if the environment/bin was added to PATH.
        if not which("dbt"):
            raise Exception("did not find dbt")

        print(f"python version: {sys.version_info}")
        self.next(self.end)

    @step
    def end(self):
        print(f"python version: {sys.version_info}")

if __name__ == "__main__":
    CondaPathTest()

Notes

In the failing flow, the failure occurs in the which call, which cannot locate the installed binary. The library can still be imported, because the Python interpreter is correctly set to the one from the conda environment.
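The split between "import works" and "binary not found" can be checked directly inside a step. The following is a minimal diagnostic sketch, not part of Metaflow: the helper name interpreter_bin_on_path is hypothetical, and it assumes the usual conda layout in which installed binaries live in the same directory as the environment's interpreter.

```python
import os
import sys


def interpreter_bin_on_path(executable=None, path=None):
    """Return True if the directory holding the interpreter is listed on PATH.

    Hypothetical helper for debugging; defaults to the running interpreter
    and the current PATH when no arguments are given.
    """
    executable = sys.executable if executable is None else executable
    path = os.environ.get("PATH", "") if path is None else path
    # Assumption: binaries such as dbt sit next to the environment's python.
    bin_dir = os.path.normpath(os.path.dirname(executable))
    entries = [os.path.normpath(p) for p in path.split(os.pathsep) if p]
    return bin_dir in entries
```

Calling interpreter_bin_on_path() at the top of start should return False in the failing flow and True in the succeeding one, if the diagnosis above holds.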

The underlying cause is that unless a conda decorator is explicitly specified on a step, the environment is treated as disabled for that step, so the environment setup that depends on it, such as appending the environment to PATH, does not take effect.
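Until the decorator handles this case, one possible stopgap is to put the interpreter's directory on PATH manually at the start of the affected step. This is only a sketch under the same assumption that the binaries sit next to the environment's interpreter; path_with_interpreter_bin is a hypothetical helper, not a Metaflow API, and it does not address the root cause.

```python
import os
import sys


def path_with_interpreter_bin(executable=None, path=None):
    """Return a PATH string that includes the interpreter's directory.

    Hypothetical workaround helper; the directory is prepended only when
    it is not already present, so existing entries keep their order.
    """
    executable = sys.executable if executable is None else executable
    path = os.environ.get("PATH", "") if path is None else path
    bin_dir = os.path.dirname(executable)
    entries = [p for p in path.split(os.pathsep) if p]
    if bin_dir in entries:
        return os.pathsep.join(entries)
    return os.pathsep.join([bin_dir] + entries)
```

Inside a step this would be used as os.environ["PATH"] = path_with_interpreter_bin() before the which("dbt") check.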