Netflix / metaflow

:rocket: Build and manage real-life ML, AI, and data science projects with ease!

Home Page: https://metaflow.org


Missing artifact in the step.task.data._artifacts

super-shayan opened this issue · comments

Hello, I am using Metaflow to schedule parallel jobs on AWS Step Functions. My flow script has the following steps:

start() -> run() -> join() -> end()
In start() I use foreach to call run() in parallel. I tried using the Client API to access the data in the run() step, as follows:

```python
from metaflow import Flow, Step

flow = Flow('MyFlow')
step = Step('MyFlow/sfn-id/run')
list(step.tasks())  # there are multiple tasks available; I choose one below
step.task           # is something like: MyFlow/sfn-id/run/task-id
step.task.data      # gives the output: <MetaflowData: >
```

Since I attach a variable called `self.results` in the run() step, I expected to access the results by calling `step.task.data.results`, but this raises a KeyError:

```
File /anaconda3/lib/python3.11/site-packages/metaflow/client/core.py:738, in MetaflowData.__getattr__(self, name)
    737 def __getattr__(self, name: str):
--> 738     return self._artifacts[name].data

KeyError: 'results'
```
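The traceback shows why the error surfaces as a KeyError: MetaflowData resolves attribute access by indexing an internal artifact dict, so any attribute with no matching artifact fails the dict lookup. A minimal stand-in (not Metaflow's real class) reproducing that lookup pattern:

```python
class Artifact:
    """Stand-in for a stored artifact; only holds its data."""
    def __init__(self, data):
        self.data = data

class DataContainer:
    """Stand-in for MetaflowData: attribute access indexes a dict."""
    def __init__(self, artifacts):
        self._artifacts = artifacts

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, so this
        # raises KeyError for any artifact name that was never stored
        return self._artifacts[name].data

d = DataContainer({})     # empty, like the task in question
try:
    d.results
except KeyError as e:
    print(f"KeyError: {e}")
```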

I also tried to see which artifacts exist by calling `step.task.data._artifacts`, but that returns an empty set.

Correct - you might be running into this expected behavior. The link has details on how to correctly move state through a foreach. Let me know if that works!
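The expected behavior in question: artifacts assigned inside a foreach branch belong to that branch's task and are not propagated automatically; the join step has to gather them from `inputs` (for example `self.results = [inp.results for inp in inputs]`, or `self.merge_artifacts(inputs)` for non-conflicting artifacts). A plain-Python sketch of that pattern, with made-up data and no Metaflow required:

```python
class BranchTask:
    """Stand-in for one foreach task; `results` is its artifact."""
    def __init__(self, item):
        self.results = item * 2   # the per-branch work

def start(items):
    # foreach: fan out one task per item
    return [BranchTask(i) for i in items]

def join(inputs):
    # the join receives all branch tasks; state must be gathered
    # explicitly -- in a Metaflow join step this would be
    # self.results = [inp.results for inp in inputs]
    return [inp.results for inp in inputs]

branches = start([1, 2, 3])
merged = join(branches)
print(merged)   # [2, 4, 6]
```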