Netflix / metaflow

:rocket: Build and manage real-life ML, AI, and data science projects with ease!

Home page: https://metaflow.org

Does Metaflow support multiple S3 buckets?

jaskaran-virdi-imprivata opened this issue · comments

From the documentation (https://docs.metaflow.org/scaling/data) and from looking at ~/.metaflowconfig/config.json, Metaflow uses a single bucket:
"METAFLOW_DATASTORE_SYSROOT_S3": "s3://<bucket_name>/metaflow", "METAFLOW_DATATOOLS_S3ROOT": "s3://<bucket_name>/data"

Is it possible to have Metaflow data stored across multiple buckets? For example: 5 customers, 5 buckets, where each customer/tenant has all their data (Metaflow artifacts such as models and datasets) in their own bucket.

Based on the doc that you linked, you should be able to do that by passing the bucket/prefix to S3, using the customer name to construct the bucket name. Doesn't that work for your use case?

Note that metaflow.S3 provides a default S3 location for storing data. You can change the location by passing S3(bucket='my-bucket', prefix='/my/prefix') to the constructor. Metaflow versioning information is appended to the prefix.
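A minimal sketch of the per-tenant approach described above. The bucket naming scheme and the `customer_s3_location` helper are assumptions for illustration, not part of Metaflow; only the `S3(bucket=..., prefix=...)` constructor arguments come from the answer above.

```python
# Hypothetical helper mapping a customer/tenant name to an S3 location.
# The "acme-<customer>-bucket" naming convention is an assumption.
def customer_s3_location(customer: str) -> tuple[str, str]:
    """Return (bucket, prefix) for a given tenant."""
    return f"acme-{customer}-bucket", f"/{customer}/metaflow"

# Inside a flow step you would then use it roughly like this
# (requires AWS credentials and an installed metaflow):
#
#     from metaflow import S3
#     bucket, prefix = customer_s3_location("tenant1")
#     with S3(bucket=bucket, prefix=prefix) as s3:
#         s3.put("model.pkl", serialized_model)
```

Each tenant's artifacts then land under their own bucket and prefix, while Metaflow still appends its versioning information below the prefix you supply.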

Thanks! I already resolved this issue.