microsoft / farmvibes-ai

FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability

Home Page: https://microsoft.github.io/farmvibes-ai/

Ability to view and edit workflows

iharshulhan opened this issue · comments

Is there an option to view the actual source code of the workflows? And is there a document describing the available parameters and the output types?

It would be very beneficial to understand the inner workings of the workflows and to be able to modify them for a particular use case. As an example, at the moment the SpaceEye pipeline takes up a huge amount of space, and optimising asset handling (deleting processed tiles, saving only the ROI instead of complete assets, etc.) should help to reduce that.

Hi, Ihar. Thanks for opening the issue and using FarmVibes.AI. Answering your questions:

It is possible to view the workflow components, adapt them to specific needs, and compose new workflows (the Sentinel notebook has a few examples). Workflow information (including all the stages) and parameters can be viewed through the client by running the command:

client.document_workflow("data_processing/spaceeye/spaceeye") 

A full list of workflows can be found here. We are planning on adding the ability to visualize the YAML file describing the graph of each workflow. Exposing the code for the operations (the compute unit in a workflow stage) is not currently in the plans.
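As a minimal sketch of this inspection flow, the snippet below lists registered workflows and prints the documentation for the SpaceEye workflow. It assumes a running local FarmVibes.AI cluster and the `get_default_vibe_client` / `list_workflows` helpers from the `vibe_core` package that ships with FarmVibes.AI; the `spaceeye_workflows` filter is just a hypothetical convenience helper, not part of the API.

```python
def spaceeye_workflows(names):
    """Filter a list of workflow names down to SpaceEye-related entries."""
    return [n for n in names if "spaceeye" in n]


if __name__ == "__main__":
    # Requires a running FarmVibes.AI cluster and the vibe_core package.
    from vibe_core.client import get_default_vibe_client

    client = get_default_vibe_client()
    names = client.list_workflows()          # all registered workflow names
    print(spaceeye_workflows(names))
    # Print the stages, parameters, sources, and sinks of one workflow:
    client.document_workflow("data_processing/spaceeye/spaceeye")
```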

For the specific SpaceEye behavior you are mentioning, we would need to modify a few operations themselves. That work is already being done by the team, and we expect to have it in our next release; it should decrease the amount of storage by almost 50%. On the ask of processing only the ROI, I believe this could be an option we add to the same operation, but I would need to align with the team and get back to you.

In the current architecture, asset handling is not performed at a per-workflow level, but across workflows. We are planning on providing a data management API in the platform for deleting assets associated with previous workflow runs and for creating policies that govern when assets are deleted. This data management layer is currently a work in progress, with an ETA in the next couple of months.

I would love to understand your specific use case better and see how to unblock you; let me know if you are able to join a call.

@iharshulhan, with #71, we have added the ability to retrieve the YAML file specifying our workflows via the client and the REST API. In the client, you can access it via:

client.get_workflow_yaml(workflow_name)

with workflow_name as the name of the workflow of interest (e.g., helloworld, data_ingestion/cdl/download_cdl).
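For example, one might fetch the YAML for helloworld and take a quick look at its top-level sections. The client call follows the comment above; `top_level_keys` is a naive, stdlib-only helper I am assuming for illustration (a real inspection would parse the YAML properly, e.g. with PyYAML).

```python
def top_level_keys(yaml_text: str) -> list:
    """Return the unindented `key:` names of a YAML document (naive line scan)."""
    keys = []
    for line in yaml_text.splitlines():
        if line and not line[0].isspace() and ":" in line and not line.startswith("#"):
            keys.append(line.split(":", 1)[0])
    return keys


if __name__ == "__main__":
    # Requires a running FarmVibes.AI cluster and the vibe_core package.
    from vibe_core.client import get_default_vibe_client

    client = get_default_vibe_client()
    yaml_text = client.get_workflow_yaml("helloworld")
    print(top_level_keys(yaml_text))  # the workflow's top-level sections
```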

In the REST API, you can retrieve it through the endpoint v0/workflows/{workflow_name}?return_format=yaml.
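Sketched below is the same retrieval over the REST API. The endpoint shape comes from the comment above; the base address is a placeholder, an assumption on my part, and should be replaced with the service URL your local cluster actually exposes.

```python
from urllib.request import urlopen


def workflow_yaml_url(base: str, workflow_name: str) -> str:
    """Build the v0 endpoint URL that returns a workflow definition as YAML."""
    return f"{base}/v0/workflows/{workflow_name}?return_format=yaml"


if __name__ == "__main__":
    # Placeholder base URL; use the address of your FarmVibes.AI service.
    url = workflow_yaml_url("http://localhost:31108", "data_ingestion/cdl/download_cdl")
    with urlopen(url) as resp:  # requires a running FarmVibes.AI cluster
        print(resp.read().decode())
```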

For more information on how workflow YAML files are specified and how to compose custom workflows, please refer to WORKFLOWS.md and the example notebooks grouped under the "Working with Custom Workflows" topic.

Additionally, we have improved the Sentinel-2 preprocessing steps: the workflow should now consume half the previous disk space and run twice as fast. Be mindful that the cache for the new operations is not compatible with the cache from previous versions.

I will close this issue for now, but feel free to reopen it or open a new issue if you want to discuss further.

@lonnes, this is interesting; we are looking forward to the data management layer. Hopefully it can export workflows & notebook outputs via an API.