microsoft / farmvibes-ai

FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability

Home Page: https://microsoft.github.io/farmvibes-ai/

Download always fails in sentinel/spectral_indices notebook

gregcode123 opened this issue · comments

Hello team:

I'm trying the sentinel/spectral_indices notebook. After more than 5 attempts, the process always gets stuck at the download step (downloading the image from Sentinel) after 3 hours or so. I'm not sure what the error is, or whether it is always the same one, because the monitor() output shows no message other than the failed status.
Is there a way to check the errors?
Is it possible to enable a 'verbose' option to check where the download fails?
Can the task be re-submitted?

Thanks
Greg

Hi, @gregcode123. Thanks for raising this issue.

If the download operation is taking too long and/or failing, it might be a sign that you do not have enough disk space available in your FarmVibes.AI storage. Could you check the available disk space with a df -kh on the storage partition, and share the output with me?
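If running df is inconvenient (for example, from inside the notebook), the same check can be sketched in Python with the standard library. This is only an illustration: the helper name free_space_gib is made up here, and the "/" path is a placeholder for wherever your FarmVibes.AI storage actually lives.

```python
import shutil


def free_space_gib(path="/"):
    """Return (total, used, free) in GiB for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return usage.total / gib, usage.used / gib, usage.free / gib


# "/" is a placeholder; point this at the partition that holds the storage.
total, used, free = free_space_gib("/")
print(f"total={total:.1f} GiB used={used:.1f} GiB free={free:.1f} GiB")
```

The numbers should roughly match the df output for the same partition.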

Now to your questions:

Is there a way to check the errors?

A way to understand what went wrong with a failed workflow run is to check the reason attribute of the run object. For example:

run = client.run(....)
print(run.reason)

Could you please share the result with us?
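To make the output easier to paste into a reply, the status and reason could be printed on one line. The sketch below is not part of the FarmVibes.AI client: summarize_run is a hypothetical helper, it assumes the run object exposes a status attribute alongside reason, and the stand-in object exists only so the snippet runs without a cluster.

```python
from types import SimpleNamespace


def summarize_run(run):
    """Build a one-line summary from a run object's status and reason attributes."""
    # Fall back gracefully if an attribute is missing (both are assumptions here).
    status = getattr(run, "status", "unknown")
    reason = getattr(run, "reason", None)
    return f"status: {status} | reason: {reason if reason else 'none reported'}"


# Stand-in for illustration only; a real run object would come from client.run(...).
fake_run = SimpleNamespace(status="failed", reason="example failure message")
print(summarize_run(fake_run))
```

With a real run, you would call summarize_run(run) after the run has finished or failed.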


Is it possible to enable a 'verbose' option to check where the download fails?

Currently we do not have a verbose option for monitoring workflow run progress. That is a good suggestion that I'll take to the team.


Can the task be re-submitted?

You can resubmit with either client.resubmit_run() or run.resubmit():

# with the client
run_id = ... # get the id of a run that failed
client.resubmit_run(run_id)

# or directly with the run object
run.resubmit()

Hello rafaspadilha:

Thanks for the answers.
Here is a screenshot covering both requests:

[Screenshot: output of df -kh and run.reason]

I should mention that the last time I installed the framework (around June/23, before the latest commits), everything worked fine for me. This time I wasn't able to download a single image, which is why I posted this issue.

Looking forward to your comments.

Thanks
Greg

Thanks for providing the screenshot. A few more questions:

  1. Is your storage located in / ?
  2. How big is your time range and geometry? You seem to have 180 GB (if the storage is in /). Depending on the size of your geometry and time range, 180 GB might not be enough to download the data.
  3. You mentioned that you were able to download an image with a previous version of FarmVibes. Is this problem happening whenever you run this workflow? Or are you able to download other, smaller geometries/time ranges?

Did you try running the workflow after a restart (by running farmvibes-ai local restart) and/or on a new cluster (destroying the cluster with farmvibes-ai local destroy and rebuilding it with farmvibes-ai local setup)?

Hi, @gregcode123, are you able to provide an update on this?

Thank you for the reply. I'll close the issue.