pangeo-forge / pangeo-forge-recipes

Python library for building Pangeo Forge recipes.

Home Page: https://pangeo-forge.readthedocs.io/

Graceful Openers for Connection Timeouts?

ranchodeluxe opened this issue

Problem

I have .nc4 inputs on S3. When trying to ETL subsets of roughly 5k inputs or more (the whole archive is about 225k), I've noticed that the function open_with_kerchunk might bubble up any number of subclassed botocore.exceptions.ConnectionError exceptions (such as connection timeouts), which are to be expected with this much network interaction. It seems like open_with_xarray might be prone to the same issue.

A failure like this on a single input has the consequence of failing the whole pipeline.

Possible Solution

Is there some way to catch these connection errors across fsspec targets, log the problem inputs and make sure downstream transforms gracefully continue processing?
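One way to picture this is a thin wrapper around whatever function opens a single input. The sketch below is plain Python and is not part of pangeo-forge-recipes: `open_gracefully`, `OpenResult`, and their parameters are hypothetical names made up for illustration. It retries on the built-in `ConnectionError` by default; for S3 one would presumably pass `botocore.exceptions.ConnectionError` via `retry_on` instead. Rather than raising after the last attempt, it logs the problem input and returns a result that records the failure, so the caller can decide what to do with it.

```python
import logging
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple, Type

logger = logging.getLogger(__name__)


@dataclass
class OpenResult:
    """Outcome of opening one input (hypothetical helper, not a library type)."""

    url: str
    value: Any = None
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.error is None


def open_gracefully(
    opener: Callable[[str], Any],
    url: str,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> OpenResult:
    """Call opener(url), retrying with exponential backoff on retry_on errors.

    Exceptions outside retry_on still propagate. If every attempt fails,
    the failure is logged and returned instead of raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return OpenResult(url=url, value=opener(url))
        except retry_on as exc:
            logger.warning(
                "attempt %d/%d failed for %s: %r", attempt, max_attempts, url, exc
            )
            if attempt == max_attempts:
                # Give up on this input but keep the pipeline alive.
                return OpenResult(url=url, error=repr(exc))
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter exists only so the backoff can be swapped out in tests. Whether downstream transforms can actually cope with a failed `OpenResult` depends on the recipe; the wrapper only makes the failure visible instead of fatal.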

Closing this for now because skipping doesn't seem to work: downstream workflows produce index errors, since they assume the skipped timesteps are there. Next ideas:

  • The S3 buckets I'm using have auth gateways, which might be the problem
  • See how checkpoints in beam/flink work
  • If all else fails we might need to catch errors and create bunk records that we can then reprocess
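
The last idea can be sketched in plain Python, independent of Beam or pangeo-forge-recipes. The function names below (`open_with_placeholders`, `reprocess_failed`) are hypothetical, and `None` stands in for whatever a "bunk" record would look like in a real recipe. The point is that every input keeps its position, so downstream code that indexes by timestep still finds a slot there, and the failed positions are tracked for a later retry.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple, Type

Errors = Tuple[Type[BaseException], ...]


def open_with_placeholders(
    urls: Sequence[str],
    opener: Callable[[str], Any],
    catch: Errors = (ConnectionError,),
) -> Tuple[List[Optional[Any]], List[int]]:
    """Open every url, keeping one record per input position.

    Inputs that raise one of the `catch` errors get a None placeholder
    instead of being skipped, and their positions are returned for retry.
    """
    records: List[Optional[Any]] = []
    failed: List[int] = []
    for position, url in enumerate(urls):
        try:
            records.append(opener(url))
        except catch:
            records.append(None)  # placeholder keeps the index aligned
            failed.append(position)
    return records, failed


def reprocess_failed(
    urls: Sequence[str],
    records: List[Optional[Any]],
    failed: Sequence[int],
    opener: Callable[[str], Any],
    catch: Errors = (ConnectionError,),
) -> List[int]:
    """Retry the failed positions in place; return those that still fail."""
    still_failed: List[int] = []
    for position in failed:
        try:
            records[position] = opener(urls[position])
        except catch:
            still_failed.append(position)
    return still_failed
```

Downstream steps would still need to recognize the placeholder and either wait for reprocessing or handle the gap explicitly; this sketch only preserves the positions.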