pangeo-forge / pangeo-forge-recipes

Python library for building Pangeo Forge recipes.

Home Page: https://pangeo-forge.readthedocs.io/

Implement adapter for `OpenWithKerchunk | OpenWithXarray`

cisaacstern opened this issue · comments

In theory, OpenWithKerchunk should be able to provide inputs for OpenWithXarray, which can help address issues such as #361. IIUC a version of this existed in 0.9.4 (or at least was in development there). Discussion in leap-stc/cmip6-leap-feedstock#16 (comment) reminded me that this would be a useful thing to implement (or re-implement, as the case may be).

This seems like a great idea!

As far as design goes: OpenWithKerchunk returns a PCollection of in-memory references. It seems like there would need to be either:

  1. An additional PTransform that converts the PCollection of references to a PCollection of fsspec mappers that OpenWithXarray could read, or
  2. An option within OpenWithKerchunk that returns fsspec mappers.
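
For concreteness, here is a minimal sketch of the conversion that either option would need, assuming each element is a plain kerchunk-style reference dict. The helper name `references_to_mapper` and the tiny hand-written reference set are hypothetical and not part of pangeo-forge-recipes; only fsspec's `reference` filesystem and its `get_mapper` method are existing APIs.

```python
import json

import fsspec

# A tiny hand-written reference set in the kerchunk "version 1" layout.
# It stands in for one element emitted by OpenWithKerchunk (an assumption).
refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        ".zattrs": json.dumps({"title": "demo"}),
    },
}


def references_to_mapper(references: dict, **reference_fs_kwargs):
    """Wrap in-memory references in an fsspec mapper (hypothetical helper)."""
    # The "reference" filesystem presents the references as a virtual,
    # zarr-compatible key/value store.
    fs = fsspec.filesystem("reference", fo=references, **reference_fs_kwargs)
    return fs.get_mapper("")


mapper = references_to_mapper(refs)

# Keys resolve to the inlined bytes, which is what a zarr reader would see.
zgroup = json.loads(mapper[".zgroup"])
zattrs = json.loads(mapper[".zattrs"])
```

In a pipeline, a helper like this could be applied per element (for example with `beam.Map`, after unpacking any index key), and the resulting mapper is the kind of object that `xr.open_dataset(mapper, engine="zarr", backend_kwargs={"consolidated": False})` can usually read. Whether OpenWithXarray accepts a mapper directly is not confirmed here.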

Any thoughts here @cisaacstern?

Good questions, @norlandrhagen.

> An option within OpenWithKerchunk that returns fsspec mappers.

I think I'd lean towards this option. The downside is that it introduces multiple return types into OpenWithKerchunk, but the benefit is that it keeps the user-facing API simpler.

Another option would be to have OpenWithXarray use an engine (an xr.open_dataset backend) that immediately knows what to do with the references (the "kerchunk" engine discussed in fsspec/kerchunk#360?).

That would be the best way!

@keewis are you working on that PR/issue or do you know if there is any development on it?

I'm not working on this nor am I planning to at the moment (and I'm not aware of anyone else doing so), but the development will most likely happen on the kerchunk repo.

Thanks for mentioning this @keewis.

Whichever solution we choose here, let's link fsspec/kerchunk#360 in a comment, and mention that the implementation here is a shim until that issue is resolved.