Intake 2.0.0: ValueError: storage_options passed with non-fsspec path
observingClouds opened this issue
I am trying to open an intake catalog that caused no issues prior to the intake 2.0.0 release. I know that intake 2 is currently in beta and that I could pin an older version of intake, but I wanted to raise this issue anyway. I couldn't find any documentation on whether this is still a valid intake 2 catalog (and should be compatible) or whether it has to be adapted.
```python
>>> import intake
>>> cat = intake.open_catalog("https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/subcatalogs/conus404-catalog.yml")
>>> cat['conus404-hourly-osn']
sources:
  conus404-hourly-osn:
    args:
      consolidated: true
      storage_options:
        anon: true
        client_kwargs:
          endpoint_url: https://usgs.osn.mghpcc.org/
        requester_pays: false
      urlpath: s3://hytest/conus404/conus404_hourly.zarr
    description: 'CONUS404 Hydro Variable subset, 40 years of hourly values. These
      files were created wrfout model output files (see ScienceBase data release for
      more details: https://www.sciencebase.gov/catalog/item/6372cd09d34ed907bf6c6ab1).
      You can work with this data for free in any environment (there are no egress
      fees).'
    driver: intake_xarray.xzarr.ZarrSource
    metadata:
      catalog_dir: https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/subcatalogs
```
With `intake==2.0.0`, calling

```python
>>> cat['conus404-hourly-osn'].to_dask()
```

raises:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/haukeschulz/mambaforge/envs/test/lib/python3.12/site-packages/intake_xarray/base.py", line 69, in to_dask
    return self.read_chunked()
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/haukeschulz/mambaforge/envs/test/lib/python3.12/site-packages/intake_xarray/base.py", line 44, in read_chunked
    self._load_metadata()
  File "/Users/haukeschulz/mambaforge/envs/test/lib/python3.12/site-packages/intake/source/base.py", line 84, in _load_metadata
    self._schema = self._get_schema()
                   ^^^^^^^^^^^^^^^^^^
  File "/Users/haukeschulz/mambaforge/envs/test/lib/python3.12/site-packages/intake_xarray/base.py", line 18, in _get_schema
    self._open_dataset()
  File "/Users/haukeschulz/mambaforge/envs/test/lib/python3.12/site-packages/intake_xarray/xzarr.py", line 46, in _open_dataset
    self._ds = xr.open_dataset(self.urlpath, **kw)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/haukeschulz/mambaforge/envs/test/lib/python3.12/site-packages/xarray/backends/api.py", line 572, in open_dataset
    backend_ds = backend.open_dataset(
                 ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/haukeschulz/mambaforge/envs/test/lib/python3.12/site-packages/xarray/backends/zarr.py", line 1011, in open_dataset
    store = ZarrStore.open_group(
            ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/haukeschulz/mambaforge/envs/test/lib/python3.12/site-packages/xarray/backends/zarr.py", line 464, in open_group
    zarr_group = zarr.open_consolidated(store, **open_kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/haukeschulz/mambaforge/envs/test/lib/python3.12/site-packages/zarr/convenience.py", line 1334, in open_consolidated
    store = normalize_store_arg(
            ^^^^^^^^^^^^^^^^^^^^
  File "/Users/haukeschulz/mambaforge/envs/test/lib/python3.12/site-packages/zarr/storage.py", line 197, in normalize_store_arg
    return normalize_store(store, storage_options, mode)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/haukeschulz/mambaforge/envs/test/lib/python3.12/site-packages/zarr/storage.py", line 169, in _normalize_store_arg_v2
    raise ValueError("storage_options passed with non-fsspec path")
ValueError: storage_options passed with non-fsspec path
```
Opening the dataset with xarray or zarr directly was not an issue:

```python
xr.open_zarr(
    "s3://hytest/conus404/conus404_hourly.zarr",
    storage_options={
        "anon": True,
        "requester_pays": False,
        "client_kwargs": {"endpoint_url": "https://usgs.osn.mghpcc.org"},
    },
)
```
I believe #782 fixes this; if you could confirm, I would appreciate it.
For reference, here is how you would build the entry in the new way:

```python
import intake

data = intake.datatypes.Zarr(
    "s3://hytest/conus404/conus404_hourly.zarr",
    storage_options={"anon": True, "endpoint_url": "https://usgs.osn.mghpcc.org/"},
    metadata={"description": "CONUS404 Hydro Variable subset, 40 years of hourly values"},
)
reader = data.to_reader("xarray", consolidated=False)
cat = intake.readers.entry.Catalog()
cat["conus404-hourly-osn"] = reader
cat.to_yaml_file("cat.yaml")
```
producing

```yaml
aliases:
  conus404-hourly-osn: conus404-hourly-osn
data:
  95ffa5d13fb47748:
    datatype: intake.readers.datatypes:Zarr
    kwargs:
      root: ''
      storage_options:
        anon: true
        endpoint_url: https://usgs.osn.mghpcc.org/
      url: s3://hytest/conus404/conus404_hourly.zarr
    metadata:
      description: CONUS404 Hydro Variable subset, 40 years of hourly values
    user_parameters: {}
entries:
  conus404-hourly-osn:
    kwargs:
      consolidated: false
      data: '{data(95ffa5d13fb47748)}'
    metadata:
      description: CONUS404 Hydro Variable subset, 40 years of hourly values
    output_instance: xarray:Dataset
    reader: intake.readers.readers:XArrayDatasetReader
    user_parameters: {}
metadata: {}
user_parameters: {}
version: 2
```
(The perfectly valid alternative, of course, is to pin `intake<2.0`.)
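If you do go the pinning route, the constraint in a requirements file (or the conda equivalent) is just:

```
intake<2.0
```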
Thanks @martindurant for the quick response and the fix. It works with the current HEAD! I also appreciate the additional documentation.