kerchunking zarr from OSN, bucket not found
rsignell opened this issue · comments
I must be doing something dumb here:
- I'm trying to kerchunk an existing zarr dataset from OSN
- I can successfully open the zarr dataset from OSN with xarray
- kerchunk is complaining about not finding the same data in the bucket! why?
import fsspec
import xarray as xr
import kerchunk.combine
import kerchunk.zarr
fs_read = fsspec.filesystem('s3', anon=True, skip_instance_cache=True, use_listings_cache=False,
                            client_kwargs={'endpoint_url': 'https://usgs.osn.mghpcc.org'})
zarr_dataset = 'genoatest/aloarca/hindcast_unstr_med_zarr_10d_15kn/WW3_medunstr_197901.zarr'
# this works:
ds = xr.open_dataset(fs_read.get_mapper(zarr_dataset), engine='zarr')
print(ds)
# this fails:
ref1 = kerchunk.zarr.single_zarr(fs_read.get_mapper(zarr_dataset), inline=0)
The second call fails with:
...
ReferenceNotReachable: Reference "MAPSTA/.zarray" failed to fetch target ['s3://genoatest/aloarca/hindcast_unstr_med_zarr_10d_15kn/WW3_medunstr_197901.zarr/MAPSTA/.zarray']
but in fact that file exists:
fs_read.info('s3://genoatest/aloarca/hindcast_unstr_med_zarr_10d_15kn/WW3_medunstr_197901.zarr/MAPSTA/.zarray')
produces:
{'ETag': '"5e26d87da53f93073033bf4c55634a29"',
'LastModified': datetime.datetime(2024, 4, 8, 13, 50, 26, tzinfo=tzutc()),
'size': 320,
'name': 'genoatest/aloarca/hindcast_unstr_med_zarr_10d_15kn/WW3_medunstr_197901.zarr/MAPSTA/.zarray',
'type': 'file',
'StorageClass': 'STANDARD',
'VersionId': None,
'ContentType': 'application/octet-stream'}
Notebook here: https://gist.github.com/rsignell/b6b5639afd130f4c3287c6d1a0cc265a
It seems kerchunk.zarr.single_zarr does not correctly use the storage options when you pass in a ready-made store. It does work like this, though:
storage_options = dict(anon=True, skip_instance_cache=True, use_listings_cache=False, client_kwargs={'endpoint_url': 'https://usgs.osn.mghpcc.org'})
ref1 = kerchunk.zarr.single_zarr("s3://genoatest/aloarca/hindcast_unstr_med_zarr_10d_15kn/WW3_medunstr_197901.zarr", storage_options=storage_options, inline=0)
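For anyone landing here later, a rough sketch of how the references from the working call might then be opened with xarray. It assumes the same endpoint options also have to be forwarded as `remote_options` when reading the chunks back; `OSN_STORAGE_OPTIONS`, `as_s3_url` and `open_references` are hypothetical names for illustration, not part of kerchunk, and the details may differ between kerchunk/zarr versions.

```python
# Storage options copied from the working call in this thread.
OSN_STORAGE_OPTIONS = {
    "anon": True,
    "skip_instance_cache": True,
    "use_listings_cache": False,
    "client_kwargs": {"endpoint_url": "https://usgs.osn.mghpcc.org"},
}


def as_s3_url(path: str) -> str:
    """Hypothetical helper: turn a 'bucket/key' path into an s3:// URL."""
    return path if path.startswith("s3://") else "s3://" + path


def open_references(refs, storage_options=None):
    """Open a kerchunk reference dict with xarray (sketch, needs network access).

    Assumption: the non-default endpoint must be passed again as
    remote_options so the reference filesystem looks in the same place.
    """
    # Imported lazily so the helpers above work without these packages.
    import fsspec
    import xarray as xr

    opts = OSN_STORAGE_OPTIONS if storage_options is None else storage_options
    fs = fsspec.filesystem(
        "reference",
        fo=refs,
        remote_protocol="s3",
        remote_options=opts,
    )
    return xr.open_dataset(
        fs.get_mapper(""),
        engine="zarr",
        backend_kwargs={"consolidated": False},
    )
```

With `ref1` from the call above, this would be used as `ds = open_references(ref1)`, and `as_s3_url(zarr_dataset)` gives the URL form that `single_zarr` accepted together with `storage_options`.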
This is awesome @martindurant !