observingClouds / slkspec

fsspec filesystem for stronglink tape archive

Combining retrievals from different subdirectories

observingClouds opened this issue

Currently, retrievals from different subdirectories are not combined. This was originally done because an older version of slk did not create subdirectories locally. slk has since changed this behaviour to allow more efficient retrievals. slkspec should reflect that change, so that retrievals can be merged regardless of the directory they target.

The following code can be used as a test:

import xarray as xr
xr.open_mfdataset("slk:///arch/mh0010/m300408/showcase/dataset.zarr", engine="zarr")

Currently the output looks like the following:

# /scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/.zmetadata
# /scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/time/0
# slk search '{"$and":[{"path":{"$gte":"/arch/mh0010/m300408/showcase/dataset.zarr/time","$max_depth":1}},{"resources.name":{"$regex":"0"}}]}'
# /scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/time/0
# /scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/lat/0
# slk search '{"$and":[{"path":{"$gte":"/arch/mh0010/m300408/showcase/dataset.zarr/lat","$max_depth":1}},{"resources.name":{"$regex":"0"}}]}'
# /scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/lon/0
# slk search '{"$and":[{"path":{"$gte":"/arch/mh0010/m300408/showcase/dataset.zarr/lon","$max_depth":1}},{"resources.name":{"$regex":"0"}}]}'
# /scratch/m/m300408/arch/mh0010/m300408/showcase/dataset.zarr/time/0
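For illustration, the three separate searches above could in principle be folded into a single query. The sketch below only builds the query string and does not call slk. It assumes that `slk search` accepts a top-level `$or` operator alongside the `$and` form shown in the output, which should be verified against the slk documentation; `build_combined_query` is a hypothetical helper, not part of slkspec.

```python
import json
from collections import defaultdict
from pathlib import PurePosixPath


def build_combined_query(paths):
    """Build one search query string covering files in several directories.

    Hypothetical helper. ASSUMPTION: slk search supports "$or" to combine
    the per-directory "$and" clauses seen in the output above.
    """
    if not paths:
        raise ValueError("at least one path is required")

    # Group file names by their parent directory.
    by_dir = defaultdict(list)
    for path in map(PurePosixPath, paths):
        by_dir[str(path.parent)].append(path.name)

    # One "$and" clause per directory, mirroring the existing query shape.
    clauses = [
        {
            "$and": [
                {"path": {"$gte": directory, "$max_depth": 1}},
                {"resources.name": {"$regex": "|".join(sorted(set(names)))}},
            ]
        }
        for directory, names in sorted(by_dir.items())
    ]

    # A single directory needs no "$or" wrapper.
    query = clauses[0] if len(clauses) == 1 else {"$or": clauses}
    return json.dumps(query, separators=(",", ":"))


base = "/arch/mh0010/m300408/showcase/dataset.zarr"
print(build_combined_query([f"{base}/time/0", f"{base}/lat/0", f"{base}/lon/0"]))
```

With a single directory the helper falls back to the plain `$and` form, so it produces the same query shape as today for that case.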

Same here, see my comment in #10.

No, we just need to get rid of this loop, then it should work (except for .zmetadata):

for output_dir, inp_files in retrieval_requests.items():

According to @naumannd, slk now supports retrievals across directories.
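A rough sketch of what merging could look like once the per-directory loop is gone. It assumes `retrieval_requests` maps a local output directory to a list of archive paths, which is inferred from the loop header quoted above and not checked against the slkspec source. `merge_retrieval_requests` is a hypothetical helper, and the actual slk call is deliberately left out.

```python
def merge_retrieval_requests(retrieval_requests):
    """Flatten {output_dir: [archive paths]} into one de-duplicated list.

    Hypothetical helper: the dict layout is an assumption based on
    `for output_dir, inp_files in retrieval_requests.items()`.
    Order of first appearance is preserved.
    """
    merged = []
    seen = set()
    for inp_files in retrieval_requests.values():
        for inp_file in inp_files:
            if inp_file not in seen:
                seen.add(inp_file)
                merged.append(inp_file)
    return merged


base = "/arch/mh0010/m300408/showcase/dataset.zarr"
scratch = "/scratch/m/m300408"
requests = {
    f"{scratch}{base}/time": [f"{base}/time/0"],
    f"{scratch}{base}/lat": [f"{base}/lat/0"],
    f"{scratch}{base}/lon": [f"{base}/lon/0", f"{base}/time/0"],
}

# One list for a single retrieval instead of one retrieval per directory.
all_files = merge_retrieval_requests(requests)
print(all_files)
```

The duplicate `time/0` request is dropped, so each file would be asked for only once. How `.zmetadata` should be handled is left open here, as in the comment above.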