xds_from_zarr only respects requested chunk size on first Dataset
landmanbester opened this issue · comments
Landman Bester commented
- dask-ms version: 0.2.7
- Python version: 3.8.10
- Operating System: ubuntu 20.04.3
Description
Writing multiple Datasets to disk and then opening with a different chunk size doesn't work as expected.
Only the first Dataset has the requested chunk size, the remainder all have the same chunk size as on disk.
What I Did
Here is a simple reproducer:
import xarray as xr
import dask
import dask.array as da
from daskms.experimental.zarr import xds_to_zarr, xds_from_zarr
D = []
for i in range(5):
    tmp = da.random.random(size=(12000,), chunks=1000)
    dv = {
        'DATA': ('r', tmp)
    }
    D.append(xr.Dataset(data_vars=dv))
dask.compute(xds_to_zarr(D, 'test.zarr', columns='ALL'))
xds = xds_from_zarr('test.zarr', chunks={'r': 2000})
print(xds)
which results in
[<xarray.Dataset>
Dimensions: (r: 12000)
Dimensions without coordinates: r
Data variables:
DATA (r) float64 dask.array<chunksize=(2000,), meta=np.ndarray>, <xarray.Dataset>
Dimensions: (r: 12000)
Dimensions without coordinates: r
Data variables:
DATA (r) float64 dask.array<chunksize=(1000,), meta=np.ndarray>, <xarray.Dataset>
Dimensions: (r: 12000)
Dimensions without coordinates: r
Data variables:
DATA (r) float64 dask.array<chunksize=(1000,), meta=np.ndarray>, <xarray.Dataset>
Dimensions: (r: 12000)
Dimensions without coordinates: r
Data variables:
DATA (r) float64 dask.array<chunksize=(1000,), meta=np.ndarray>, <xarray.Dataset>
Dimensions: (r: 12000)
Dimensions without coordinates: r
Data variables:
DATA (r) float64 dask.array<chunksize=(1000,), meta=np.ndarray>]
Easy enough to fix by rechunking, but I still think this is a bug.
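For reference, here is a minimal sketch of the rechunking workaround, using plain xarray/dask (without the zarr round-trip) to mimic Datasets that come back with the on-disk chunking of 1000 and then rechunking each of them to the requested size:

```python
import dask.array as da
import xarray as xr

# Build Datasets whose chunking is 1000, mimicking what xds_from_zarr
# currently returns for all but the first Dataset.
datasets = [
    xr.Dataset({'DATA': ('r', da.random.random(12000, chunks=1000))})
    for _ in range(5)
]

# Rechunk every Dataset to the requested chunk size.
rechunked = [ds.chunk({'r': 2000}) for ds in datasets]
```

After this, each `DATA` variable is backed by a dask array with 2000-element chunks.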
JSKenyon commented
This should be fixed in #182, if you want to try it out @landmanbester.
Landman Bester commented
That was quick, thanks. Let me give it a go
Landman Bester commented
Confirmed, it's working. Thanks