ratt-ru / dask-ms

Implementation of a dask/xarray dataset backed by a CASA MS

Home Page: https://dask-ms.readthedocs.io

xds_from_zarr only respects requested chunk size on first Dataset

landmanbester opened this issue

  • dask-ms version: 0.2.7
  • Python version: 3.8.10
  • Operating System: ubuntu 20.04.3

Description

Writing multiple Datasets to disk and then reopening them with a different chunk size doesn't work as expected.
Only the first Dataset has the requested chunk size; the remaining Datasets keep the chunk size they were written with.
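The expected behaviour can be stated as a simple check: after reopening with `chunks={'r': 2000}`, every Dataset should be chunked at 2000 along `r` (only the final block may be shorter). This is a minimal sketch assuming only xarray and dask. The helpers `chunking_respected` and `make_ds` are hypothetical and not part of dask-ms, and the Datasets are built in memory to mimic the reported behaviour rather than read back with `xds_from_zarr`.

```python
import dask.array as da
import xarray as xr


def chunking_respected(datasets, dim, size):
    """Hypothetical helper: True if every Dataset is chunked at `size`
    along `dim` (only the final block may be shorter)."""
    for ds in datasets:
        blocks = ds.chunks[dim]
        if any(b != size for b in blocks[:-1]) or blocks[-1] > size:
            return False
    return True


def make_ds(chunk):
    """Hypothetical helper: in-memory stand-in for one Dataset from the store."""
    data = da.random.random(size=(12000,), chunks=chunk)
    return xr.Dataset(data_vars={"DATA": ("r", data)})


# Mimic the bug: only the first Dataset picks up the requested chunking.
buggy = [make_ds(2000)] + [make_ds(1000) for _ in range(4)]
# What reopening with chunks={'r': 2000} should produce.
expected = [make_ds(2000) for _ in range(5)]

print(chunking_respected(buggy, "r", 2000))     # False
print(chunking_respected(expected, "r", 2000))  # True
```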

What I Did

Here is a simple reproducer:

```python
import xarray as xr
import dask
import dask.array as da
from daskms.experimental.zarr import xds_to_zarr, xds_from_zarr

# Build five Datasets, each with a 12000-element DATA column chunked at 1000
D = []
for i in range(5):
    tmp = da.random.random(size=(12000,), chunks=1000)
    dv = {
        'DATA': ('r', tmp)
    }
    D.append(xr.Dataset(data_vars=dv))

# Write all five Datasets to a single zarr store
dask.compute(xds_to_zarr(D, 'test.zarr', columns='ALL'))

# Reopen the store, requesting chunks of 2000 along 'r'
xds = xds_from_zarr('test.zarr', chunks={'r': 2000})

print(xds)
```

which results in:

```
[<xarray.Dataset>
Dimensions:  (r: 12000)
Dimensions without coordinates: r
Data variables:
    DATA     (r) float64 dask.array<chunksize=(2000,), meta=np.ndarray>, <xarray.Dataset>
Dimensions:  (r: 12000)
Dimensions without coordinates: r
Data variables:
    DATA     (r) float64 dask.array<chunksize=(1000,), meta=np.ndarray>, <xarray.Dataset>
Dimensions:  (r: 12000)
Dimensions without coordinates: r
Data variables:
    DATA     (r) float64 dask.array<chunksize=(1000,), meta=np.ndarray>, <xarray.Dataset>
Dimensions:  (r: 12000)
Dimensions without coordinates: r
Data variables:
    DATA     (r) float64 dask.array<chunksize=(1000,), meta=np.ndarray>, <xarray.Dataset>
Dimensions:  (r: 12000)
Dimensions without coordinates: r
Data variables:
    DATA     (r) float64 dask.array<chunksize=(1000,), meta=np.ndarray>]
```

This is easy enough to work around by rechunking, but I still think it is a bug.
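The rechunking workaround amounts to calling xarray's `Dataset.chunk` on every Dataset in the returned list. This is a minimal sketch assuming xarray and dask; the in-memory list stands in for the output of `xds_from_zarr` (chunked at 1000 along `r`, as in the output above), and `rechunk_all` is a hypothetical helper name, not a dask-ms function.

```python
import dask.array as da
import xarray as xr


def rechunk_all(datasets, chunks):
    """Hypothetical helper: apply the same chunking to every Dataset."""
    return [ds.chunk(chunks) for ds in datasets]


# Stand-in for the list returned by xds_from_zarr: five Datasets whose
# DATA column is still chunked at 1000 along 'r'.
xds = [
    xr.Dataset(data_vars={"DATA": ("r", da.zeros((12000,), chunks=1000))})
    for _ in range(5)
]

# Rechunk every Dataset, not just the first, to 2000 along 'r'.
xds = rechunk_all(xds, {"r": 2000})
print([set(ds.chunks["r"]) for ds in xds])  # [{2000}, {2000}, {2000}, {2000}, {2000}]
```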

This should be fixed in #182, if you want to try it out, @landmanbester.

That was quick, thanks. Let me give it a go.

Confirmed, it's working. Thanks!