ratt-ru / dask-ms

Implementation of a dask/xarray dataset backed by a CASA MS

Home Page: https://dask-ms.readthedocs.io

Expose on disk chunks

landmanbester opened this issue

This is a feature request rather than a bug. I noticed that, at least for the experimental zarr functions, the on-disk chunking info is being discarded here:

dask_ms_attrs = group_attrs.pop(DASKMS_ATTR_KEY)

This is actually pretty useful information (for instance, if you need to rechunk before a write). Is there a good reason for discarding it? If so, can it be exposed somehow?

It is possible to expose it. What may make more sense is to make the writes a little more sophisticated (bearing in mind that I haven't studied them completely yet), and handle the rechunking for you. Basically, if a group already exists at the location you want to write to, rechunk the xds to be consistent with the on-disk chunks. Which would you prefer?
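To make the rechunking idea concrete, here is a minimal sketch of how chunks consistent with the on-disk layout could be computed. It is not dask-ms code: the helper name `disk_aligned_chunks` and the assumption that the on-disk chunking is available as a plain `{dim: chunk_size}` mapping are both hypothetical.

```python
from typing import Dict, Mapping, Tuple


def disk_aligned_chunks(
    dim_sizes: Mapping[str, int],
    disk_chunks: Mapping[str, int],
) -> Dict[str, Tuple[int, ...]]:
    """Return explicit per-dimension chunk tuples that match on-disk chunk sizes.

    Hypothetical helper, not part of dask-ms. ``dim_sizes`` maps dimension
    names to their lengths; ``disk_chunks`` maps dimension names to the
    (assumed uniform) chunk size used on disk.
    """
    aligned: Dict[str, Tuple[int, ...]] = {}

    for dim, size in dim_sizes.items():
        if size == 0:
            # Empty dimension: a single empty chunk.
            aligned[dim] = (0,)
            continue

        # Dimensions without on-disk chunk info are kept as one chunk.
        step = disk_chunks.get(dim, size)

        if step <= 0:
            raise ValueError(f"Invalid on-disk chunk size {step} for '{dim}'")

        # Full chunks of length `step`, plus a shorter trailing chunk if needed.
        full, remainder = divmod(size, step)
        aligned[dim] = (step,) * full + ((remainder,) if remainder else ())

    return aligned


chunks = disk_aligned_chunks({"row": 10, "chan": 4}, {"row": 4})
```

Assuming `xds` is an xarray dataset, the result could then be passed to `xds.chunk(chunks)` before the write, since `Dataset.chunk` accepts a mapping from dimension name to a tuple of chunk lengths.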

It would be great to do this automagically. I can't think of any other uses for the on-disk chunking info, so that would probably suffice. It may also be a different way to get around #171, although you shouldn't really need to rechunk things you are not writing to disk.