🐛[BUG]: Chunking feature of open_forecasts is not available

jdwillard19 opened this issue 4 months ago

Jared Willard commented 4 months ago

Version

source

On which installation method(s) does this occur?

Source

Describe the issue

When calling earth2mip.datasets.hindcast.open_forecast(), I expected the chunks argument to behave as it does in xarray.open_zarr(). However, after checking the code, it looks like this parameter is never used. For example:

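import datetime
import json
import os

import numpy as np

# NOTE: the earth2mip import paths below are assumed and may differ between versions.
from earth2mip.datasets.hindcast import open_forecast
from earth2mip.initial_conditions import hdf5
from earth2mip.networks import get_model
from earth2mip.schema import EnsembleRun
from earth2mip.time_collection import run_over_initial_times

# config_tmp, registry and h5_folder are assumed to be defined earlier in the script.
# One GPU per SLURM task; the same local ID also selects this task's shard below.
gpu_id = int(os.environ.get('SLURM_LOCALID', '0'))
device = f'cuda:{gpu_id}'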
model = get_model(config_tmp['weather_model'], registry, device=device)
time = datetime.datetime(2018, 1, 1, 0)
initial_times = [time + datetime.timedelta(hours=12 * i) for i in range(730)]
datasource = hdf5.DataSource.from_path(
    root=h5_folder, channel_names=model.channel_names
)
time_mean = np.load('/pscratch/sd/p/pharring/73var-6hourly/staging/stats/time_means.npy')
config_path = './config_swin_depth12_chweight_inv_8step.json'
output_path = '/pscratch/sd/j/jwillard/FCN_exp/wb2/swin_73var_geo_depth12_chweight_invar_8step/'
with open(config_path) as f:
    config_geo_chw_8step = json.load(f)
config = EnsembleRun.parse_obj(config_geo_chw_8step)
n_shards = 4
shard = int(os.environ.get('SLURM_LOCALID', '0'))
run_over_initial_times(
    time_loop=model,
    data_source=datasource,
    initial_times=initial_times,
    config=config,
    output_path=output_path,
    shard=shard,
    n_shards=n_shards,
)

model_forecast_dir = output_path + config_tmp['weather_model'] + "/"
chunks = {
    'initial_time': 10,
    'time': 42,
    'lat': 90,
    'lon': 180,
}
ds = open_forecast(model_forecast_dir, group="mean.zarr", chunks=chunks)

This seems to cause OOM errors because the data is never chunked.
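For a sense of scale, the requested chunk sizes can be turned into a per-chunk memory estimate. This is a rough sketch that assumes float32 data (4 bytes per value) and counts a single variable per chunk. The 730 initial times come from range(730) in the snippet above. With dask-backed chunks, peak memory is driven by the chunks being processed rather than by the full array.

```python
import math


def chunk_nbytes(chunks, itemsize=4):
    """Bytes in one chunk of a single variable, given per-dimension chunk sizes.

    itemsize=4 assumes float32 data.
    """
    return math.prod(chunks.values()) * itemsize


def n_blocks(size, chunk):
    """Number of chunks along a dimension of length `size` (the last may be partial)."""
    return -(-size // chunk)  # integer ceiling division


chunks = {"initial_time": 10, "time": 42, "lat": 90, "lon": 180}
print(chunk_nbytes(chunks))                   # 27216000 bytes, about 27 MB per chunk
print(n_blocks(730, chunks["initial_time"]))  # 73 chunks along initial_time
```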

Environment details

No response

Noah D. Brenowitz · Answer 1 · Tue Apr 02 2024 11:54:23 GMT+0800 (China Standard Time)

Thanks for opening this issue.

How about this solution?

ds = open_forecast(model_forecast_dir, group="mean.zarr").chunk(chunks)
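Here is a toy, self-contained sketch of what the suggested .chunk(chunks) call does. It uses plain xarray on made-up in-memory data and assumes xarray and dask are installed. The variable name t2m and all sizes are hypothetical. Because it does not call open_forecast, it only shows the rechunking mechanics. Whether this workaround avoids the OOM depends on open_forecast returning lazily loaded data, which is an assumption and is not confirmed in this thread.

```python
import numpy as np
import xarray as xr  # .chunk() also requires dask to be installed

# Hypothetical, deliberately small stand-in for a forecast dataset.
ds = xr.Dataset(
    {"t2m": (("initial_time", "time", "lat", "lon"),
             np.zeros((20, 8, 18, 36), dtype="float32"))}
)

# Same pattern as the workaround: open first, then chunk.
chunks = {"initial_time": 10, "time": 4, "lat": 9, "lon": 18}
ds_chunked = ds.chunk(chunks)

# Each variable is now a dask array split into blocks of the requested sizes;
# the original dataset is left unchanged (still NumPy-backed).
print(dict(ds_chunked.chunks))
```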

Noah D. Brenowitz · Answer 2 · Sun Apr 14 2024 00:25:00 GMT+0800 (China Standard Time)

We could document this in the docstring. I agree that xarray users typically expect a dask-backed (chunked) dataset when opening a Zarr store.
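One way a chunks argument could be honored is a thin wrapper that opens the dataset and then calls .chunk on the result. The helper name open_with_chunks is hypothetical and is not part of earth2mip. It accepts any opener callable, such as open_forecast, whose return value has a .chunk method (for example, an xarray.Dataset).

```python
def open_with_chunks(opener, *args, chunks=None, **kwargs):
    """Call `opener(*args, **kwargs)` and rechunk the result if `chunks` is given.

    Hypothetical helper, not an earth2mip API.

    Parameters
    ----------
    opener : callable
        Returns an object with a ``.chunk(chunks)`` method, e.g. an xarray.Dataset.
    chunks : mapping of dimension name to chunk size, optional
        Passed to ``.chunk``. If None, the opened object is returned unchanged.
    """
    ds = opener(*args, **kwargs)
    if chunks is None:
        return ds
    return ds.chunk(chunks)
```

For example: ds = open_with_chunks(open_forecast, model_forecast_dir, group="mean.zarr", chunks=chunks). Because chunks is keyword-only, the wrapper consumes it and never forwards it to the opener, which (per this issue) ignores it anyway.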