Losing data when reading/converting GRIB2 files to netCDF using `xarray` with `engine = 'cfgrib'`

Question

mmgamboa opened this issue a year ago

Hi all,

I have data in a GRIB2 file that I want to convert to netCDF. The original dataset (confirmed with the pygrib package) has 12 messages: 6 isobaric levels, each with 2 variables (average and maximum). When I convert the file using xarray, however, 6 of the 12 messages are lost.

The messages in the original file, as listed by pygrib.open('filename.grib2').read(), are:

[1:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 15000 Pa:fcst time 6 hrs:from 202001080600,
 2:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 15000 Pa:fcst time 6 hrs:from 202001080600,
 3:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 20000 Pa:fcst time 6 hrs:from 202001080600,
 4:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 20000 Pa:fcst time 6 hrs:from 202001080600,
 5:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 25000 Pa:fcst time 6 hrs:from 202001080600,
 6:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 25000 Pa:fcst time 6 hrs:from 202001080600,
 7:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 30000 Pa:fcst time 6 hrs:from 202001080600,
 8:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 30000 Pa:fcst time 6 hrs:from 202001080600,
 9:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 35000 Pa:fcst time 6 hrs:from 202001080600,
 10:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 35000 Pa:fcst time 6 hrs:from 202001080600,
 11:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 40000 Pa:fcst time 6 hrs:from 202001080600,
 12:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 40000 Pa:fcst time 6 hrs:from 202001080600]

To make the conversion I am running the following commands:

import xarray

data = xarray.open_dataset('filename.grib2', engine = 'cfgrib')
data.to_netcdf('netcdf_file.nc')

and then, to read it back from another script, I run

import netCDF4 as nc
# netCDF4.Dataset has no 'engine' option (that keyword belongs to xarray.open_dataset)
ds = nc.Dataset('netcdf_file.nc')

In both cases, the data and ds objects contain less data than the original file: only 6 levels. (The original post included a screenshot of the data object here.)

Is the engine cfgrib losing data when reading GRIB2 files? Is it possible that the problem comes from the fact that the original messages are the same for a given isobaric level?

Thanks in advance,
Martín Gamboa

mmgamboa · Answer 1 · Wed Jun 21 2023 23:09:11 GMT+0800 (China Standard Time)

Hi all,

In the end I used a different engine, pynio. With cfgrib, a new error was raised (see the PS below). The final solution for reading my file.grib2 is:

import xarray as xr

data = xr.open_dataset('file.grib2', engine='pynio')

# Take a look at the data: variable name, long name and units
for v in data:
    print("{}, {}, {}".format(v, data[v].attrs["long_name"], data[v].attrs["units"]))

# Select all data variables from the dataset
df_data = data.get([*data])

# Convert to a pandas DataFrame for easier handling (optional)
df_data = df_data.to_dataframe()

Cheers,
Martín

PS: New error

DatasetBuildError                         Traceback (most recent call last)
File ~/miniconda3/envs/savipa/lib/python3.10/site-packages/cfgrib/dataset.py:649, in build_dataset_components(index, errors, encode_cf, squeeze, log, read_keys, time_dims, extra_coords)
    648 try:
--> 649     dims, data_var, coord_vars = build_variable_components(
    650         var_index,
    651         encode_cf,
    652         filter_by_keys,
    653         errors=errors,
    654         squeeze=squeeze,
    655         read_keys=read_keys,
    656         time_dims=time_dims,
    657         extra_coords=extra_coords,
    658     )
    659 except DatasetBuildError as ex:
    660     # NOTE: When a variable has more than one value for an attribute we need to raise all
    661     #   the values in the file, not just the ones associated with that variable. See #54.

File ~/miniconda3/envs/savipa/lib/python3.10/site-packages/cfgrib/dataset.py:486, in build_variable_components(index, encode_cf, filter_by_keys, log, errors, squeeze, read_keys, time_dims, extra_coords)
    475 def build_variable_components(
    476     index: abc.Index[T.Any, abc.Field],
    477     encode_cf: T.Sequence[str] = (),
   (...)
    484     extra_coords: T.Dict[str, str] = {},
    485 ) -> T.Tuple[T.Dict[str, int], Variable, T.Dict[str, Variable]]:
--> 486     data_var_attrs = enforce_unique_attributes(index, DATA_ATTRIBUTES_KEYS, filter_by_keys)
    487     grid_type_keys = GRID_TYPE_MAP.get(index.getone("gridType"), [])

File ~/miniconda3/envs/savipa/lib/python3.10/site-packages/cfgrib/dataset.py:273, in enforce_unique_attributes(index, attributes_keys, filter_by_keys)
    272         fbks.append(fbk)
--> 273     raise DatasetBuildError("multiple values for key %r" % key, key, fbks)
    274 if values and values[0] not in ("undef", "unknown"):

DatasetBuildError: multiple values for key 'typeOfLevel'

During handling of the above exception, another exception occurred:

DatasetBuildError                         Traceback (most recent call last)
Cell In[14], line 3
      1 import xarray
----> 3 data = xarray.open_dataset(gribfile, engine = 'cfgrib')
      4 #data.to_netcdf('netcdf_file.nc')

File ~/miniconda3/envs/savipa/lib/python3.10/site-packages/xarray/backends/api.py:525, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, backend_kwargs, **kwargs)
    513 decoders = _resolve_decoders_kwargs(
    514     decode_cf,
    515     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    521     decode_coords=decode_coords,
    522 )
    524 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 525 backend_ds = backend.open_dataset(
    526     filename_or_obj,
    527     drop_variables=drop_variables,
    528     **decoders,
    529     **kwargs,
    530 )
    531 ds = _dataset_from_backend_dataset(
    532     backend_ds,
    533     filename_or_obj,
   (...)
    541     **kwargs,
    542 )
    543 return ds

File ~/miniconda3/envs/savipa/lib/python3.10/site-packages/cfgrib/xarray_plugin.py:109, in CfGribBackend.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, lock, indexpath, filter_by_keys, read_keys, encode_cf, squeeze, time_dims, errors, extra_coords)
     87 def open_dataset(
     88     self,
     89     filename_or_obj: T.Union[str, abc.MappingFieldset[T.Any, abc.Field]],
   (...)
    106     extra_coords: T.Dict[str, str] = {},
    107 ) -> xr.Dataset:
--> 109     store = CfGribDataStore(
    110         filename_or_obj,
    111         indexpath=indexpath,
    112         filter_by_keys=filter_by_keys,
    113         read_keys=read_keys,
    114         encode_cf=encode_cf,
    115         squeeze=squeeze,
    116         time_dims=time_dims,
    117         lock=lock,
    118         errors=errors,
    119         extra_coords=extra_coords,
    120     )
    121     with xr.core.utils.close_on_error(store):
    122         vars, attrs = store.load()  # type: ignore

File ~/miniconda3/envs/savipa/lib/python3.10/site-packages/cfgrib/xarray_plugin.py:40, in CfGribDataStore.__init__(self, filename, lock, **backend_kwargs)
     38 else:
     39     opener = dataset.open_fieldset
---> 40 self.ds = opener(filename, **backend_kwargs)

File ~/miniconda3/envs/savipa/lib/python3.10/site-packages/cfgrib/dataset.py:780, in open_file(path, grib_errors, indexpath, filter_by_keys, read_keys, time_dims, extra_coords, **kwargs)
    777 index_keys = compute_index_keys(time_dims, extra_coords)
    778 index = open_fileindex(stream, indexpath, index_keys, filter_by_keys=filter_by_keys)
--> 780 return open_from_index(index, read_keys, time_dims, extra_coords, **kwargs)

File ~/miniconda3/envs/savipa/lib/python3.10/site-packages/cfgrib/dataset.py:722, in open_from_index(index, read_keys, time_dims, extra_coords, **kwargs)
    715 def open_from_index(
    716     index: abc.Index[T.Any, abc.Field],
    717     read_keys: T.Sequence[str] = (),
   (...)
    720     **kwargs: T.Any,
    721 ) -> Dataset:
--> 722     dimensions, variables, attributes, encoding = build_dataset_components(
    723         index, read_keys=read_keys, time_dims=time_dims, extra_coords=extra_coords, **kwargs
    724     )
    725     return Dataset(dimensions, variables, attributes, encoding)

File ~/miniconda3/envs/savipa/lib/python3.10/site-packages/cfgrib/dataset.py:670, in build_dataset_components(index, errors, encode_cf, squeeze, log, read_keys, time_dims, extra_coords)
    668         fbks.append(fbk)
    669         error_message += "\n    filter_by_keys=%r" % fbk
--> 670     raise DatasetBuildError(error_message, key, fbks)
    671 short_name = data_var.attributes.get("GRIB_shortName", "paramId_%d" % param_id)
    672 var_name = data_var.attributes.get("GRIB_cfVarName", "unknown")

DatasetBuildError: multiple values for unique key, try re-open the file with one of:
    filter_by_keys={'typeOfLevel': 'isobaricInhPa'}
    filter_by_keys={'typeOfLevel': 'maxWind'}
    filter_by_keys={'typeOfLevel': 'tropopause'}
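
The error message above lists three `filter_by_keys` values to try. A minimal sketch of following that hint is shown here, opening the file once per level type through cfgrib's `backend_kwargs`. The helper names `filter_kwargs` and `open_by_level_type` are hypothetical, and the sketch assumes xarray and cfgrib are installed. It has not been run against the file from this thread, so it may or may not recover the paired messages.

```python
# Level types copied from the DatasetBuildError message above.
LEVEL_TYPES = ("isobaricInhPa", "maxWind", "tropopause")


def filter_kwargs(type_of_level):
    """Build backend_kwargs that restrict cfgrib to a single typeOfLevel."""
    return {"filter_by_keys": {"typeOfLevel": type_of_level}}


def open_by_level_type(path, level_types=LEVEL_TYPES):
    """Hypothetical helper: return {typeOfLevel: xarray.Dataset} for one GRIB file."""
    # Imported here so filter_kwargs stays usable without xarray/cfgrib installed.
    import xarray as xr

    return {
        level_type: xr.open_dataset(
            path, engine="cfgrib", backend_kwargs=filter_kwargs(level_type)
        )
        for level_type in level_types
    }


# Example usage (the file name is the placeholder used earlier in this thread):
# datasets = open_by_level_type("file.grib2")
# for level_type, ds in datasets.items():
#     ds.to_netcdf("netcdf_file_{}.nc".format(level_type))
```

Each level type ends up in its own dataset because cfgrib refuses to merge fields whose `typeOfLevel` differs. As an alternative, `cfgrib.open_datasets(path)` returns a list of datasets without the level types having to be spelled out.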

Iain Russell · Answer 2 · Fri Jun 30 2023 22:17:46 GMT+0800 (China Standard Time)

Hi @mmgamboa,

Sorry for not getting back to you sooner - this looks strange - in the description, it appears as though all the fields are of level type 'isobaricInhPa'. If this is true, then I don't see the difference between the pairs of fields (1 and 2 for instance).

However, cfgrib is complaining because it detects that there are fields with different level types, namely 'isobaricInhPa', 'maxWind' and 'tropopause'. Could this be true? I'm not familiar enough with pynio, but if you have cfgrib installed, then you will also have eccodes installed. So you can double-check the GRIB file with the command-line grib_ls <gribfile> just to make sure of what's in there.
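
Besides `grib_ls`, the same check can be sketched in Python with pygrib, which the original post already uses. The idea is to compare a few GRIB keys between the two messages of each pair (1 and 2, 3 and 4, and so on) and print whichever ones differ. The key list and the helper names (`read_key`, `differing_keys`, `compare_pairs`) are assumptions chosen for illustration; the key that actually separates "average" from "maximum" in this file is not known from the thread.

```python
# Keys that often distinguish otherwise similar GRIB2 fields.
# This selection is an assumption for illustration, not an exhaustive list.
CANDIDATE_KEYS = (
    "shortName", "paramId", "parameterCategory", "parameterNumber",
    "productDefinitionTemplateNumber", "typeOfStatisticalProcessing",
    "derivedForecast", "typeOfLevel", "level", "stepType",
)

_MISSING = "<missing>"


def read_key(msg, key):
    """Return msg[key], or a placeholder when the key is absent or unreadable."""
    try:
        return msg[key]
    except Exception:
        return _MISSING


def differing_keys(msg_a, msg_b, keys=CANDIDATE_KEYS):
    """Return {key: (value_a, value_b)} for the keys whose values differ."""
    diffs = {}
    for key in keys:
        a, b = read_key(msg_a, key), read_key(msg_b, key)
        if a != b:
            diffs[key] = (a, b)
    return diffs


def compare_pairs(path):
    """Hypothetical helper: print key differences for messages (1, 2), (3, 4), ..."""
    import pygrib  # imported lazily; requires pygrib to be installed

    grbs = pygrib.open(path)
    try:
        messages = grbs.read()  # all messages, as in the original post
        for i in range(0, len(messages) - 1, 2):
            print(i + 1, i + 2, differing_keys(messages[i], messages[i + 1]))
    finally:
        grbs.close()


# Example usage:
# compare_pairs("filename.grib2")
```

If a pair prints an empty dictionary, none of the candidate keys differ and the list would need to be extended, for example with keys taken from the `grib_ls` or `grib_dump` output.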