Losing data when reading/converting GRIB2 files to netCDF using `xarray` with `engine = 'cfgrib'`
mmgamboa opened this issue
Hi all,
I have data in a GRIB2 file and I want to convert it to netCDF format. The original dataset (confirmed using the pygrib
package) has 12 messages: 6 different isobaric levels, each with 2 variables (average and maximum), but when I convert the file using xarray
I lose 6 of the 12 messages.
The messages of the original file, as returned by `pygrib.open('filename.grib2').read()`, are:
[1:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 15000 Pa:fcst time 6 hrs:from 202001080600,
2:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 15000 Pa:fcst time 6 hrs:from 202001080600,
3:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 20000 Pa:fcst time 6 hrs:from 202001080600,
4:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 20000 Pa:fcst time 6 hrs:from 202001080600,
5:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 25000 Pa:fcst time 6 hrs:from 202001080600,
6:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 25000 Pa:fcst time 6 hrs:from 202001080600,
7:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 30000 Pa:fcst time 6 hrs:from 202001080600,
8:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 30000 Pa:fcst time 6 hrs:from 202001080600,
9:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 35000 Pa:fcst time 6 hrs:from 202001080600,
10:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 35000 Pa:fcst time 6 hrs:from 202001080600,
11:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 40000 Pa:fcst time 6 hrs:from 202001080600,
12:Relative clear air turbulence (RCAT):% (instant):regular_ll:isobaricInhPa:level 40000 Pa:fcst time 6 hrs:from 202001080600]
To make the conversion I am running the following commands:
import xarray
data = xarray.open_dataset('filename.grib2', engine = 'cfgrib')
data.to_netcdf('netcdf_file.nc')
and then, to read it from another script, I run
import netCDF4 as nc
ds = nc.Dataset('netcdf_file.nc', engine = 'netcdf4')
In either case, both the `data` and `ds` objects have fewer levels (6 instead of 12).
[Screenshot of the `data` object]
Is the `cfgrib` engine losing data when reading GRIB2 files? Is it possible that the problem comes from the fact that the original messages look identical for a given isobaric level?
Thanks in advance,
Martín Gamboa
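One way to check whether the two messages at each level really are identical is to compare their GRIB keys directly. The following is a minimal sketch, not a confirmed diagnosis: the helper names `differing_keys` and `message_metadata` are hypothetical, and which key (if any) separates the "average" and "maximum" fields in this file is unknown.

```python
from typing import Any, Dict, Tuple


def differing_keys(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ.

    A key present in only one of the two mappings is reported with None
    on the side where it is missing.
    """
    missing = object()
    diff = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key, missing), b.get(key, missing)
        if va is missing or vb is missing or va != vb:
            diff[key] = (None if va is missing else va, None if vb is missing else vb)
    return diff


def message_metadata(path: str, number: int) -> Dict[str, Any]:
    """Collect the scalar keys of one GRIB message (hypothetical helper, needs pygrib)."""
    import pygrib  # imported lazily so differing_keys works without pygrib installed

    meta = {}
    grbs = pygrib.open(path)
    try:
        grb = grbs.message(number)  # message numbers start at 1
        for key in grb.keys():
            try:
                value = grb[key]
            except Exception:
                continue  # some keys cannot be read for every message
            if isinstance(value, (int, float, str)):
                meta[key] = value  # skip arrays such as the data values
    finally:
        grbs.close()
    return meta


# Example (assumes 'filename.grib2' exists):
# print(differing_keys(message_metadata('filename.grib2', 1),
#                      message_metadata('filename.grib2', 2)))
```

If the result is empty, the two messages carry the same scalar metadata and only their data values differ; otherwise the reported keys are candidates for telling the two variables apart.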
Hi all,
In the end I used a different engine, `pynio`. With `cfgrib`, a new error was raised (see the PS below). The final solution to read my `file.grib2` is:
import xarray as xr

data = xr.open_dataset('file.grib2', engine='pynio')
# Take a look at the data
for v in data:
    print("{}, {}, {}".format(v, data[v].attrs["long_name"], data[v].attrs["units"]))
# Extract all data variables from the dataset
df_data = data.get([*data])
# Convert to a DataFrame for easier handling (optional)
df_data = df_data.to_dataframe()
Cheers,
Martín
PS: New error
DatasetBuildError Traceback (most recent call last)
File ~/miniconda3/envs/savipa/lib/python3.10/site-packages/cfgrib/dataset.py:649, in build_dataset_components(index, errors, encode_cf, squeeze, log, read_keys, time_dims, extra_coords)
648 try:
--> 649 dims, data_var, coord_vars = build_variable_components(
650 var_index,
651 encode_cf,
652 filter_by_keys,
653 errors=errors,
654 squeeze=squeeze,
655 read_keys=read_keys,
656 time_dims=time_dims,
657 extra_coords=extra_coords,
658 )
659 except DatasetBuildError as ex:
660 # NOTE: When a variable has more than one value for an attribute we need to raise all
661 # the values in the file, not just the ones associated with that variable. See #54.
File ~/miniconda3/envs/savipa/lib/python3.10/site-packages/cfgrib/dataset.py:486, in build_variable_components(index, encode_cf, filter_by_keys, log, errors, squeeze, read_keys, time_dims, extra_coords)
475 def build_variable_components(
476 index: abc.Index[T.Any, abc.Field],
477 encode_cf: T.Sequence[str] = (),
(...)
484 extra_coords: T.Dict[str, str] = {},
485 ) -> T.Tuple[T.Dict[str, int], Variable, T.Dict[str, Variable]]:
--> 486 data_var_attrs = enforce_unique_attributes(index, DATA_ATTRIBUTES_KEYS, filter_by_keys)
487 grid_type_keys = GRID_TYPE_MAP.get(index.getone("gridType"), [])
File ~/miniconda3/envs/savipa/lib/python3.10/site-packages/cfgrib/dataset.py:273, in enforce_unique_attributes(index, attributes_keys, filter_by_keys)
272 fbks.append(fbk)
--> 273 raise DatasetBuildError("multiple values for key %r" % key, key, fbks)
274 if values and values[0] not in ("undef", "unknown"):
DatasetBuildError: multiple values for key 'typeOfLevel'
During handling of the above exception, another exception occurred:
DatasetBuildError Traceback (most recent call last)
Cell In[14], line 3
1 import xarray
----> 3 data = xarray.open_dataset(gribfile, engine = 'cfgrib')
4 #data.to_netcdf('netcdf_file.nc')
File ~/miniconda3/envs/savipa/lib/python3.10/site-packages/xarray/backends/api.py:525, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, backend_kwargs, **kwargs)
513 decoders = _resolve_decoders_kwargs(
514 decode_cf,
515 open_backend_dataset_parameters=backend.open_dataset_parameters,
(...)
521 decode_coords=decode_coords,
522 )
524 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 525 backend_ds = backend.open_dataset(
526 filename_or_obj,
527 drop_variables=drop_variables,
528 **decoders,
529 **kwargs,
530 )
531 ds = _dataset_from_backend_dataset(
532 backend_ds,
533 filename_or_obj,
(...)
541 **kwargs,
542 )
543 return ds
File ~/miniconda3/envs/savipa/lib/python3.10/site-packages/cfgrib/xarray_plugin.py:109, in CfGribBackend.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, lock, indexpath, filter_by_keys, read_keys, encode_cf, squeeze, time_dims, errors, extra_coords)
87 def open_dataset(
88 self,
89 filename_or_obj: T.Union[str, abc.MappingFieldset[T.Any, abc.Field]],
(...)
106 extra_coords: T.Dict[str, str] = {},
107 ) -> xr.Dataset:
--> 109 store = CfGribDataStore(
110 filename_or_obj,
111 indexpath=indexpath,
112 filter_by_keys=filter_by_keys,
113 read_keys=read_keys,
114 encode_cf=encode_cf,
115 squeeze=squeeze,
116 time_dims=time_dims,
117 lock=lock,
118 errors=errors,
119 extra_coords=extra_coords,
120 )
121 with xr.core.utils.close_on_error(store):
122 vars, attrs = store.load() # type: ignore
File ~/miniconda3/envs/savipa/lib/python3.10/site-packages/cfgrib/xarray_plugin.py:40, in CfGribDataStore.__init__(self, filename, lock, **backend_kwargs)
38 else:
39 opener = dataset.open_fieldset
---> 40 self.ds = opener(filename, **backend_kwargs)
File ~/miniconda3/envs/savipa/lib/python3.10/site-packages/cfgrib/dataset.py:780, in open_file(path, grib_errors, indexpath, filter_by_keys, read_keys, time_dims, extra_coords, **kwargs)
777 index_keys = compute_index_keys(time_dims, extra_coords)
778 index = open_fileindex(stream, indexpath, index_keys, filter_by_keys=filter_by_keys)
--> 780 return open_from_index(index, read_keys, time_dims, extra_coords, **kwargs)
File ~/miniconda3/envs/savipa/lib/python3.10/site-packages/cfgrib/dataset.py:722, in open_from_index(index, read_keys, time_dims, extra_coords, **kwargs)
715 def open_from_index(
716 index: abc.Index[T.Any, abc.Field],
717 read_keys: T.Sequence[str] = (),
(...)
720 **kwargs: T.Any,
721 ) -> Dataset:
--> 722 dimensions, variables, attributes, encoding = build_dataset_components(
723 index, read_keys=read_keys, time_dims=time_dims, extra_coords=extra_coords, **kwargs
724 )
725 return Dataset(dimensions, variables, attributes, encoding)
File ~/miniconda3/envs/savipa/lib/python3.10/site-packages/cfgrib/dataset.py:670, in build_dataset_components(index, errors, encode_cf, squeeze, log, read_keys, time_dims, extra_coords)
668 fbks.append(fbk)
669 error_message += "\n filter_by_keys=%r" % fbk
--> 670 raise DatasetBuildError(error_message, key, fbks)
671 short_name = data_var.attributes.get("GRIB_shortName", "paramId_%d" % param_id)
672 var_name = data_var.attributes.get("GRIB_cfVarName", "unknown")
DatasetBuildError: multiple values for unique key, try re-open the file with one of:
filter_by_keys={'typeOfLevel': 'isobaricInhPa'}
filter_by_keys={'typeOfLevel': 'maxWind'}
filter_by_keys={'typeOfLevel': 'tropopause'}
Hi @mmgamboa,
Sorry for not getting back to you sooner. This looks strange: in the description, it appears as though all the fields are of level type 'isobaricInhPa'. If this is true, then I don't see what distinguishes the fields in each pair (1 and 2, for instance).
However, cfgrib is complaining because it detects fields with different level types, namely 'isobaricInhPa', 'maxWind' and 'tropopause'. Could this be true? I'm not familiar enough with pynio, but if you have cfgrib installed, then you will also have eccodes installed, so you can double-check the GRIB file with the command-line tool `grib_ls <gribfile>`, just to make sure of what's in there.
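If the file does contain several level types, the `filter_by_keys` hint from the error message in this thread can be followed by opening the file once per level type. The sketch below assumes the three level types listed in that message; the helpers `filter_kwargs` and `open_by_level_type` are hypothetical names, not part of cfgrib.

```python
from typing import Dict, Iterable

# Level types copied from the DatasetBuildError message in this thread.
LEVEL_TYPES = ("isobaricInhPa", "maxWind", "tropopause")


def filter_kwargs(level_type: str) -> Dict[str, Dict[str, str]]:
    """Build backend_kwargs that restrict cfgrib to a single typeOfLevel."""
    return {"filter_by_keys": {"typeOfLevel": level_type}}


def open_by_level_type(path: str, level_types: Iterable[str] = LEVEL_TYPES) -> dict:
    """Open one xarray Dataset per level type (hypothetical helper)."""
    import xarray as xr  # imported lazily; requires xarray and cfgrib

    datasets = {}
    for level_type in level_types:
        datasets[level_type] = xr.open_dataset(
            path,
            engine="cfgrib",
            backend_kwargs=filter_kwargs(level_type),
        )
    return datasets


# Example (assumes 'filename.grib2' exists):
# for level_type, ds in open_by_level_type('filename.grib2').items():
#     ds.to_netcdf('netcdf_file_{}.nc'.format(level_type))
```

This only separates the level types. If two messages share the same level type, level and parameter, they may still collide, in which case an additional distinguishing key would have to be added to `filter_by_keys`. `cfgrib.open_datasets(path)` is an alternative that attempts the split automatically and returns a list of datasets.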