cfgrib generates NaNs when reading data
clessig opened this issue · comments
cfgrib generates NaNs while loading grib files, see code example below. We reproduced this on multiple machines (with slightly different configurations) and verified with grib_get_data and the old python interface that the file is free of NaNs. For your convenience, the grib of the example can be found here: http://graphics.cs.uni-magdeburg.de/misc/reanalysis_geopotential_y1979_m01_pl850.grib. However, we were able to reproduce the problem with various files (although there seems to be a correlation with the large values of geopotential compared to, e.g., vorticity).
(pyenv) [lessig1@jwlogin24 multiformer]$ python
Python 3.9.6 (default, Nov 16 2021, 12:25:13)
[GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xarray
>>> fname = 'data/era5/pl_levels/850/geopotential/reanalysis_geopotential_y1979_m01_pl850.grib'
>>> ds = xarray.open_dataset( fname, engine='cfgrib')
Ignoring index file 'data/era5/pl_levels/850/geopotential/reanalysis_geopotential_y1979_m01_pl850.grib.923a8.idx' incompatible with GRIB file
>>> ds
<xarray.Dataset>
Dimensions: (time: 744, latitude: 721, longitude: 1440)
Coordinates:
number int64 ...
* time (time) datetime64[ns] 1979-01-01 ... 1979-01-31T23:00:00
step timedelta64[ns] ...
isobaricInhPa float64 ...
* latitude (latitude) float64 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
* longitude (longitude) float64 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
valid_time (time) datetime64[ns] ...
Data variables:
z (time, latitude, longitude) float32 ...
Attributes:
GRIB_edition: 1
GRIB_centre: ecmf
GRIB_centreDescription: European Centre for Medium-Range Weather Forecasts
GRIB_subCentre: 0
Conventions: CF-1.7
institution: European Centre for Medium-Range Weather Forecasts
history: 2022-09-08T08:06 GRIB to CDM+CF via cfgrib-0.9.1...
>>> import numpy as np
>>> data = np.array( ds['z'], dtype=np.float32)
>>> np.isnan( data).any()
True
>>> np.argwhere( np.isnan( data))
array([[ 3, 123, 688],
[ 3, 140, 702],
[ 3, 638, 890],
[ 21, 121, 668],
[ 21, 142, 678],
[ 21, 631, 988],
[ 77, 599, 491],
[ 77, 606, 398],
[ 97, 55, 1206],
[ 97, 62, 1201],
[ 140, 592, 1420],
[ 184, 592, 755],
[ 289, 139, 674],
[ 289, 151, 640],
[ 404, 608, 817],
[ 445, 164, 671],
[ 445, 631, 874],
[ 519, 600, 825],
[ 519, 641, 1114],
[ 519, 646, 1080],
[ 519, 656, 788],
[ 519, 664, 1312],
[ 555, 617, 329],
[ 645, 587, 72],
[ 729, 607, 189],
[ 742, 603, 293],
[ 742, 642, 113]])
Hi @clessig , that URL does not work for me - could you check that it's correct?
Thanks,
Iain
Hi Iain,
Please try again: http://graphics.cs.uni-magdeburg.de/misc/reanalysis_geopotential_y1979_m01_pl850.grib
Thanks,
Christian
Hi Christian,
Thanks, it worked this time!
So, I had a suspicion about this one, checked the data using Metview, and yes - the data values contain the magic number 9999! Unfortunately this was being used as a missing value indicator (a really bad choice, but it comes from a decision made in another package many years ago). It's easily fixed, but I want to do it properly, and with tests. In the meantime, if you have your own copy of cfgrib, you can make the relevant change. It's on line 560 of dataset.py. Currently the line is:
missing_value = data_var_attrs.get("missingValue", 9999)
Change it to:
missing_value = data_var_attrs.get("missingValue", np.finfo(np.float32).max)
Best regards,
Iain
Hi Iain,
Many thanks for looking into this so quickly. I can find a workaround now.
Best,
Christian