ecmwf / cfgrib

A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

cfgrib generates NaNs when reading data

clessig opened this issue · comments

cfgrib generates NaNs while loading grib files, see code example below. We reproduced this on multiple machines (with slightly different configurations) and verified with grib_get_data and the old python interface that the file is free of NaNs. For your convenience, the grib of the example can be found here: http://graphics.cs.uni-magdeburg.de/misc/reanalysis_geopotential_y1979_m01_pl850.grib. However, we were able to reproduce the problem with various files (although there seems to be a correlation with the large values of geopotential compared to, e.g., vorticity).

(pyenv) [lessig1@jwlogin24 multiformer]$ python
Python 3.9.6 (default, Nov 16 2021, 12:25:13) 
[GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xarray
>>> fname = 'data/era5/pl_levels/850/geopotential/reanalysis_geopotential_y1979_m01_pl850.grib'
>>> ds = xarray.open_dataset( fname, engine='cfgrib')
Ignoring index file 'data/era5/pl_levels/850/geopotential/reanalysis_geopotential_y1979_m01_pl850.grib.923a8.idx' incompatible with GRIB file
>>> ds
<xarray.Dataset>
Dimensions:        (time: 744, latitude: 721, longitude: 1440)
Coordinates:
    number         int64 ...
  * time           (time) datetime64[ns] 1979-01-01 ... 1979-01-31T23:00:00
    step           timedelta64[ns] ...
    isobaricInhPa  float64 ...
  * latitude       (latitude) float64 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
  * longitude      (longitude) float64 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
    valid_time     (time) datetime64[ns] ...
Data variables:
    z              (time, latitude, longitude) float32 ...
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2022-09-08T08:06 GRIB to CDM+CF via cfgrib-0.9.1...
>>> import numpy as np
>>> data = np.array( ds['z'], dtype=np.float32)
>>> np.isnan( data).any()
True
>>> np.argwhere( np.isnan( data))
array([[   3,  123,  688],
       [   3,  140,  702],
       [   3,  638,  890],
       [  21,  121,  668],
       [  21,  142,  678],
       [  21,  631,  988],
       [  77,  599,  491],
       [  77,  606,  398],
       [  97,   55, 1206],
       [  97,   62, 1201],
       [ 140,  592, 1420],
       [ 184,  592,  755],
       [ 289,  139,  674],
       [ 289,  151,  640],
       [ 404,  608,  817],
       [ 445,  164,  671],
       [ 445,  631,  874],
       [ 519,  600,  825],
       [ 519,  641, 1114],
       [ 519,  646, 1080],
       [ 519,  656,  788],
       [ 519,  664, 1312],
       [ 555,  617,  329],
       [ 645,  587,   72],
       [ 729,  607,  189],
       [ 742,  603,  293],
       [ 742,  642,  113]])

Hi @clessig , that URL does not work for me - could you check that it's correct?
Thanks,
Iain

Hi Christian,

Thanks, it worked this time!

So, I had a suspicion about this one, checked the data using Metview, and yes - the data values contain the magic number 9999! Unfortunately this was being used as a missing value indicator (a really bad choice, but it comes from a decision made in another package many years ago). It's easily fixed, but I want to do it properly, and with tests. In the meantime, if you have your own copy of cfgrib, you can make the relevant change. It's on line 560 of dataset.py. Currently the line is:

    missing_value = data_var_attrs.get("missingValue", 9999)

Change it to:

    missing_value = data_var_attrs.get("missingValue", np.finfo(np.float32).max)

Best regards,
Iain

Hi Iain,

Many thanks for looking into this so quickly. I can find a workaround now.

Best,
Christian