ecmwf / cfgrib

A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes

Unable to read "step" with ecCodes 2.34.0

Metamess opened this issue

What happened?

ecCodes 2.34.0 added support for GRIB files with sub-hourly steps. As a result, the 'native type' of the 'endStep' field has become a string, while cfgrib expects an integer. cfgrib uses 'endStep' to derive the value for the 'step' time dimension. When a string is returned, that derivation fails; the failure is silently ignored, and the resulting xarray Datasets lack values for 'step' and 'valid_time'.

Relevant code: Function from_grib_step() in cfmessage.py

Related issues: #335

I believe this can be fixed by explicitly requesting "endStep" as an integer value, using the already implemented support for this in the __getitem__() of the Message class.
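To illustrate the idea, here is a minimal sketch of a "key:type" lookup using a toy class. ToyMessage and the stored values are hypothetical stand-ins for illustration only; the real Message class hands the requested type to ecCodes rather than looking it up in a dictionary.

```python
# Hypothetical stand-in for cfgrib's Message class; only the "key:type"
# lookup convention is illustrated here.
KEY_TYPES = {"int": int, "float": float, "str": str}


class ToyMessage:
    def __init__(self, typed_values):
        # Maps (key, requested type or None) to the value a backend would return.
        self._values = dict(typed_values)

    def __getitem__(self, item):
        # Split e.g. "endStep:int" into the key name and the requested type.
        key, _, type_name = item.partition(":")
        if type_name and type_name not in KEY_TYPES:
            raise ValueError(f"key type not supported {type_name!r}")
        # No suffix means "native type", represented here by None.
        return self._values[(key, KEY_TYPES.get(type_name))]


# Illustrative values only: a 30-minute step as described in this issue.
message = ToyMessage(
    {
        ("endStep", None): "30m",  # native type: string with the unit attached
        ("endStep", int): 30,      # explicitly requested as an integer
    }
)

native_step = message["endStep"]   # what cfgrib currently receives
int_step = message["endStep:int"]  # what the proposed fix would request
```

With this convention, the step derivation always receives an integer, regardless of what ecCodes considers the native type.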

More detail

How GRIB files encode the time that the data represents can be complicated, because a message may describe a point in time or a time range, among other possibilities. ecCodes, the underlying framework that cfgrib uses to parse GRIB files, generates a value called "endStep" based on the information in the GRIB file. Under the hood, cfgrib uses this value as the basis for the "step" dimension, and combines it with the "reference time" (called "time" by cfgrib) to create the additional "valid_time" coordinate for the "step" dimension. This all happens when creating the index file, which is then used to navigate the GRIB file after the initial open.
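The relationship between these values can be sketched with the standard library. This is a simplified illustration, not cfgrib's actual code; the function name and the assumption that "step" is already expressed in hours are mine.

```python
from datetime import datetime, timedelta


def valid_time(reference_time, step_hours):
    """Return the time the data is valid for: reference time plus step."""
    return reference_time + timedelta(hours=step_hours)


# Reference time of the ICON run used later in this issue, with a 30-minute step.
reference = datetime(2024, 2, 20, 0, 0)
print(valid_time(reference, 0.5))
```

If "step" cannot be derived, there is nothing to add to the reference time, which is why "valid_time" disappears along with it.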

Many GRIB values can be returned as the integer value that was encoded in the GRIB file, or parsed to the string value that is encoded by that integer value. ecCodes can be requested to return the value as a specific type, but will default to what it considers the "native type" when the type is omitted. Up until now, the default type for "endStep" has been an integer, but in ecCodes 2.34.0 this has changed. When "endStep" is not represented as a value in hours, "endStep" is provided as a string value, returning the time value with the time unit attached (e.g. a forecastTime=30 and indicatorOfUnitOfTimeRange=0 (minutes) would result in an endStep of "30m").

However, cfgrib attempts to do some of its own arithmetic, combining the value for "endStep" with its unit as given by "stepUnits" to express "step" as a number of hours. When "endStep" is returned as a string, this code fails. The ValueError raised is caught, "step" is interpreted to be missing (it gets the value "undef"), and the result is a Dataset without a "step" dimension (and consequently without a "valid_time" coordinate).
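A simplified sketch of this failure mode follows. The function names and the small unit table are assumptions made for illustration (only the codes for minutes, hours and days are included); cfgrib's own implementation in cfmessage.py differs in detail.

```python
# Seconds per step unit, keyed by unit code (illustrative subset:
# 0 = minutes, 1 = hours, 2 = days).
STEP_UNITS_TO_SECONDS = {0: 60, 1: 3600, 2: 86400}


def step_in_hours(end_step, step_units):
    """Convert a step value in the given unit to hours."""
    seconds_per_unit = STEP_UNITS_TO_SECONDS[step_units]
    # int() raises ValueError for a string such as "30m".
    return int(end_step) * seconds_per_unit / 3600.0


def step_or_undef(end_step, step_units):
    """Mimic the silent fallback: a failed conversion becomes "undef"."""
    try:
        return step_in_hours(end_step, step_units)
    except ValueError:
        return "undef"


print(step_or_undef(30, 0))     # integer endStep in minutes
print(step_or_undef("30m", 0))  # string endStep, as returned by default since 2.34.0
```

Because the exception is swallowed, nothing signals to the user that the conversion failed; the only symptom is the missing "step" dimension.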

What makes this issue worse is that ecCodes appears to convert a step of 60 minutes to 1 hour (returning an integer), but does not convert a step of 0 minutes to 0 hours. While this is arguably an error on ecCodes' side, it still impacts cfgrib. An example of this is GRIB files from the German DWD's ICON model, which have full-hour steps, but encode these in minutes.

Furthermore, there is this note on the ecCodes page regarding sub-hourly support:

Note that hourly steps are currently kept without a unit to preserve compatibility with current behaviour. The plan in future will be to unify this, and give, for example, "1h" in the above case.

This implies that cfgrib should get ready to support "endStep" to be a string value by default.

What are the steps to reproduce the bug?

Download a GRIB file with its step expressed in minutes, for example any GRIB file from DWD's ICON model at t=0:
https://opendata.dwd.de/weather/nwp/icon/grib/00/

Then, in Python, open the file with cfgrib:

import cfgrib
cfgrib.open_datasets("/tmp/icon_global_icosahedral_model-level_2024022000_000_100_P.grib2")

Version

0.9.10.4

Platform (OS and architecture)

Linux a7b97a3efa9e 5.15.133.1-microsoft-standard-WSL2

Relevant log output

>>> import cfgrib
>>> cfgrib.open_datasets("/tmp/icon_global_icosahedral_model-level_2024022000_000_100_P.grib2")
ecCodes provides no latitudes/longitudes for gridType='unstructured_grid'
[<xarray.Dataset>
Dimensions:               (values: 2949120)
Coordinates:
    time                  datetime64[ns] 2024-02-20
    generalVerticalLayer  float64 100.0
Dimensions without coordinates: values
Data variables:
    pres                  (values) float32 ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             edzw
    GRIB_centreDescription:  Offenbach
    GRIB_subCentre:          255
    Conventions:             CF-1.7
    institution:             Offenbach]

Accompanying data

https://opendata.dwd.de/weather/nwp/icon/grib/00/p/icon_global_icosahedral_model-level_2024022000_000_100_P.grib2.bz2

Organisation

No response

I have created a PR that should resolve this issue. Due to the already present support in cfgrib for dealing with various units for "endStep", I believe no further changes are required at this point to be compatible with ecCodes' new sub-hourly support.

Note that requesting "endStep" as an integer does not lose any information regarding its unit, as the accompanying unit is requested as well ("stepUnits").

Also note that while ecCodes may expose "endStep" in a different unit than the one encoded in the GRIB file, the accompanying "stepUnits" value is changed accordingly, preventing any issues. To get the original time unit, "indicatorOfUnitOfTimeRange" should be read instead. To get the original time value, "forecastTime" should be read.