ecmwf / cfgrib

A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes

nws noaa grib files open_file but fail to convert OnDiskArray to numpy array (cfgrib >= 0.9.10.2)

ghaarsma opened this issue

What happened?

When downloading NWS NOAA GRIB files, opening them with open_file, and reading the data values, everything works perfectly under cfgrib 0.9.10.1. With any newer cfgrib version the file still opens, but getting a NumPy array from the OnDiskArray fails in get_values_in_order, a function newly added in 0.9.10.2.

What are the steps to reproduce the bug?

#  Testing cfgrib on NWS NOAA grib files ----------------------------------------------------------------
import requests
import cfgrib
import sys

print(f"cfgrid version: {cfgrib.__version__}, Python: {sys.version}")
for var in ["waveh", "wdir", "wgust", "wspd", "wwa"]:
    with requests.get(f'https://tgftp.nws.noaa.gov/SL.us008001/ST.opnl/DF.gr2/DC.ndfd/AR.oceanic/VP.001-003/ds.{var}.bin', stream=True) as r:
        with open(f'ds.{var}.bin', 'wb') as f:
            f.write(r.content)

    print(f"downloading {var} data")
    # Open file directly and return cfgrib dataset
    ds = cfgrib.open_file(f'ds.{var}.bin', indexpath="")
    variables = ds.variables.keys()
    # The last variable is the expected data variable
    data_var = list(variables)[-1]
    # Get the last variable values as numpy array
    values = ds.variables[data_var].data[:, :]
    print(f"Data var: {data_var} has shape: {values.shape}")

Version

0.9.10.2, 0.9.10.3, & 0.9.10.4

Platform (OS and architecture)

Python: 3.11.6 (tags/v3.11.6:8b6ee5b, Oct 2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)]

Relevant log output

C:\Users\PythonProjects\degrib\venv\Scripts\python.exe
C:\Users\PythonProjects\degrib\gom_forecast.py 
C:\Users\PythonProjects\degrib\venv\Lib\site-packages\gribapi\__init__.py:23: UserWarning: ecCodes 2.31.0 or higher is recommended. You are running version 2.27.0
  warnings.warn(
cfgrid version: 0.9.10.1, Python: 3.11.6 (tags/v3.11.6:8b6ee5b, Oct  2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)]
downloading waveh data
Data var: shww has shape: (21, 4512981)
downloading wdir data
Data var: wdir10 has shape: (21, 4512981)
downloading wgust data
Data var: gust has shape: (21, 4512981)
downloading wspd data
Data var: si10 has shape: (21, 4512981)
downloading wwa data
Data var: unknown has shape: (21, 4512981)

Process finished with exit code 0
__________________________________________________________________________________________________

C:\Users\PythonProjects\degrib\venv\Scripts\python.exe
C:\Users\PythonProjects\degrib\gom_forecast.py 
C:\Users\PythonProjects\degrib\venv\Lib\site-packages\gribapi\__init__.py:23: UserWarning: ecCodes 2.31.0 or higher is recommended. You are running version 2.27.0
  warnings.warn(
cfgrid version: 0.9.10.2, Python: 3.11.6 (tags/v3.11.6:8b6ee5b, Oct  2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)]
downloading waveh data
Traceback (most recent call last):
  File "C:\Users\PythonProjects\degrib\gom_forecast.py", line 19, in <module>
    values = ds.variables[data_var].data[:, :]
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "C:\Users\PythonProjects\degrib\venv\Lib\site-packages\cfgrib\dataset.py", line 355, in __getitem__
    values = get_values_in_order(message, array_field[tuple(array_field_indexes)].shape)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\PythonProjects\degrib\venv\Lib\site-packages\cfgrib\dataset.py", line 314, in get_values_in_order
    values[1::2, :] = values[1::2, ::-1]
                      ~~~~~~^^^^^^^^^^^^
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

Process finished with exit code 1

Accompanying data

https://tgftp.nws.noaa.gov/SL.us008001/ST.opnl/DF.gr2/DC.ndfd/AR.oceanic/VP.001-003/ds.waveh.bin

Organisation

No response

I checked the downloaded files:
ds.wwa.bin
ds.wspd.bin
ds.wdir.bin
ds.wgust.bin
ds.waveh.bin

All of them use WMO encodings for their parameters except ds.wwa.bin, which uses a local encoding, i.e.,
discipline=0
parameterCategory=19
parameterNumber=217

This combination is not recognised by ecCodes, so the shortName key has the value "unknown".

Thank you @shahramn, for your troubleshooting so far.

Yes, I agree that the wwa.bin file has an "unknown" data variable (as seen in the logs).
However, in cfgrib 0.9.10.1 the 2-dimensional data array is extracted properly from all 5 files. All later cfgrib versions (0.9.10.2, 0.9.10.3, & 0.9.10.4) fail to extract the data array for all 5 files.

I have done a little debugging, and it seems that for all 5 NWS NOAA files, at dataset.py line 316, the call message.get("alternativeRowScanning", False) returns 1 (True), which results in the crash on line 318.
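To illustrate the failure mode, here is a minimal sketch in plain NumPy. It is not cfgrib's actual code or the eventual fix. It assumes the values of one message arrive as a flat 1-dimensional array (as the (21, 4512981) shape in the 0.9.10.1 log suggests), and reverse_alternate_rows plus the tiny 3 x 4 grid are hypothetical names and sizes chosen only for the example.

```python
import numpy as np


def reverse_alternate_rows(flat_values, ny, nx):
    """Hypothetical helper: flip every second row of a flat field.

    The flat values are viewed as an (ny, nx) grid first, so that the
    2-D row/column indexing used for alternative row scanning is valid.
    """
    grid = np.asarray(flat_values).reshape(ny, nx).copy()
    grid[1::2, :] = grid[1::2, ::-1]
    return grid.reshape(-1)


# A flat field standing in for the values of one GRIB message.
flat = np.arange(12.0)

# Applying the 2-D indexing from the traceback to a 1-D array raises
# the same kind of IndexError as reported above.
try:
    flat[1::2, :] = flat[1::2, ::-1]
except IndexError as exc:
    print(f"1-D field: {exc}")

# Reshaping to (ny, nx) before reversing avoids the error.
fixed = reverse_alternate_rows(flat, ny=3, nx=4)
print(fixed)
```

Whether the real fix reshapes the values, skips the reversal for flat fields, or does something else is not shown here; the sketch only demonstrates why a 1-dimensional array cannot be indexed with two indices.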

Hope this helps

Hi @ghaarsma,

I believe I've fixed the issue (thanks for the report, it was a particular case that we had not previously encountered - alternativeRowScanning in a Mercator grid). Are you able to test my branch locally, or do you need a new release of cfgrib?

Cheers,
Iain

Hi @iainrussell,

I can confirm that the branch fix/alternate-scanning-mercator fixes the problem after some local testing. Thank you for the quick fix. Looking forward to a new release of cfgrib, so we can roll it into production.