ecmwf / eccodes-python

Python interface to the ecCodes GRIB/BUFR decoder/encoder

Unexpected behavior if not all GRIB messages in a file are read

strandgren opened this issue

Hi,

I use the eccodes and python-eccodes libraries to read GRIB-encoded satellite data from EUMETSAT. While looping over a set of files I noticed that information from the previously opened GRIB file was sometimes retained and used, leading to conflicts in e.g. data dimensions. The problem occurs if I do not read all GRIB messages but break out of the while loop as soon as I find the message with the correct parameter number. It can be avoided by reading all GRIB messages, i.e. not breaking out of the loop until the message id is None. Is this expected behavior? I would expect that it should be possible to read only part of a file without any leftover information being carried over to a subsequent eccodes instance.

Below is an example where I loop over four satellite data files, from each of which I want to read a given message/parameter number. To track the observed issue I print the number of messages in each file using codes_count_in_file():

import eccodes as ec

# dictionary with the following information:  <grib filename>: <parameter number to read>    # number of messages in grib file
files = {
    'CTHEncProd_20211102161500Z_00_OMPEFS01_MET08_FES_E0415': 2,     # 2 grib messages
    'OCAEncProd_20211102161500Z_00_OMPEFS01_MET08_FES_E0415': 30,    # 12 grib messages
    'CTHEncProd_20211102120000Z_00_OMPEFS04_MET11_FES_E0000': 2,     # 2 grib messages
    'OCAEncProd_20211102150000Z_00_OMPEFS04_MET11_FES_E0000': 30,    # 12 grib messages
}

i = 0
while i < 4:  # loop over the files four times in order to identify any changes in behavior
    for fname, pnum in files.items():
        ec.codes_grib_multi_support_on()

        with open(fname, 'rb') as fh:
            print('Number of messages:', ec.codes_count_in_file(fh))

            while True:
                gid = ec.codes_grib_new_from_file(fh)

                if gid is None:
                    # Reached end of file, break out of loop
                    break

                parameter_number = ec.codes_get(gid, 'parameterNumber')
                if parameter_number == pnum:
                    # Found correct parameter number, load data
                    data = ec.codes_get_values(gid)
                    ec.codes_release(gid)
                    break                                                      #  !!!  If this break command is deleted, the code works ok  !!!
                else:
                    # The parameter number is not the correct one, continue to next message
                    ec.codes_release(gid)
    i += 1
    print('')

The code snippet above gives the following output:

Number of messages: 2
Number of messages: 13
Number of messages: 2
Number of messages: 22

Number of messages: 3
Number of messages: 22
Number of messages: 3
Number of messages: 22

Number of messages: 3
Number of messages: 22
Number of messages: 3
Number of messages: 22

Number of messages: 3
Number of messages: 22
Number of messages: 3
Number of messages: 22

It's clear that the number of messages obtained using codes_count_in_file() doesn't correspond to the actual GRIB file content. If I remove the break statement that follows loading the data and releasing the message id gid (the line marked with !!! in the snippet above), i.e. if I stay in the while loop until I reach the end of the GRIB file, the output looks like this:

Number of messages: 2
Number of messages: 12
Number of messages: 2
Number of messages: 12

Number of messages: 2
Number of messages: 12
Number of messages: 2
Number of messages: 12

Number of messages: 2
Number of messages: 12
Number of messages: 2
Number of messages: 12

Number of messages: 2
Number of messages: 12
Number of messages: 2
Number of messages: 12

which is in line with the number of messages in the files.
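
The workaround described above (keep reading until the message id is None, even after the wanted message has been found) can be wrapped in a small helper. This is only a minimal sketch: read_parameter is a hypothetical name, and the eccodes module is passed in as the ec argument (an assumption made so that the helper can be exercised without the library installed). It uses only the calls that already appear in the snippet above.

```python
def read_parameter(fh, pnum, ec):
    """Return the values of the first message whose parameterNumber == pnum.

    `ec` is the imported eccodes module (hypothetical calling convention).
    The file is always read to its end, so no partially consumed state is
    left behind when the next file is opened.
    """
    data = None
    while True:
        gid = ec.codes_grib_new_from_file(fh)
        if gid is None:
            # Reached end of file
            break
        try:
            # Only load values for the first match; later messages are
            # still fetched and released so the file is fully consumed.
            if data is None and ec.codes_get(gid, 'parameterNumber') == pnum:
                data = ec.codes_get_values(gid)
        finally:
            ec.codes_release(gid)
    return data


# Possible usage (requires eccodes and a real GRIB file):
#
#   import eccodes as ec
#   with open(fname, 'rb') as fh:
#       data = read_parameter(fh, 2, ec)
```

The cost is that every remaining message in the file is still decoded into a handle and released, which may be noticeable for large files.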


Test setup:

Linux
python          3.8.12
eccodes         2.23.0
python-eccodes  2021.05.1

codes_count_in_file() checks whether the "multi-support" flag is set. If it is ON, the function gets all the message handles and counts those. Otherwise it counts messages by looking at the message boundaries, i.e. the GRIB identifier at the start and 7777 at the end.
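
To make the boundary-based counting concrete, here is a rough pure-Python sketch of the idea. It is not the ecCodes implementation, and count_grib_messages is a hypothetical name. It assumes the standard indicator-section layout: the edition number is in octet 8, and the total message length is in octets 5-7 for GRIB edition 1 and in octets 9-16 for edition 2. It ignores the special length encoding used for very large GRIB1 messages, and it counts a GRIB2 message that holds several fields only once, whereas handle-based counting with multi-support ON would presumably see one handle per field.

```python
def count_grib_messages(data: bytes) -> int:
    """Count GRIB messages by walking 'GRIB' ... '7777' boundaries."""
    count = 0
    pos = 0
    while True:
        start = data.find(b"GRIB", pos)
        if start == -1 or start + 8 > len(data):
            break

        edition = data[start + 7]  # octet 8: GRIB edition number
        length = 0
        if edition == 1:
            # GRIB1: octets 5-7 hold the total message length
            length = int.from_bytes(data[start + 4:start + 7], "big")
        elif edition == 2 and start + 16 <= len(data):
            # GRIB2: octets 9-16 hold the total message length
            length = int.from_bytes(data[start + 8:start + 16], "big")

        end = start + length
        if length >= 12 and end <= len(data) and data[end - 4:end] == b"7777":
            # Well-formed message: count it and jump past its end marker
            count += 1
            pos = end
        else:
            # Not a valid message start: resume the search after this 'GRIB'
            pos = start + 4
    return count


# Possible usage (file name taken from the example above):
#
#   with open('CTHEncProd_20211102161500Z_00_OMPEFS01_MET08_FES_E0415', 'rb') as fh:
#       print(count_grib_messages(fh.read()))
```

Because this count depends only on the bytes in the file, it cannot be affected by whatever was read from a previously opened file, which makes it a handy cross-check against the numbers reported by codes_count_in_file().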