ecmwf / cfgrib

A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes

memory leak -- cfgrib.open_datasets() not releasing memory.

mahrsee1997 opened this issue

Let's take this file: nam.t00z.afwaca36.tm00.grib2.

Observe that the memory has not been released even after deleting ds.

Code:

import os
import psutil
import cfgrib

from memory_profiler import profile

@profile
def main():
    path = 'nam.t00z.afwaca36.tm00.grib2'
    print(f"Before opening file: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2} MiB")
    ds = cfgrib.open_datasets(path)
    del ds
    print(f"After opening file: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2} MiB")

if __name__ == '__main__':
    print(f"Start: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2} MiB")
    main()
    print(f"End: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2} MiB")

Console log:

python cfgrib_memory.py 
Start: 110.91796875 MiB
Before opening file: 111.44140625 MiB
After opening file: 239.69140625 MiB
Filename: cfgrib_memory.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     7    111.4 MiB    111.4 MiB           1   @profile
     8                                         def main():
     9    111.4 MiB      0.0 MiB           1       path = 'nam.t00z.afwaca36.tm00.grib2'
    10    111.4 MiB      0.0 MiB           1       print(f"Before opening file: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2} MiB")
    11    239.7 MiB    128.2 MiB           1       ds = cfgrib.open_datasets(path)
    12    239.7 MiB      0.0 MiB           1       del ds
    13    239.7 MiB      0.0 MiB           1       print(f"After opening file: {psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2} MiB")


End: 239.69140625 MiB

I'm using cfgrib v0.9.10.2, and I also checked with v0.9.10.3; I get the same result with both.

Thanks @mahrsee1997, I can confirm the behaviour you describe (thanks for the great example). However, I also see the same behaviour when I open a NetCDF file with xarray (ds = xr.open_dataset('large_netcdf.nc')), so I'm not sure this is a cfgrib-specific issue. It may well also be related to Python's garbage collection; in my version of your example I call gc.collect() after deleting the variable, just to be sure, but it makes no difference. We did spot and (mostly) fix a memory accumulation error in cfgrib 0.9.10.2 (fixed in 0.9.10.3), but like you, I don't see a difference between the versions for this particular example.
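For reference, a minimal sketch of the variant I mean (not the exact script I ran; the gc import, the rss_mib helper and the 'large_netcdf.nc' file name are just for illustration):

import gc
import os

import psutil
import cfgrib
import xarray as xr


def rss_mib():
    # Resident set size of the current process, in MiB
    return psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2


print(f"Before cfgrib.open_datasets: {rss_mib():.1f} MiB")
ds = cfgrib.open_datasets('nam.t00z.afwaca36.tm00.grib2')
del ds
gc.collect()  # explicit collection after deleting the datasets
print(f"After del + gc.collect():    {rss_mib():.1f} MiB")

print(f"Before xr.open_dataset:      {rss_mib():.1f} MiB")
ds = xr.open_dataset('large_netcdf.nc')  # placeholder NetCDF file
del ds
gc.collect()
print(f"After del + gc.collect():    {rss_mib():.1f} MiB")

In both cases the RSS stays where it was after the open, even with the explicit gc.collect().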

I also see the same behaviour when I open a NetCDF file with xarray (ds = xr.open_dataset('large_netcdf.nc'))

Yes, @iainrussell, I notice the same. Looking forward to a fix.
Also, would you like me to raise this issue on xarray's repo?