CDAT / cdms

Problems with reading "big" arrays (>8.1Gb)

durack1 opened this issue

Describe the bug
I have hit a reproducible error where big arrays (>8.1Gb) are not read correctly: an array of zeros is returned instead of the real values. I was a little puzzled by this and got talking with @painter1, who also had this problem and reported it via email in May 2019. It turns out that the issue affects arrays greater than 8.1Gb, and that the original error was a libnetcdf bug with big variables (from @painter1's notes/emails). @dnadeau4 and @doutriaux1 may recall some of the specific details. Note that I may not be using the latest versions of the libraries below.

To Reproduce
Steps to reproduce the behavior:

  1. Install CDAT with: cdms2-3.1.4-py37ha6f5e91_3, libnetcdf-4.6.2-h303dfb8_1003, netcdf-fortran-4.4.5-h0789656_1004
  2. Execute the code attached (which reads larger and larger arrays)
  3. Watch as some summary stats go from real numbers to 0's when the arrays being read are >8Gb, which for the demo below happens at year 1989 (3rd step of the loop) when 26 years of data are being read (with the model having a vert/horiz grid of 60 vertical levels, 384 lat, 320 lon).
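The size arithmetic behind this threshold can be sketched as follows. This is only an illustration: the time-axis lengths (276, 288, 300) are taken from the reproduction log further down, 4 bytes per value (float32) is inferred from the sizes printed there, and `describe` is a hypothetical helper name. The thread does not confirm the root cause; the sketch only shows that the first failing read is also the first one whose element count exceeds a signed 32-bit integer.

```python
# Grid from the issue text: 60 vertical levels, 384 lat, 320 lon.
LEVELS, LAT, LON = 60, 384, 320
BYTES_PER_VALUE = 4          # assumption: float32, inferred from the logged sizes
INT32_MAX = 2**31 - 1        # largest signed 32-bit integer


def describe(n_time):
    """Return (element count, size in MiB, exceeds-int32 flag) for n_time steps."""
    n_elements = n_time * LEVELS * LAT * LON
    size_mib = n_elements * BYTES_PER_VALUE // (1024 * 1024)
    return n_elements, size_mib, n_elements > INT32_MAX


for n_time in (276, 288, 300):
    n_elements, size_mib, over = describe(n_time)
    print(n_time, "time steps:", n_elements, "elements,", size_mib, "Mb,",
          "exceeds int32" if over else "fits in int32")
```

With these numbers, 288 time steps (8100 Mb) stay below 2**31 elements and 300 time steps (8437 Mb) go above it, which matches where the zeros start to appear.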

Expected behavior
Big arrays should be read correctly, returning the real (non-zero) values.

Desktop (please complete the following information):

  • OS: RHEL7.x

The code to reproduce this:

# imports
import sys
import cdat_info
import cdms2 as cdm
import numpy as np
from socket import gethostname

#%% Define function
def calcAve(var):
    print('type(var);',type(var),'; var.shape:',var.shape)
    # Start querying stat functions
    print('var.min():'.ljust(21),var.min())
    print('var.max():'.ljust(21),var.max())
    print('np.ma.mean(var.data):',np.ma.mean(var.data)) ; # Not mask aware
    # Problem transientVariable.mean() function
    #print('var.mean():'.ljust(21),var.mean())
    print('-----')

#%% Load subset of variable
f = ['/p/css03/esgf_publish/CMIP6/CMIP/NCAR/CESM2/historical/r1i1p1f1/Omon/so/gn/v20190308/so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc']
# Try building up arrays stepping in a single year
times = np.arange(1991,1984,-1)
print('host:',gethostname())
print('Python version:',sys.version)
print('cdat env:',sys.executable.split('/')[5])
print('cdat version:',cdat_info.version()[0])
print('*****')
for timeSlot in times:
    for filePath in f:
        fH = cdm.open(filePath)
        print('filePath:',filePath.split('/')[-1])
        # Loop through single years
        start = timeSlot ; end = 2014
        print('times:',start,end,'; total years:',(end-start)+1)
        d1 = fH('so',time=(str(start),str(end)))
        print("Array size: %d Mb" % ( (d1.size * d1.itemsize) / (1024*1024) ) )
        calcAve(d1)
        del(d1)
        fH.close()
    print('----- -----')

@pochedls @muryanto1 @downiec @jasonb5 @gabdulla @gleckler1 @lee1043 ping

@durack1 I tried running the code with latest cdms2 in cdat/label/nightly and latest libnetcdf, and was able to reproduce.
`
cdat/label/nightly/linux-64::cdms2-3.1.4.2020.01.14.21.45.gee3f0ff-py37h34d3450_0
libnetcdf 4.7.3 nompi_h9f9fd6a_101 conda-forge
netcdf-fortran 4.5.2 nompi_h09cde99_103 conda-forge

$ curl https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o Miniconda3-latest-MacOSX-x86_64.sh

$ source miniconda3/etc/profile.d/conda.sh
$ conda activate base
$ conda activate nightly_py3.7

on aims1:
$ wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O Miniconda3-latest-Linux-x86_64.sh
$ bash ./Miniconda3-latest-Linux-x86_64.sh -b -p miniconda3
$ source miniconda3/etc/profile.d/conda.sh
$ conda activate base
$ conda config --set channel_priority strict
$ conda config --add channels conda-forge
$ conda config --add channels cdat/label/nightly

$ conda create -n nightly_py3.7 cdat mesalib easydev nbsphinx myproxyclient testsrunner coverage pytest "python=3.7" -c cdat/label/nightly -c conda-forge
$ conda activate nightly_py3.7

# I put your code into a file: test_big_array.py
$ python ./test_big_array.py
host: aims1.llnl.gov
Python version: 3.7.6 | packaged by conda-forge | (default, Jan  7 2020, 22:33:48) 
[GCC 7.3.0]
cdat env: miniconda3
cdat version: 8
*****
filePath: so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc
times: 1991 2014 ; total years: 24
Array size: 7762 Mb
type(var); <class 'cdms2.tvariable.TransientVariable'> ; var.shape: (276, 60, 384, 320)
var.min():            6.940156
var.max():            48.25107
np.ma.mean(var.data): 4.2389736e+19
-----
----- -----
filePath: so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc
times: 1990 2014 ; total years: 25
Array size: 8100 Mb
type(var); <class 'cdms2.tvariable.TransientVariable'> ; var.shape: (288, 60, 384, 320)
var.min():            6.940156
var.max():            48.25107
np.ma.mean(var.data): 4.239067e+19
-----
----- -----
filePath: so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc
times: 1989 2014 ; total years: 26
Array size: 8437 Mb
type(var); <class 'cdms2.tvariable.TransientVariable'> ; var.shape: (300, 60, 384, 320)
var.min():            0.0
var.max():            0.0
np.ma.mean(var.data): 0.0
-----
----- -----
filePath: so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc
times: 1988 2014 ; total years: 27
Array size: 8775 Mb
type(var); <class 'cdms2.tvariable.TransientVariable'> ; var.shape: (312, 60, 384, 320)
var.min():            0.0
var.max():            0.0
np.ma.mean(var.data): 0.0
-----
----- -----
filePath: so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc
times: 1987 2014 ; total years: 28
Array size: 9112 Mb
type(var); <class 'cdms2.tvariable.TransientVariable'> ; var.shape: (324, 60, 384, 320)
var.min():            0.0
var.max():            0.0
np.ma.mean(var.data): 0.0
-----
----- -----
filePath: so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc
times: 1986 2014 ; total years: 29
Array size: 9450 Mb
type(var); <class 'cdms2.tvariable.TransientVariable'> ; var.shape: (336, 60, 384, 320)
var.min():            0.0
var.max():            0.0
np.ma.mean(var.data): 0.0
-----
----- -----
filePath: so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc
times: 1985 2014 ; total years: 30
Array size: 9787 Mb
type(var); <class 'cdms2.tvariable.TransientVariable'> ; var.shape: (348, 60, 384, 320)
var.min():            0.0
var.max():            0.0
np.ma.mean(var.data): 0.0
-----
----- -----`
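A side note on the log above: the means of about 4.2e19 in the first two (valid) reads look like the effect of the "Not mask aware" call in the script, since `var.data` drops the mask and any fill values get averaged in. A minimal numpy sketch of that effect, using made-up values and an assumed fill value of 1e20 (the thread does not state the file's actual fill value):

```python
import numpy as np

FILL = 1.0e20  # assumption: illustrative fill value, not read from the file

# Three plausible salinity values plus two fill-valued (e.g. land) points.
raw = np.array([34.5, 35.0, FILL, 35.5, FILL], dtype=np.float64)
masked = np.ma.masked_values(raw, FILL)

# .data returns the underlying array without the mask, so fills are included.
naive_mean = np.ma.mean(masked.data)
# The masked array's own mean skips the masked entries.
aware_mean = masked.mean()

print("mean over .data:", naive_mean)
print("mask-aware mean:", aware_mean)
```

So a huge mean alongside a sensible min/max is not by itself a sign of the big-array bug; the all-zero min, max and mean are.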


@muryanto1 thanks for picking up and reproducing this issue. It'd be helpful to know whether @dnadeau4 or @doutriaux1 had worked on a fix a while ago, and if there are any open issues, branches or commits, or web documentation they can point us to for a resolution

Thanks for documenting and reproducing this issue. I am also hitting this issue. I note that it also occurs at least as far back as CDAT2.10.

@durack1 How was this file created?

@jasonb5 it’s one of the CMIP6 contributed files. NCAR doesn’t use CMOR, so I'm not 100% sure what software was used to create it.

Folks, just an FYI: @jasonb5 determined the cause and found a fix, and @muryanto1 has wrapped this up in the nightly builds. Thanks guys!! So if you want bleeding-edge bug fixes, come and get it.

@durack1 great to know the issue has been resolved. Thank you all for the effort!

@jasonb5 and @muryanto1 - Thank you! For those of us who prefer more stability than the nightly build, is this slated for a release? 8.2.x? 8.3?

@pochedls Yes. We do not have a time frame yet, but we are working on it.

Linking PR #389, this will be available in CDAT 8.2.1.
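For anyone stuck on an older release in the meantime, one possible workaround is to split a big read along the time axis so that each piece stays under the size where the log above still shows valid values. The sketch below is not the fix from the PR (the thread does not describe it). It only plans the index ranges, `plan_time_chunks` is a hypothetical helper, and the 2**31 - 1 element limit is an assumption inferred from the logged sizes.

```python
def plan_time_chunks(n_time, elements_per_step, max_elements=2**31 - 1):
    """Return (start, stop) index pairs along the time axis.

    Each pair covers at most max_elements values in total.
    """
    if elements_per_step > max_elements:
        raise ValueError("a single time step already exceeds the limit")
    steps_per_chunk = max_elements // elements_per_step
    return [(start, min(start + steps_per_chunk, n_time))
            for start in range(0, n_time, steps_per_chunk)]


# Largest read in the log: 348 time steps on a 60 x 384 x 320 grid.
chunks = plan_time_chunks(348, 60 * 384 * 320)
print(chunks)
```

Each pair could then drive a separate, smaller read (for example by slicing the file variable, `fH['so'][start:stop]`), with statistics accumulated per chunk. That part is untested here and would need checking against the affected cdms2 version.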