CDAT / cdms

cdscan issues with poorly formed netcdf files

durack1 opened this issue · comments

This redirects the issue described in pochedls/xagg#33

cdscan fails on poorly formed netCDF files. These files contain valid data but are poorly defined: for example, a time-invariant (fixed) field whose file nevertheless declares a time dimension that holds no values. In the example below, cdms2 can read the areacello variable from the file, but cdscan throws an error. For comparison, an ncdump of a validly formed file is included at the bottom of this issue.

$ ncdump -ct ~/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/piControl/r1i1p1f2/Ofx/areacello/gn/v20180814/areacello_Ofx_CNRM-CM6-1_piControl_r1i1p1f2_gn.nc
netcdf areacello_Ofx_CNRM-CM6-1_piControl_r1i1p1f2_gn {
dimensions:
	axis_nbounds = 2 ;
	x = 362 ;
	y = 294 ;
	nvertex = 4 ;
	time = UNLIMITED ; // (0 currently)
variables:
	double lat(y, x) ;
		lat:standard_name = "latitude" ;
		lat:long_name = "Latitude" ;
...
	double lon(y, x) ;
		lon:standard_name = "longitude" ;
		lon:long_name = "Longitude" ;
...
	double bounds_lon(y, x, nvertex) ;
	double bounds_lat(y, x, nvertex) ;
	float areacello(y, x) ;
		areacello:standard_name = "cell_area" ;
		areacello:long_name = "Grid-Cell Area" ;
		areacello:units = "m2" ;
...
		areacello:history = "none" ;

// global attributes:
...
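
As a rough pre-check, the empty unlimited dimension can be spotted directly in the ncdump header text. The following is only a sketch and is not part of cdscan or cdms2: `find_empty_unlimited_dims` is a hypothetical helper that scans header text for lines of the form `name = UNLIMITED ; // (0 currently)`, as seen in the dump above.

```python
import re

# Matches ncdump header lines such as: "time = UNLIMITED ; // (0 currently)"
_UNLIMITED = re.compile(
    r"^\s*(\w+)\s*=\s*UNLIMITED\s*;\s*//\s*\((\d+) currently\)",
    re.MULTILINE,
)


def find_empty_unlimited_dims(header_text):
    """Return names of unlimited dimensions that currently hold zero records.

    Hypothetical helper for illustration; it only inspects ncdump header text.
    """
    return [name for name, count in _UNLIMITED.findall(header_text) if int(count) == 0]


# Dimensions block copied from the ncdump output above.
header = """dimensions:
\taxis_nbounds = 2 ;
\tx = 362 ;
\ty = 294 ;
\tnvertex = 4 ;
\ttime = UNLIMITED ; // (0 currently)
"""

print(find_empty_unlimited_dims(header))  # ['time']
```

A file flagged this way is a candidate for the cdscan failure described in this issue; whether every such file triggers it is not established here.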

To Reproduce
Steps to reproduce the behavior:

  1. Install CDAT 8.2.1 nompi
  2. Attempt to run cdscan on the file listed above
(cdat821nompi) bash-4.2$ cdscan -x tmp.xml ~/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/piControl/r1i1p1f2/Ofx/areacello/gn/v20180814/areacello_Ofx_CNRM-CM6-1_piControl_r1i1p1f2_gn.nc
Finding common directory ...
Common directory: ~/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/piControl/r1i1p1f2/Ofx/areacello/gn/v20180814/
Scanning files ...
~/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/piControl/r1i1p1f2/Ofx/areacello/gn/v20180814/areacello_Ofx_CNRM-CM6-1_piControl_r1i1p1f2_gn.nc
Setting reference time units to 
Traceback (most recent call last):
  File "~/anaconda3/envs/cdat821nompi/bin/cdscan", line 1842, in <module>
    main(sys.argv)
  File "~/anaconda3/envs/cdat821nompi/bin/cdscan", line 1284, in main
    timeIsLinear = (referenceTime[0].lower().split() in
IndexError: string index out of range
  3. See cdscan error above
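
The traceback is consistent with the blank value printed after "Setting reference time units to": indexing an empty string raises exactly this IndexError. The sketch below reproduces that in isolation. It assumes the reference time units end up as an empty string for this file, and `first_unit_word` is a hypothetical guard for illustration, not cdscan's code.

```python
def first_unit_word(reference_time):
    """Return the first lowercase word of a units string, or None if it is empty.

    Hypothetical helper for illustration; not taken from cdscan.
    """
    words = reference_time.lower().split()
    return words[0] if words else None


# Assumption: the value cdscan holds when the time axis has no values or units.
reference_time = ""

try:
    reference_time[0]
except IndexError as err:
    print(err)  # string index out of range

print(first_unit_word(reference_time))           # None
print(first_unit_word("days since 1850-01-01"))  # days
```

How cdscan should treat a file whose time dimension is empty (skip the time axis, warn, or fail with a clearer message) is a separate design question that this sketch does not answer.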

For comparison, here is an ncdump of a validly formed file (note that no time dimension is defined):

(cdat821nompi) bash-4.2$ ncdump -ct ~/esgf_publish/CMIP6/CMIP/CSIRO-ARCCSS/ACCESS-CM2/1pctCO2/r1i1p1f1/Ofx/areacello/gn/v20191109/areacello_Ofx_ACCESS-CM2_1pctCO2_r1i1p1f1_gn.nc 
netcdf areacello_Ofx_ACCESS-CM2_1pctCO2_r1i1p1f1_gn {
dimensions:
	j = 300 ;
	i = 360 ;
	bnds = 2 ;
	vertices = 4 ;
variables:
	int j(j) ;
		j:units = "1" ;
		j:long_name = "cell index along second dimension" ;
	int i(i) ;
		i:units = "1" ;
		i:long_name = "cell index along first dimension" ;
	double latitude(j, i) ;
		latitude:standard_name = "latitude" ;
		latitude:long_name = "latitude" ;
...
		latitude:bounds = "vertices_latitude" ;
	double longitude(j, i) ;
		longitude:standard_name = "longitude" ;
		longitude:long_name = "longitude" ;
...
		longitude:bounds = "vertices_longitude" ;
	double vertices_latitude(j, i, vertices) ;
		vertices_latitude:units = "degrees_north" ;
...
	double vertices_longitude(j, i, vertices) ;
		vertices_longitude:units = "degrees_east" ;
...
	float areacello(j, i) ;
		areacello:standard_name = "cell_area" ;
		areacello:long_name = "Grid-Cell Area for Ocean Variables" ;
		areacello:comment = "Horizontal area of ocean grid cells" ;
		areacello:units = "m2" ;
...

// global attributes: