issue when record dimension is not the last dimension of a variable


Alexander-Barth opened this issue 7 months ago

If I create a file (netCDF4) with a record (unlimited) dimension and the record dimension is not the last dimension of a variable, loading the whole variable can give incorrect results.
Reading the whole variable and then slicing it gives a different result from reading the slice directly.
The file below was generated in Julia with the NetCDF4 API.

the version of the software with which you are encountering an issue

netCDF 4.9.0 and 4.9.2

environmental information (i.e. Operating System, compiler info, java version, python version, etc.)

Ubuntu 22.04.3, julia 1.9.0 and python 3.10.2

a description of the issue with the steps needed to reproduce it

NetCDF file is available at:
https://dox.ulg.ac.be/index.php/s/H0ycqUFOaySCCbi/download

One would expect the two read operations to match:

import netCDF4
ds = netCDF4.Dataset("foo.nc")
# Read the slice directly from the file.
ds["sample"][:,350,0,0]
# masked_array(data=[1, 2, 3],
#             mask=False,
#       fill_value=999999,
#            dtype=int32)
# Read the whole variable first, then slice the in-memory array.
ds["sample"][:][:,350,0,0]
# masked_array(data=[1, --, --],
#              mask=[False,  True,  True],
#        fill_value=-2147483647,
#            dtype=int32)

Note that the first element is correct (1). The following two should be 2 and 3.
I have exactly the same problem in Julia, but I reproduced it in Python because I assume that you are more familiar with it.

The problem disappears if the record dimension is the last one or if I use only fixed dimensions.
I am wondering if the problem could be in the netCDF library itself.

For reference, here is the Julia code to create this file:


using NCDatasets

fname_cv_out = "/tmp/foo.nc"

rm(fname_cv_out, force=true) # do not fail if the file does not exist yet
dsout = NCDataset(fname_cv_out,"c")

Nsample = 3

# Dimensions

dsout.dim["lon"] = 1
dsout.dim["lat"] = 1
dsout.dim["time"] = Inf # important
#dsout.dim["time"] = 365 # ok
dsout.dim["sample"] = Nsample


nctime = defVar(dsout,"time", Float64, ("time",))
ncdatasample = defVar(dsout,"sample", Int32, ("lon", "lat", "time", "sample"))

nctime[1:365] = 1:365 # important

n = 351
xc = reshape(1:3,(1,1,3))
ncdatasample[:,:,n,:] = xc[:,:,:]

close(dsout)

ds = NCDataset(fname_cv_out)
@show ds["sample"][:,:,:,:][1,1,351,1:3]
@show ds["sample"][1,1,351,1:3]
close(ds)

Note that large parts of the file are uninitialized, but the slice that is read should have the values 1, 2 and 3.

$ ncdump -h foo.nc 
netcdf foo {
dimensions:
	lon = 1 ;
	lat = 1 ;
	time = UNLIMITED ; // (365 currently)
	sample = 3 ;
variables:
	double time(time) ;
	int sample(sample, time, lat, lon) ;
}
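
As a reference for what a full read should return, the expected contents can be modeled in plain Python. This is only a sketch under two assumptions: that unwritten cells read back as the default int fill value -2147483647 (the fill_value reported in the second Python output above), and that the array is indexed in the order shown by ncdump (sample, time, lat, lon). The name expected_sample is hypothetical and not part of any library.

```python
FILL_INT = -2147483647  # assumed: netCDF default fill value for int


def expected_sample(n_sample=3, n_time=365, written_time=350):
    """Nested lists shaped (sample, time, lat=1, lon=1) holding the expected values."""
    return [
        [[[s + 1 if t == written_time else FILL_INT]] for t in range(n_time)]
        for s in range(n_sample)
    ]


expected = expected_sample()
# The only written record is time index 350 (Julia index 351).
print([expected[s][350][0][0] for s in range(3)])  # [1, 2, 3]
print(expected[1][0][0][0] == FILL_INT)            # True
```

A full read could then be compared against this model cell by cell to see which entries differ from the expectation.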

Dennis Heimbigner · Answer 1 · Mon Dec 18 2023 04:05:59 GMT+0800 (China Standard Time)

I am a bit confused. In the second python program, it appears that sample is defined as:
int sample[lon,lat,time,sample]
but in the ncdump output and in the first python program we appear to have:
int sample[sample,time,lat,lon]
In other words, the dimension order is reversed between read and write. What am I missing?

Alexander Barth · Answer 2 · Mon Dec 18 2023 04:31:04 GMT+0800 (China Standard Time)

Actually, the second program (the one with defVar(dsout,"sample", Int32, ("lon", "lat", "time", "sample"))) is a Julia program, not a Python one. Julia is column-major, like Fortran, R, ..., but unlike Python and C/ncdump, so the dimension order appears reversed between the two.
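
The correspondence between the two conventions can be sketched in Python. This is only an illustration: the helper names julia_to_c_dims and julia_to_python_index are hypothetical and not part of netCDF4 or NCDatasets. The sketch reverses the dimension order and shifts from 1-based to 0-based indexing.

```python
def julia_to_c_dims(julia_dims):
    """Reverse a column-major (Julia) dimension tuple into row-major (C/Python/ncdump) order."""
    return tuple(reversed(julia_dims))


def julia_to_python_index(julia_index):
    """Convert a 1-based Julia index tuple to a 0-based Python index tuple.

    Each entry is either an int or an inclusive (start, stop) pair standing in
    for a Julia range such as 1:3.
    """
    converted = []
    for item in reversed(julia_index):
        if isinstance(item, tuple):
            start, stop = item
            # Julia ranges are inclusive and 1-based; Python slices are
            # half-open and 0-based, so only the start shifts.
            converted.append(slice(start - 1, stop))
        else:
            converted.append(item - 1)
    return tuple(converted)


print(julia_to_c_dims(("lon", "lat", "time", "sample")))
# ('sample', 'time', 'lat', 'lon')
print(julia_to_python_index((1, 1, 351, (1, 3))))
# (slice(0, 3, None), 350, 0, 0)
```

Under this mapping, the Julia read ds["sample"][1,1,351,1:3] corresponds to ds["sample"][0:3, 350, 0, 0] in Python, which selects the same elements as the [:,350,0,0] used in the report because the sample dimension has length 3.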