issue when record dimension is not the last dimension of a variable
Alexander-Barth opened this issue
If I create a file (netCDF4) with a record dimension and the record dimension is not the last dimension of a variable, loading the whole variable can lead to incorrect results.
Reading the whole variable and then taking a slice gives different results from reading the slice directly.
The file below was generated in Julia using the NCDatasets package (netCDF-4 format).
- the version of the software with which you are encountering an issue
netCDF 4.9.0 and 4.9.2
- environmental information (i.e. Operating System, compiler info, java version, python version, etc.)
Ubuntu 22.04.3, Julia 1.9.0 and Python 3.10.2
- a description of the issue with the steps needed to reproduce it
NetCDF file is available at:
https://dox.ulg.ac.be/index.php/s/H0ycqUFOaySCCbi/download
One would expect the two read operations to match:
```python
import netCDF4
ds = netCDF4.Dataset("foo.nc");
ds["sample"][:,350,0,0]
# masked_array(data=[1, 2, 3],
#  mask=False,
#  fill_value=999999,
#  dtype=int32)
ds["sample"][:][:,350,0,0]
# masked_array(data=[1, --, --],
#  mask=[False, True, True],
#  fill_value=-2147483647,
#  dtype=int32)
```
Note that the first element is correct (1). The following two elements should be 2 and 3.
I have exactly the same problem in Julia, but I reproduced it in Python because I assume that you are more familiar with it.
The problem disappears if the record dimension is the last one or if I use fixed dimensions.
I am wondering whether the problem could be in the netCDF library itself.
For reference, here is the Julia code used to create this file:
```julia
using NCDatasets
fname_cv_out = "/tmp/foo.nc"
rm(fname_cv_out, force=true) # do not fail if the file does not exist yet
dsout = NCDataset(fname_cv_out,"c")
Nsample = 3
# Dimensions
dsout.dim["lon"] = 1
dsout.dim["lat"] = 1
dsout.dim["time"] = Inf # important (Inf makes "time" unlimited)
#dsout.dim["time"] = 365 # ok
dsout.dim["sample"] = Nsample
nctime = defVar(dsout,"time", Float64, ("time",))
ncdatasample = defVar(dsout,"sample", Int32, ("lon", "lat", "time", "sample"))
nctime[1:365] = 1:365 # important
n = 351
xc = reshape(1:3,(1,1,3))
ncdatasample[:,:,n,:] = xc[:,:,:]
close(dsout)
ds = NCDataset(fname_cv_out)
@show ds["sample"][:,:,:,:][1,1,351,1:3]
@show ds["sample"][1,1,351,1:3]
close(ds)
```
Note that a large part of the variable is uninitialized (only time index 351 is written), but the slice being read should contain the values 1, 2 and 3.
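To make the expected result concrete, here is a minimal pure-Python model of what `sample` should hold after the Julia script runs, in the C/Python dimension order `(sample, time, lat, lon)`. It assumes that unwritten cells hold the default netCDF int fill value (-2147483647, the `fill_value` shown in the full-read output); netCDF4-python masks such values, which is why they print as `--`. The names `expected_sample` and `NC_FILL_INT` are illustrative only.

```python
# Hypothetical model of the expected contents of "sample" (not a netCDF API).
NC_FILL_INT = -2147483647  # default fill for int, as in the full-read output

N_SAMPLE, N_TIME, N_LAT, N_LON = 3, 365, 1, 1  # sizes from ncdump -h


def expected_sample(time_index=350):
    """Return nested lists data[s][t][y][x] with fill everywhere except
    the single time record written by the Julia script."""
    data = [[[[NC_FILL_INT for _ in range(N_LON)]
              for _ in range(N_LAT)]
             for _ in range(N_TIME)]
            for _ in range(N_SAMPLE)]
    # Julia wrote ncdatasample[:,:,351,:] = 1:3 (1-based time 351),
    # which is 0-based time index 350 in Python.
    for s in range(N_SAMPLE):
        data[s][time_index][0][0] = s + 1
    return data


data = expected_sample()
slice_350 = [data[s][350][0][0] for s in range(N_SAMPLE)]
written = sum(1 for per_sample in data for per_time in per_sample
              for per_lat in per_time for value in per_lat
              if value != NC_FILL_INT)
print(slice_350)  # [1, 2, 3]
print(written)    # 3
```

Both read paths in the report should agree with this model; only the whole-variable read does not.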
```
$ ncdump -h foo.nc
netcdf foo {
dimensions:
	lon = 1 ;
	lat = 1 ;
	time = UNLIMITED ; // (365 currently)
	sample = 3 ;
variables:
	double time(time) ;
	int sample(sample, time, lat, lon) ;
}
```
I am a bit confused. In the second python program, it appears that sample is defined as:
int sample[lon,lat,time,sample]
but in the ncdump output and in the first python program we appear to have:
int sample[sample,time,lat,lon]
In other words, the dimension order is reversed between read and write. What am I missing?
Actually, the second program (with `defVar(dsout,"sample", Int32, ("lon", "lat", "time", "sample"))`) is a Julia program, which is column-major like Fortran and R, but unlike Python and C/ncdump.
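The correspondence can be sketched in a few lines of Python: Julia lists dimensions in column-major order and indexes from 1, so the C/ncdump view reverses the dimension tuple and shifts every index by one. The helpers below, and the convention of writing a Julia range such as `1:3` as a `(start, stop)` tuple, are hypothetical and not part of NCDatasets or netCDF4-python.

```python
def julia_to_c_dims(julia_dims):
    """Reverse a Julia (column-major) dimension tuple into C/ncdump order."""
    return tuple(reversed(julia_dims))


def julia_to_python_index(julia_index):
    """Convert a Julia index tuple (1-based, column-major order) into the
    equivalent Python index (0-based, row-major order).

    Integers are shifted by -1. A Julia range is passed as a (start, stop)
    tuple, both ends inclusive and 1-based, and becomes a Python slice."""
    converted = []
    for item in reversed(julia_index):
        if isinstance(item, tuple):
            start, stop = item
            converted.append(slice(start - 1, stop))  # inclusive stop -> exclusive
        else:
            converted.append(item - 1)
    return tuple(converted)


print(julia_to_c_dims(("lon", "lat", "time", "sample")))
# ('sample', 'time', 'lat', 'lon')
print(julia_to_python_index((1, 1, 351, (1, 3))))
# (slice(0, 3, None), 350, 0, 0)
```

Since `sample` has length 3, `slice(0, 3)` covers the whole dimension, so Julia's `ds["sample"][1,1,351,1:3]` and Python's `ds["sample"][:,350,0,0]` address the same three values.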