Unidata / netcdf-c

Official GitHub repository for netCDF-C libraries and utilities.


issue when record dimension is not the last dimension of a variable

Alexander-Barth opened this issue · comments

If I create a netCDF-4 file with a record dimension that is not the last dimension of a variable, loading the whole variable can lead to incorrect results:
reading the whole variable and then taking a slice gives different results than reading the slice directly.
The file below was generated in Julia with the netCDF-4 API.

  • the version of the software with which you are encountering an issue

netCDF 4.9.0 and 4.9.2

  • environmental information (i.e. Operating System, compiler info, java version, python version, etc.)

Ubuntu 22.04.3, julia 1.9.0 and python 3.10.2

  • a description of the issue with the steps needed to reproduce it

NetCDF file is available at:
https://dox.ulg.ac.be/index.php/s/H0ycqUFOaySCCbi/download

One would expect the two read operations below to match:

import netCDF4

ds = netCDF4.Dataset("foo.nc")
ds["sample"][:,350,0,0]
# masked_array(data=[1, 2, 3],
#              mask=False,
#        fill_value=999999,
#             dtype=int32)
ds["sample"][:][:,350,0,0]
# masked_array(data=[1, --, --],
#              mask=[False,  True,  True],
#        fill_value=-2147483647,
#             dtype=int32)

Note that the first element is correct (1); the following elements should be 2 and 3.
I have exactly the same problem in Julia, but I reproduced it in Python because I assume you are more familiar with it.

The problem disappears if the record dimension is the last one or if I use only fixed dimensions.
I am wondering whether the problem could be in the netCDF-C library itself.
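
A minimal diagnostic sketch, assuming foo.nc has been downloaded to the current directory: it turns off netCDF4-python's auto-masking so the raw returned values are visible, and compares the direct hyperslab read with the full read followed by slicing. Note that the fill_value of -2147483647 shown above is netCDF's default int32 fill value, so the wrong elements look like data that was never written.

import netCDF4
import numpy as np

ds = netCDF4.Dataset("foo.nc")
var = ds["sample"]
var.set_auto_mask(False)          # return raw values instead of masked arrays

direct   = var[:, 350, 0, 0]      # hyperslab read
via_full = var[:][:, 350, 0, 0]   # full read, then slice in memory

print("direct read :", direct)
print("full + slice:", via_full)
print("match       :", np.array_equal(direct, via_full))

# netCDF's default int32 fill value (NC_FILL_INT) for comparison
print("default int32 fill:", netCDF4.default_fillvals["i4"])
ds.close()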

For reference, here is the Julia code used to create this file:


using NCDatasets

fname_cv_out = "/tmp/foo.nc"

rm(fname_cv_out, force = true)
dsout = NCDataset(fname_cv_out, "c")

Nsample = 3

# Dimensions
dsout.dim["lon"] = 1
dsout.dim["lat"] = 1
dsout.dim["time"] = Inf   # unlimited (record) dimension (important)
#dsout.dim["time"] = 365  # fixed dimension (ok)
dsout.dim["sample"] = Nsample

nctime = defVar(dsout, "time", Float64, ("time",))
ncdatasample = defVar(dsout, "sample", Int32, ("lon", "lat", "time", "sample"))

nctime[1:365] = 1:365     # grow the record dimension to 365 (important)

# write the values 1, 2, 3 along the sample dimension at time index 351
n = 351
xc = reshape(1:3, (1, 1, 3))
ncdatasample[:, :, n, :] = xc[:, :, :]

close(dsout)

# read back: full read followed by slicing vs. reading the slice directly
ds = NCDataset(fname_cv_out)
@show ds["sample"][:, :, :, :][1, 1, 351, 1:3]
@show ds["sample"][1, 1, 351, 1:3]
close(ds)

Note that large parts of the file are uninitialized, but the slice that is read should have the values 1, 2 and 3.
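
A quick way to see this, assuming foo.nc is in the current directory: only time index 350 (351 in Julia's 1-based indexing) was ever written, so a direct read of that slice should return 1, 2, 3, while any other time index should come back entirely as fill values (masked by default in netCDF4-python).

import netCDF4

ds = netCDF4.Dataset("foo.nc")
print(ds["sample"][:, 350, 0, 0])   # written slice: expected [1, 2, 3]
print(ds["sample"][:, 0, 0, 0])     # never written: expected all fill values / masked
ds.close()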

$ ncdump -h foo.nc 
netcdf foo {
dimensions:
	lon = 1 ;
	lat = 1 ;
	time = UNLIMITED ; // (365 currently)
	sample = 3 ;
variables:
	double time(time) ;
	int sample(sample, time, lat, lon) ;
}

I am a bit confused. In the second python program, it appears that sample is defined as:
int sample[lon,lat,time,sample]
but in the ncdump output and in the first python program we appear to have:
int sample[sample,time,lat,lon]
In other words, the dimension order is reversed between read and write. What am I missing?

Actually the second program (the one with defVar(dsout,"sample", Int32, ("lon", "lat", "time", "sample"))) is a Julia program, and Julia is column-major like Fortran and R, unlike Python and C/ncdump, which are row-major. The dimension order is simply reported in reverse on the two sides; it is the same variable on disk.
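
To make the mapping concrete, here is a small sketch, assuming foo.nc is in the current directory, of how the same variable looks from the row-major (C/Python/ncdump) side; the dimension names are just listed in the opposite order from the Julia definition, and the indices reverse accordingly.

import netCDF4

ds = netCDF4.Dataset("foo.nc")
var = ds["sample"]
print(var.dimensions)   # ('sample', 'time', 'lat', 'lon'), reverse of the Julia order
print(var.shape)        # (3, 365, 1, 1)

# the element written as ncdatasample[1, 1, 351, k] in Julia (1-based, column-major)
# is var[k-1, 350, 0, 0] here (0-based, row-major)
print(var[:, 350, 0, 0])
ds.close()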