NCAR / ncl

The NCAR Command Language (NCL) is a scripting language for the analysis and visualization of climate and weather data.

Home Page: http://www.ncl.ucar.edu

Lagged auto-correlations of random_normal output are not zero

yruprich opened this issue

Description of the bug

Hi, I believe there is a shortcoming in the function random_normal.
When I generate vectors of T elements with this function, I find that, on average, the lagged auto-correlation of those vectors is not 0 at lags other than 0. Instead, the auto-correlation tends to -1/(T-1).

Example:

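```ncl
N     = toint(10^7)      ; number of random series
T     = 100              ; length of each series
invT  = -1./(T-1.)
sd    = 1
av    = 0
mxlag = 10

random_setallseed(1,1) ; (36484749, 9494848)
X = random_normal(av,sd,(/N,T/))   ; N x T array of N(av,sd) values

acf = esacr(X,mxlag)               ; auto-correlations at lags 0..mxlag along the rightmost (time) dimension

print("mean auto-correlation of random_normal vector of length T="+T+": "+dim_avg_n_Wrap(acf,0))
print("to be compared with -1/(T-1) = "+invT)
```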
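To check the NCL result without allocating a 10^7 × 100 array, the same experiment can be repeated in NumPy with a smaller N. This is a sketch under two assumptions: that esacr uses the usual estimator r_k = c_k/c_0 with the full-series mean removed (not verified against the NCL source), and that N = 20000 series are enough to see the effect. `sample_acf` is a hypothetical helper, not an NCL or NumPy function. The script also checks an exact identity: for any single series, the sample auto-correlations at lags 1..T-1 sum to -1/2, so they cannot average to zero.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample auto-correlations r_0..r_max_lag along the last axis.

    Hypothetical helper using r_k = c_k / c_0, with
    c_k = sum_{t=0}^{T-1-k} (x_t - xbar) * (x_{t+k} - xbar)
    and xbar the mean of the whole series (assumed to match esacr).
    """
    x = np.asarray(x, dtype=float)
    d = x - x.mean(axis=-1, keepdims=True)   # deviations from the full-series mean
    T = d.shape[-1]
    c0 = np.sum(d * d, axis=-1)
    acf = np.empty(d.shape[:-1] + (max_lag + 1,))
    for k in range(max_lag + 1):
        ck = np.sum(d[..., : T - k] * d[..., k:], axis=-1)
        acf[..., k] = ck / c0
    return acf

rng = np.random.default_rng(1)
N, T, mxlag = 20000, 100, 10                 # smaller N than in the issue (assumption)
X = rng.normal(0.0, 1.0, size=(N, T))        # independent N(0,1) values, like random_normal

acf_mean = sample_acf(X, mxlag).mean(axis=0)
approx = np.array([-(T - k) / (T * (T - 1.0)) for k in range(1, mxlag + 1)])

print("mean r_k, k=1..10:", np.round(acf_mean[1:], 4))
print("-(T-k)/(T(T-1))  :", np.round(approx, 4))
print("-1/(T-1)         :", round(-1.0 / (T - 1), 4))

# Exact identity: over lags 1..T-1 the sample auto-correlations of one series sum to -1/2
full = sample_acf(X[0], T - 1)
print("sum of r_1..r_{T-1} for one series:", full[1:].sum())
```

With this estimator the mean at lag k should track -(T-k)/(T(T-1)), which for T = 100 and small k is very close to the -1/(T-1) reported above.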

Computing environment

I see this problem in all three environments I tried:

  1. Linux, Ubuntu 20.04.4 LTS, NCL 6.6.2, installed with apt install ncl-ncarg
  2. Linux, OpenSUSE Leap 42.3, NCL 6.3.0, installed with pre-compiled binaries "version-CentOS7.6_64bit_nodap_gnu485.tar.gz"
  3. Linux, Red Hat Enterprise Linux 8.4, NCL 6.6.2, built from sources

Additional context
The problem I am referring to might seem tiny. However, it leads to larger biases when those vectors are used as seeds to generate auto-regressive time series. It is also a problem if one uses this function to build bootstrap statistical tests.

Cheers,
Yohan
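The auto-regressive case mentioned in the additional context can be explored with a short NumPy sketch: build AR(1) series from independent N(0,1) draws and compare the mean lag-1 sample auto-correlation with the true coefficient phi. The values phi = 0.5, T = 100 and N = 5000 are illustrative assumptions, and `ar1_series` and `lag1_acf` are hypothetical helpers, not NCL or NumPy functions.

```python
import numpy as np

def ar1_series(noise, phi):
    """AR(1) series y_t = phi * y_{t-1} + e_t driven by white noise e (time on last axis)."""
    y = np.empty_like(noise)
    # start from the stationary variance 1 / (1 - phi^2) so there is no spin-up transient
    y[..., 0] = noise[..., 0] / np.sqrt(1.0 - phi**2)
    for t in range(1, noise.shape[-1]):
        y[..., t] = phi * y[..., t - 1] + noise[..., t]
    return y

def lag1_acf(x):
    """Lag-1 sample auto-correlation of each series, full-series mean removed."""
    d = x - x.mean(axis=-1, keepdims=True)
    return np.sum(d[..., :-1] * d[..., 1:], axis=-1) / np.sum(d * d, axis=-1)

rng = np.random.default_rng(0)
N, T, phi = 5000, 100, 0.5                   # illustrative values, not taken from the issue
e = rng.normal(0.0, 1.0, size=(N, T))        # independent N(0,1) driving noise
y = ar1_series(e, phi)

print("mean lag-1 acf of the noise:", lag1_acf(e).mean())
print("mean lag-1 acf of AR(1) y  :", lag1_acf(y).mean(), "(true value", phi, ")")
```

The gap between the AR(1) estimate and phi is expected to be larger than the roughly -1/T offset seen for the raw noise, even though the noise itself is independent.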

Actually, I am facing the same problem with Python (v2.7.9 and v3.7.4):

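```python
import numpy as np

N     = 10000000   # 10^7 series of length T: about 8 GB as float64, and the loop below is slow
T     = 100
invT  = -1./(T-1.)
sd    = 1
av    = 0
mxlag = 10

X     = np.random.normal(av, sd, size=(N, T))
acf   = np.empty((N, mxlag+1))   # separate array: a slice of X would be a view, and filling it would overwrite X
for i in range(N):
    # lag 0 is 1 by definition; for l >= 1, Pearson correlation of x[l:] with x[:-l]
    acf[i,:] = [1. if l==0 else np.corrcoef(X[i,l:],X[i,:-l])[0,1] for l in range(mxlag+1)]

acf_mean = np.average(acf, axis=0)

print('mean auto-correlation of random_normal vector of length T=',T,' : ',acf_mean)
print('to be compared with -1/(T-1) = ',invT)
```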
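The Python loop above calls np.corrcoef N × mxlag times, so with N = 10^7 it takes a very long time. A vectorised sketch of the same estimator is below: each lagged segment is centred on its own mean, as np.corrcoef does, which is probably slightly different from the full-series mean used by esacr, so the two averages need not match exactly. `lagged_corrcoef` is a hypothetical helper name, and N = 50000 is an assumption chosen to keep memory small.

```python
import numpy as np

def lagged_corrcoef(X, max_lag):
    """Vectorised version of the np.corrcoef loop: for each row i and lag l >= 1,
    the Pearson correlation between X[i, l:] and X[i, :-l], each segment
    centred on its own mean. Column 0 is set to 1."""
    X = np.asarray(X, dtype=float)
    out = np.empty((X.shape[0], max_lag + 1))
    out[:, 0] = 1.0
    for l in range(1, max_lag + 1):
        a = X[:, l:] - X[:, l:].mean(axis=1, keepdims=True)
        b = X[:, :-l] - X[:, :-l].mean(axis=1, keepdims=True)
        out[:, l] = np.sum(a * b, axis=1) / np.sqrt(np.sum(a * a, axis=1) * np.sum(b * b, axis=1))
    return out

rng = np.random.default_rng(2)
X = rng.normal(0.0, 1.0, size=(50000, 100))  # 50000 series of length T = 100
acf_mean = lagged_corrcoef(X, 10).mean(axis=0)
print(acf_mean)
```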

Actually, this is not a shortcoming of the NCL function. My problem comes from the bias of the sample auto-correlation estimator itself, which was already documented back in 1954.

Reference: Marriott, F. H. C., and J. A. Pope. "Bias in the estimation of autocorrelations." Biometrika 41.3/4 (1954): 390-402 (https://www.jstor.org/stable/2332719)
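A short derivation of where the negative offset comes from, assuming i.i.d. values and the full-series-mean estimator (taken here as the one esacr uses, which is not verified against the NCL source). Removing the sample mean makes the deviations negatively correlated, and that carries over to every lag. For Gaussian data the ratio-of-expectations step is in fact exact, because r_k depends only on the direction of the deviation vector, which is independent of its length.

```latex
% x_1, ..., x_T i.i.d. with variance \sigma^2, \bar{x} = \frac{1}{T}\sum_t x_t, d_t = x_t - \bar{x}
\operatorname{Cov}(d_t, d_s) = \sigma^2\left(\delta_{ts} - \frac{1}{T}\right)

c_k = \sum_{t=1}^{T-k} d_t\, d_{t+k}, \qquad r_k = \frac{c_k}{c_0}

\mathbb{E}[c_0] = (T-1)\,\sigma^2, \qquad
\mathbb{E}[c_k] = -\frac{(T-k)\,\sigma^2}{T} \quad (k \ge 1)

\mathbb{E}[r_k] \approx \frac{\mathbb{E}[c_k]}{\mathbb{E}[c_0]} = -\frac{T-k}{T\,(T-1)} \approx -\frac{1}{T} \quad (k \ll T)

% Exact identity for every single series, since \sum_t d_t = 0:
0 = \Big(\sum_{t=1}^{T} d_t\Big)^2 = c_0 + 2\sum_{k=1}^{T-1} c_k
\;\Longrightarrow\; \sum_{k=1}^{T-1} r_k = -\frac{1}{2}
```

For T = 100 this gives about -0.01 at small lags, in line with the -1/(T-1) ≈ -0.0101 observed in the issue.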