Serious bug in the taper (and taper_n) function: option=1 overwrites the (1-p) portion of the input array with zero as output
kosaka90 opened this issue · comments
Describe the bug
In the fortran code taper.f that is used by the NCL function taper and taper_n, the output array XT is not initialized with the input array X when the option is 1. This makes the returned array XT has correct tapered values in the portion p (e.g., 0.1) of each side of the input array, but all other values are left as zero (or compiler-dependent values) and returned to the user.
When option = 0, XT is initialized to be the same as the input array in line 46 of taper.f
A simple fix will be to change the following part of taper.f
if (kopt.ne.1) then DO I = 1,N XT(I) = X(I) XAV = XAV + X(I) END DO XAV = XAV/N end if
to something like:
if (kopt.ne.1) then DO I = 1,N XT(I) = X(I) XAV = XAV + X(I) END DO XAV = XAV/N else DO I = 1,N XT(I) = X(I) END DO end if
Provide the following:
- a concise NCL script demonstrating the issue (remove unnecessary code)
`
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/gsn_code.ncl"
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/gsn_csm.ncl"
N = 100
y = abs( random_normal(0,10, N) )
yt_opt0 = taper(y, 0.1, 0) ; yt will be tapered toward mean
yt_opt1 = taper(y, 0.1, 1) ; yt will be tapered toward 0
yavg = avg(y)
yt_opt0_anom = taper(y-yavg, 0.1, 0) ; yt will be tapered toward mean of zero
yt_opt1_anom = taper(y-yavg, 0.1, 1) ; yt will be tapered toward 0
;print("================")
;Y = y - yavg ; Y will be have zero mean
;YT = taper(Y, 0.1, 0)
;print(Y+" "+YT)
;YAVG = avg(Y)
wks = gsn_open_wks ("x11","taper")
res = True ; plot mods desired
res@vpHeightF= 0.4 ; change aspect ratio of plot
res@vpWidthF = 0.7
res@gsnYRefLine = yavg
res@gsnCenterString = "Original Y"
plot = gsn_csm_y(wks,y ,res)
res@gsnCenterString = "Tapered Y: toward mean of series"
plot = gsn_csm_y(wks,yt_opt0,res)
res@gsnCenterString = "Tapered Y: toward zero"
plot = gsn_csm_y(wks,yt_opt1,res)
delete(res@gsnYRefLine)
res@gsnCenterString = "Original Y with Mean Removed (anomalies)"
plot = gsn_csm_y(wks,y-yavg ,res)
res@gsnCenterString = "Tapered: Original Y with Mean Removed, option 0"
plot = gsn_csm_y(wks,yt_opt0_anom,res)
res@gsnCenterString = "Tapered: Original Y with Mean Removed, option 1"
plot = gsn_csm_y(wks,yt_opt1_anom,res)
`
- necessary data files
NA - relevant graphical output
Please run the above ncl script.
The following figures are made by my python script that implement taper.f
Computing environment
-
OS: [Linux, Mac, or Windows]
Linux (NERSC Cori) -
OS version: [e.g. Ubuntu 16.04, MacOS 10.13.6, Windows 10]
CLE Linux version 4.12.14-150.69.1.22005.0.PTF.1183816-default -
NCL Version: [e.g. 6.5.0]
Checked version 6.4.0 and 6.5.0 -
Installation method: [binary from NCAR website, conda, other package manager (apt, brew, macports, yum), built from source]
NA
Additional context
This function is an essential part of workflow of Fourier analysis (e.g., https://www.ncl.ucar.edu/Applications/fouranal.shtml, but existing examples all use option = 0 and have not found this bug (e.g., https://www.ncl.ucar.edu/Support/talk_archives/2014/1060.html). Because this bug deletes most of the data given to the taper function [changes (1-p) portion of the data to 0.0], it leads to a serious error in the subsequent Fourier analysis.
I am finding this bug appears regardless of which taper option being used. Either using 0 or 1 option yields the same bug: value of input array not copied to output array of tapering for non-tapering points, outside the tapering of beginning and end side.
NCL script for testing:
begin
datain = new(100, float)
do i = 0, dimsizes(datain)-1
datain(i) = 1.0
end do
tim_taper_sum = 0.1
data_tapered = taper( datain, tim_taper_sum, 0 )
data_tapered_2 = taper( datain, tim_taper_sum, 1 )
fout_name = "test.nc"
system("rm -f " + fout_name)
fout = addfile(fout_name,"c")
fout->datain = datain
fout->data_tapered = data_tapered
fout->data_tapered_2 = data_tapered_2
end
Python script for plotting:
import xarray as xr
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ds = xr.open_dataset("test.nc")
ds.datain.plot(ax=ax, lw=3, label='input')
ds.data_tapered.plot(ax=ax, lw=5, label='tapered (option=0)')
ds.data_tapered_2.plot(ax=ax, lw=2, label='tapered (option=1)')
ax.legend()
fig.savefig('test_plot.png')
It is however interesting that when input array has varying value, taper with option 0 seems working okay. I am not understanding the logic behind of this...
begin
N = 100
datain = abs( random_normal(0,10, N) )
tim_taper_sum = 0.1
data_tapered = taper( datain, tim_taper_sum, 0 )
data_tapered_2 = taper( datain, tim_taper_sum, 1 )
fout_name = "test.nc"
system("rm -f " + fout_name)
fout = addfile(fout_name,"c")
fout->datain = datain
fout->data_tapered = data_tapered
fout->data_tapered_2 = data_tapered_2
end
Versions:
OS version: MacOS 13.6
NCL version: 6.6.2
Installation method: conda