Serious bug in the taper (and taper_n) function: option=1 overwrites the (1-p) portion of the input array with zero as output

Question

Serious bug in the taper (and taper_n) function: option=1 overwrites the (1-p) portion of the input array with zero as output

kosaka90 opened this issue 3 years ago · comments

Describe the bug
In the fortran code taper.f that is used by the NCL function taper and taper_n, the output array XT is not initialized with the input array X when the option is 1. This makes the returned array XT has correct tapered values in the portion p (e.g., 0.1) of each side of the input array, but all other values are left as zero (or compiler-dependent values) and returned to the user.

When option = 0, XT is initialized to be the same as the input array in line 46 of taper.f

A simple fix will be to change the following part of taper.f

if (kopt.ne.1) then DO I = 1,N XT(I) = X(I) XAV = XAV + X(I) END DO XAV = XAV/N end if

to something like:
if (kopt.ne.1) then DO I = 1,N XT(I) = X(I) XAV = XAV + X(I) END DO XAV = XAV/N else DO I = 1,N XT(I) = X(I) END DO end if

Provide the following:

a concise NCL script demonstrating the issue (remove unnecessary code)

`
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/gsn_code.ncl"
load "$NCARG_ROOT/lib/ncarg/nclscripts/csm/gsn_csm.ncl"

N = 100
y = abs( random_normal(0,10, N) )
yt_opt0 = taper(y, 0.1, 0)    ; yt will be tapered toward mean
yt_opt1 = taper(y, 0.1, 1)    ; yt will be tapered toward 0

yavg = avg(y)

yt_opt0_anom = taper(y-yavg, 0.1, 0)    ; yt will be tapered toward mean of zero
yt_opt1_anom = taper(y-yavg, 0.1, 1)    ; yt will be tapered toward 0


;print("================")
;Y  = y - yavg            ; Y will be have zero mean
;YT = taper(Y, 0.1, 0)
;print(Y+"  "+YT)
;YAVG = avg(Y)


wks          = gsn_open_wks ("x11","taper")

res          = True                   ; plot mods desired
res@vpHeightF= 0.4                    ; change aspect ratio of plot
res@vpWidthF = 0.7                  

res@gsnYRefLine  = yavg
res@gsnCenterString = "Original Y"
plot = gsn_csm_y(wks,y ,res) 
res@gsnCenterString = "Tapered Y: toward mean of series"
plot = gsn_csm_y(wks,yt_opt0,res) 
res@gsnCenterString = "Tapered Y: toward zero"
plot = gsn_csm_y(wks,yt_opt1,res) 

delete(res@gsnYRefLine)
res@gsnCenterString = "Original Y with Mean Removed (anomalies)"
plot = gsn_csm_y(wks,y-yavg ,res) 
res@gsnCenterString = "Tapered: Original Y with Mean Removed, option 0"
plot = gsn_csm_y(wks,yt_opt0_anom,res) 
res@gsnCenterString = "Tapered: Original Y with Mean Removed, option 1"
plot = gsn_csm_y(wks,yt_opt1_anom,res)

`

necessary data files
NA
relevant graphical output
Please run the above ncl script.
The following figures are made by my python script that implement taper.f

Computing environment

OS: [Linux, Mac, or Windows]
Linux (NERSC Cori)
OS version: [e.g. Ubuntu 16.04, MacOS 10.13.6, Windows 10]
CLE Linux version 4.12.14-150.69.1.22005.0.PTF.1183816-default
NCL Version: [e.g. 6.5.0]
Checked version 6.4.0 and 6.5.0
Installation method: [binary from NCAR website, conda, other package manager (apt, brew, macports, yum), built from source]
NA

Additional context
This function is an essential part of workflow of Fourier analysis (e.g., https://www.ncl.ucar.edu/Applications/fouranal.shtml, but existing examples all use option = 0 and have not found this bug (e.g., https://www.ncl.ucar.edu/Support/talk_archives/2014/1060.html). Because this bug deletes most of the data given to the taper function [changes (1-p) portion of the data to 0.0], it leads to a serious error in the subsequent Fourier analysis.

Jiwoo Lee · Answer 1 · Tue Oct 17 2023 02:44:41 GMT+0800 (China Standard Time)

I am finding this bug appears regardless of which taper option being used. Either using 0 or 1 option yields the same bug: value of input array not copied to output array of tapering for non-tapering points, outside the tapering of beginning and end side.

NCL script for testing:

begin
  
  datain = new(100, float)

  do i = 0, dimsizes(datain)-1
  datain(i) = 1.0
  end do

  tim_taper_sum = 0.1
  data_tapered = taper( datain, tim_taper_sum, 0 )
  data_tapered_2 = taper( datain, tim_taper_sum, 1 )

  fout_name = "test.nc"
  system("rm -f " + fout_name)
  fout = addfile(fout_name,"c")

  fout->datain = datain
  fout->data_tapered = data_tapered
  fout->data_tapered_2 = data_tapered_2

end

Python script for plotting:

import xarray as xr
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

ds = xr.open_dataset("test.nc")
ds.datain.plot(ax=ax, lw=3, label='input')
ds.data_tapered.plot(ax=ax, lw=5, label='tapered (option=0)')
ds.data_tapered_2.plot(ax=ax, lw=2, label='tapered (option=1)')

ax.legend()

fig.savefig('test_plot.png')

Result:

It is however interesting that when input array has varying value, taper with option 0 seems working okay. I am not understanding the logic behind of this...

begin

  N = 100
  datain = abs( random_normal(0,10, N) )

  tim_taper_sum = 0.1
  data_tapered = taper( datain, tim_taper_sum, 0 )
  data_tapered_2 = taper( datain, tim_taper_sum, 1 )

  fout_name = "test.nc"
  system("rm -f " + fout_name)
  fout = addfile(fout_name,"c")

  fout->datain = datain
  fout->data_tapered = data_tapered
  fout->data_tapered_2 = data_tapered_2

end

Versions:
OS version: MacOS 13.6
NCL version: 6.6.2
Installation method: conda