realfastvla / rfpipe

Fast radio interferometric transient search pipeline

Home Page:https://realfastvla.github.io/rfpipe

Very low SNR (<7) candidates being clustered and plotted

KshitijAggarwal opened this issue

The pipeline is generating and clustering candidates with SNR below the threshold. This persists in all the cases I tried: an injected transient in simulated data, an FRB in real data, and real data with an injected transient.
It could be an issue with the SNR calculation on GPU vs CPU, since this was encountered while using rfgpu.
An example of the above:
import rfpipe
from astropy import time

# Simulated dataset; first build a state to generate mock transient parameters
t0 = time.Time.now().mjd
meta = rfpipe.metadata.mock_metadata(t0, t0 + 10/(24*3600), 27, 4, 32*4, 4, 5e3, datasource='sim', antconfig='D')
st = rfpipe.state.State(inmeta=meta, inprefs={'dtarr': [1, 2], 'npix_max': 512, 'savecands': True, 'saveplots': True, 'maxdm': 500, 'applyonlineflags': False, 'flaglist': [], 'clustercands': True, 'fftmode': 'cuda'})
data = rfpipe.source.read_segment(st, 0)
mock = rfpipe.util.make_transient_params(st, snr=80, data=data, ntr=3)
# New state with the mock transients injected, then run the search
st = rfpipe.state.State(inmeta=meta, inprefs={'dtarr': [1, 2, 4], 'npix_max': 512, 'savecands': True, 'saveplots': True, 'maxdm': 500, 'applyonlineflags': False, 'flaglist': [], 'clustercands': (4, 2), 'simulated_transient': mock, 'fftmode': 'cuda'})
cc = rfpipe.pipeline.pipeline_scan(st)

The mocks used were:
[(0, 971, 77.4, 0.005, 0.18785936515144086, -0.0016023057774304935, 0.0), (0, 21, 129.0, 0.02, 0.09468444540203508, 0.0025197170973120567, 0.0), (0, 1859, 336.05, 0.005, 0.1892371962894011, 0.0, -0.0029736458931088282)]

[Attached candidate plot: cands_test_58419 88112078935 1 1_seg0-i466-dm3-dt1]
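
As a rough check (not rfpipe's exact API; the field name 'snr1' and the preference name 'sigma_image1' are my guesses at the schema and may need adjusting for your version), this continues from the example above and counts how many of the returned candidates fall below the detection threshold:

# Continuation of the example above: count clustered candidates below the detection
# threshold. 'snr1' and 'sigma_image1' are assumed names; check your rfpipe version.
import numpy as np

snrs = np.array(cc.array['snr1'])      # per-candidate image SNR (assumed field name)
threshold = st.prefs.sigma_image1      # detection threshold (assumed preference name)
low = snrs[snrs < threshold]
print("{0} of {1} candidates are below the threshold of {2}".format(len(low), len(snrs), threshold))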

Also, for the data 16A-459_TEST_1hr_000.57633.66130137732.scan7.cut, the max SNR reported for the clustering and the one on the plots were different. Relevant logger output is attached.
As can be seen below, for the cluster with ~1300 candidates, clustering reports a max SNR of 31.578176498413086, but when the candidate was reproduced its SNR was reported as 22.6.

[Screenshot: logger output, 2018-10-28 4:53 pm]
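
To be clear about what I am comparing, here is a toy illustration (made-up values and column names, not rfpipe's actual schema): the per-cluster summary should report the maximum SNR over the cluster members, and that same member is the one I expect to be reproduced and plotted.

# Toy illustration of the expected relation between the cluster summary and the
# plotted candidate; values and column names are made up, not rfpipe's schema.
import pandas as pd

cands = pd.DataFrame({'cluster': [0, 0, 0, 1, 1],
                      'snr':     [31.6, 22.6, 8.1, 9.4, 7.5]})

# One row per cluster: the member with the maximum SNR, which should match the plot
summary = cands.loc[cands.groupby('cluster')['snr'].idxmax()]
print(summary)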

Can you confirm the rfpipe version is the latest one (1.1.5)?

By the way, I have confirmed this bug. I thought I had fixed it, but I will need to do more work.

Ok, I've implemented a few changes that help with this. I'm still seeing small SNR differences between the first detection (fast loop) and the reproduction loop (after clustering).
You should only see these very low SNR candidates near true candidates and with fftmode="cuda". The GPU gridding is seeing a real candidate, but the CPU isn't doing things in quite the same way.

Could you run your tests again and tell me if things perform better?

I ran a few of the test cases again, and I don't see any very low SNR (~4) candidates now. But, as you mentioned, the SNR reported by the GPU and that computed by the CPU differ, with no apparent pattern: in some cases it's higher, in some cases lower, and both the magnitude of the difference and the ratio of the two seem random.

I am narrowing down the cause of this problem to the image formation. I can see that a mock transient produces an image SNR that differs between rfgpu and rfpipe (CPU/cython) even if the transient has DM=0.
I am now playing with rfgpu gridding and comparing its images with those made by the CPU/cython code. The pixel values differ between the two images, although the peak SNR is similar when a transient is present.
I'm not sure how to debug this in detail, but I'll keep playing. Any suggestions, @demorest?
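
For reference, this is roughly the kind of comparison I am doing. The two "images" below are just noise placeholders; in practice image_gpu and image_cpu would be dirty images of the same integration made by rfgpu and by the CPU/cython path.

# Sketch of the image comparison; the two arrays are placeholders (noise) standing
# in for dirty images of the same data made by the rfgpu and CPU/cython paths.
import numpy as np

np.random.seed(0)
image_gpu = np.random.normal(size=(512, 512))
image_cpu = image_gpu + np.random.normal(scale=1e-3, size=(512, 512))

def peak_snr(im):
    # Peak pixel over a simple estimate of the image noise
    return np.abs(im).max() / im.std()

diff = image_gpu - image_cpu
print("max |pixel diff|: {0:.4g}, rms diff: {1:.4g}".format(np.abs(diff).max(), diff.std()))
print("peak SNR (GPU): {0:.2f}, peak SNR (CPU): {1:.2f}".format(peak_snr(image_gpu), peak_snr(image_cpu)))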

I have fixed this issue.
My CPU code was not rounding the uv coordinates during gridding when using a single core, but rfgpu does round them.
I was confused for a while, because I remembered that my CPU code did reproduce the GPU SNR values. I was probably remembering results from my multicore version of the gridding algorithm, which is identical to the rfgpu implementation.
Anyway, both CPU versions are now rounding, just as rfgpu does.
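
To illustrate the effect with a toy example (not the actual rfpipe or rfgpu gridding code): truncating instead of rounding shifts many visibilities into a neighboring grid cell, which perturbs the gridded data and hence the image pixel values and measured SNR.

# Toy contrast between truncating and rounding uv coordinates (in grid-cell units);
# not the actual rfpipe/rfgpu gridding code.
import numpy as np

uv = np.array([10.2, 10.7, 11.6, -3.6])   # uv coordinates in units of grid cells

truncated = uv.astype(int)                 # what the single-core CPU path was doing
rounded = np.round(uv).astype(int)         # what rfgpu (and now both CPU paths) do

for u, t, r in zip(uv, truncated, rounded):
    print("uv = {0:5.1f}   truncate -> {1}   round -> {2}".format(u, t, r))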