realfastvla / rfpipe

Fast radio interferometric transient search pipeline

Home Page:https://realfastvla.github.io/rfpipe

Very low SNR (<7) candidates being clustered and plotted

KshitijAggarwal opened this issue

The pipeline is generating and clustering candidates with SNR below the threshold. This persists in all the cases I tried: an injected transient in simulated data, an FRB in real data, and real data with an injected transient.
It could be an issue with the SNR calculation on GPU vs CPU, since this was encountered while using rfgpu.
An example of the above:
import rfpipe
from astropy import time

# Simulated dataset; first build a state to generate mock transient parameters
t0 = time.Time.now().mjd
meta = rfpipe.metadata.mock_metadata(t0, t0 + 10/(24*3600), 27, 4, 32*4, 4, 5e3, datasource='sim', antconfig='D')
st = rfpipe.state.State(inmeta=meta, inprefs={'dtarr': [1, 2], 'npix_max': 512, 'savecands': True, 'saveplots': True, 'maxdm': 500, 'applyonlineflags': False, 'flaglist': [], 'clustercands': True, 'fftmode': 'cuda'})
data = rfpipe.source.read_segment(st, 0)
mock = rfpipe.util.make_transient_params(st, snr=80, data=data, ntr=3)
# New state with the mock transients injected, then run the search
st = rfpipe.state.State(inmeta=meta, inprefs={'dtarr': [1, 2, 4], 'npix_max': 512, 'savecands': True, 'saveplots': True, 'maxdm': 500, 'applyonlineflags': False, 'flaglist': [], 'clustercands': (4, 2), 'simulated_transient': mock, 'fftmode': 'cuda'})
cc = rfpipe.pipeline.pipeline_scan(st)

The mocks used were:
[(0, 971, 77.4, 0.005, 0.18785936515144086, -0.0016023057774304935, 0.0), (0, 21, 129.0, 0.02, 0.09468444540203508, 0.0025197170973120567, 0.0), (0, 1859, 336.05, 0.005, 0.1892371962894011, 0.0, -0.0029736458931088282)]

[Attached candidate plot: cands_test_58419 88112078935 1 1_seg0-i466-dm3-dt1]
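
As a rough check (not rfpipe's exact API; the field name 'snr1' and the preference name 'sigma_image1' are my guesses at the schema and may need adjusting for your version), this continues from the example above and counts how many of the returned candidates fall below the detection threshold:

# Continuation of the example above: count clustered candidates below the detection
# threshold. 'snr1' and 'sigma_image1' are assumed names; check your rfpipe version.
import numpy as np

snrs = np.array(cc.array['snr1'])      # per-candidate image SNR (assumed field name)
threshold = st.prefs.sigma_image1      # detection threshold (assumed preference name)
low = snrs[snrs < threshold]
print("{0} of {1} candidates are below the threshold of {2}".format(len(low), len(snrs), threshold))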

Also, for the data 16A-459_TEST_1hr_000.57633.66130137732.scan7.cut, the max SNR reported for the clustering and the one on the plots were different. Relevant logger output is attached.
As can be seen below, for the cluster with ~1300 candidates, clustering reports a max SNR of 31.578176498413086, but when the candidate was reproduced its SNR was reported as 22.6.

[Screenshot: logger output, 2018-10-28 4:53 pm]
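
To be clear about what I am comparing, here is a toy illustration (made-up values and column names, not rfpipe's actual schema): the per-cluster summary should report the maximum SNR over the cluster members, and that same member is the one I expect to be reproduced and plotted.

# Toy illustration of the expected relation between the cluster summary and the
# plotted candidate; values and column names are made up, not rfpipe's schema.
import pandas as pd

cands = pd.DataFrame({'cluster': [0, 0, 0, 1, 1],
                      'snr':     [31.6, 22.6, 8.1, 9.4, 7.5]})

# One row per cluster: the member with the maximum SNR, which should match the plot
summary = cands.loc[cands.groupby('cluster')['snr'].idxmax()]
print(summary)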

Can you confirm the rfpipe version is the latest one (1.1.5)?

By the way, I have confirmed this bug. I thought I had fixed it, but I will need to do more work.

Ok, I've implemented a few changes that help with this. I'm still seeing small SNR differences between the first detection (fast loop) and the reproduction loop (after clustering).
You should only see these very low SNR candidates near true candidates and with fftmode="cuda". The GPU gridding is seeing a real candidate, but the CPU isn't doing things in quite the same way.

Could you run your tests again and tell me if things perform better?

I ran a few of the test cases again, and I don't see any very low SNR (~4) candidates now. But, as you mentioned, the SNR reported by the GPU and that computed by the CPU differ, with no apparent pattern: in some cases it's higher, in some cases lower, and both the magnitude of the difference and the ratio of the two seem random.

I am narrowing down the cause of this problem to the image formation. I can see that a mock transient produces an image SNR that differs between rfgpu and rfpipe (CPU/cython) even if the transient has DM=0.
I am now playing with rfgpu gridding and comparing its images with those made by the CPU/cython code. The pixel values differ between the two images, although the peak SNR is similar when a transient is present.
I'm not sure how to debug this in detail, but I'll keep playing. Any suggestions, @demorest?
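
For reference, this is roughly the kind of comparison I am doing. The two "images" below are just noise placeholders; in practice image_gpu and image_cpu would be dirty images of the same integration made by rfgpu and by the CPU/cython path.

# Sketch of the image comparison; the two arrays are placeholders (noise) standing
# in for dirty images of the same data made by the rfgpu and CPU/cython paths.
import numpy as np

np.random.seed(0)
image_gpu = np.random.normal(size=(512, 512))
image_cpu = image_gpu + np.random.normal(scale=1e-3, size=(512, 512))

def peak_snr(im):
    # Peak pixel over a simple estimate of the image noise
    return np.abs(im).max() / im.std()

diff = image_gpu - image_cpu
print("max |pixel diff|: {0:.4g}, rms diff: {1:.4g}".format(np.abs(diff).max(), diff.std()))
print("peak SNR (GPU): {0:.2f}, peak SNR (CPU): {1:.2f}".format(peak_snr(image_gpu), peak_snr(image_cpu)))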

I have fixed this issue.
My CPU code was not rounding the uv coordinates during gridding when using a single core, but rfgpu does round them.
I was confused for a while, because I remembered that my CPU code did reproduce the GPU SNR values. I was probably remembering results from my multicore version of the gridding algorithm, which is identical to the rfgpu implementation.
Anyway, both CPU versions are now rounding, just as rfgpu does.
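
To illustrate the effect with a toy example (not the actual rfpipe or rfgpu gridding code): truncating instead of rounding shifts many visibilities into a neighboring grid cell, which perturbs the gridded data and hence the image pixel values and measured SNR.

# Toy contrast between truncating and rounding uv coordinates (in grid-cell units);
# not the actual rfpipe/rfgpu gridding code.
import numpy as np

uv = np.array([10.2, 10.7, 11.6, -3.6])   # uv coordinates in units of grid cells

truncated = uv.astype(int)                 # what the single-core CPU path was doing
rounded = np.round(uv).astype(int)         # what rfgpu (and now both CPU paths) do

for u, t, r in zip(uv, truncated, rounded):
    print("uv = {0:5.1f}   truncate -> {1}   round -> {2}".format(u, t, r))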