realfastvla / rfgpu

GPU-based gridding and imaging library for realfast

downsample run twice?

caseyjlaw opened this issue

Kshitij found an issue with an apparent duplication of detections in GPU code that is not present on the CPU. It is reproducible, as shown at realfastvla/rfpipe#45. It is only reproducible when the size of the visibility data is modestly large (>1 GB with 1000 integrations, 351 baselines, 128 chans, 2 pols).

An example, for a simulated transient at integration 1923, downsampling once:

INFO:rfpipe.search:Imaging 2000 ints (0-1999) in seg 0 at DM/dt 0.0/1 with image 144x162 (uvres 83) on GPU 1
INFO:rfpipe.search:Got one! SNR1 24.7 candidate at (0, 1923, 0, 0, 0) and (l, m) = (-0.00159, 0.00000)
INFO:rfpipe.search:Imaging 1000 ints (0-999) in seg 0 at DM/dt 0.0/2 with image 144x162 (uvres 83) on GPU 1
INFO:rfpipe.search:Got one! SNR1 18.9 candidate at (0, 480, 0, 1, 0) and (l, m) = (-0.00159, 0.00000)
INFO:rfpipe.search:Got one! SNR1 22.2 candidate at (0, 961, 0, 1, 0) and (l, m) = (-0.00159, 0.00000)

The candidate should be found only at integration 1923 and, after downsampling once, at 1923//2 = 961. However, after downsampling once, candidates are found both at 961 and at integration 480, which is 1923//4.
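To make the expected indices concrete, here is a minimal sketch. It assumes that downsampling in time by a factor dt maps integration t to t // dt, which is an assumption about the indexing convention that matches the log above. The function name is hypothetical and not part of rfpipe or rfgpu.

```python
def downsampled_index(t, dt):
    """Index of original integration t after downsampling in time by dt.

    Assumes adjacent integrations are combined so that t maps to t // dt.
    """
    return t // dt


t_inject = 1923  # integration of the simulated transient

idx_dt1 = downsampled_index(t_inject, 1)  # no downsampling
idx_dt2 = downsampled_index(t_inject, 2)  # downsampled once
idx_dt4 = downsampled_index(t_inject, 4)  # what a second downsample would give

print(idx_dt1, idx_dt2, idx_dt4)  # prints: 1923 961 480
```

The extra candidate at 480 is where the transient would land if the data had been downsampled twice, which is what motivates the question in the title.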

I see the downsample function (https://github.com/realfastvla/rfgpu/blob/master/src/grid.cu#L219) has a note about potential issues when the number of integrations is not divisible by 2. I'm also wondering if there might be an issue with integers wrapping (nbl*nint*nchan could be large).
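A quick way to size up the integer-wrap idea is to compare the element count against the largest signed 32-bit value. This is only a sketch: it assumes the indices are 32-bit signed integers, which is not confirmed here, and it takes nint = 2000 from the log above. The helper name is hypothetical.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def element_count(*dims):
    """Product of the array dimensions, using Python's unbounded integers."""
    n = 1
    for d in dims:
        n *= d
    return n


# Dimensions taken from the issue text and log: baselines, integrations,
# channels, polarizations.
nbl, nint, nchan, npol = 351, 2000, 128, 2

n_elem = element_count(nbl, nint, nchan, npol)
fits = n_elem <= INT32_MAX

print(n_elem, fits)  # prints: 179712000 True
```

Under these assumptions the element count stays well below the 32-bit limit, so a wrap in a plain element index would not be expected at this data size.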

FWIW, the rfpipe code calling downsample is at https://github.com/realfastvla/rfpipe/blob/development/rfpipe/search.py#L133.

I have confirmed that it only calls it one time.

Yes, you're right. It's not an integer overflow, just a simple bug in the loop iteration within the downsample function. It only becomes apparent if ntime>512.

I can put in a fix for this in a minute, but not sure which branch this should go on. It doesn't really have anything to do with multi-gpu development. Are you still using the old master branch for anything? If not, I might merge multi-gpu back into master first. Even though we're still working on the multi-gpu functionality I think it should all be backwards-compatible with the older version. Sound OK?

Sounds good. I am not using the master branch, so feel free to merge and commit.

OK, I think this is fixed in the latest commit on master (5cee20f). Can you guys confirm? Thanks for finding this one.

I can confirm it is fixed.
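For anyone wanting to guard against a regression, a CPU reference along these lines could be compared against the GPU result. It is a toy sketch, not rfgpu's API: a single 1-D time series stands in for one baseline and channel, adjacent integrations are summed (the real normalization may differ), and both function names are hypothetical. It uses more than 512 integrations, since the bug only appeared for ntime>512.

```python
def downsample_by_two(series):
    """Sum adjacent pairs of samples; a trailing unpaired sample is dropped."""
    return [series[2 * i] + series[2 * i + 1] for i in range(len(series) // 2)]


def peaks(series, threshold):
    """Indices of all samples above the threshold."""
    return [i for i, value in enumerate(series) if value > threshold]


nint = 2000                # more than 512 integrations
series = [0.0] * nint
series[1923] = 10.0        # simulated transient in an otherwise empty series

ds = downsample_by_two(series)
peak_idx = peaks(ds, 5.0)

print(len(ds), peak_idx)   # prints: 1000 [961]
```

A correct single downsample leaves exactly one peak at 961. A second peak at 480 in the GPU output would indicate the duplication described above.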