realfastvla / rfgpu

GPU-based gridding and imaging library for realfast


support resampling during gridding

caseyjlaw opened this issue

The rfpipe search iterates over a range of pulse widths. We've traditionally defined this as an array of resampling sizes, `dtarr=[1, 2, 4, 8]`.
It would be nice to have the rfgpu gridding function take a resampling width as an argument.
It should be fine to set it once per function call, as is done with `set_delay`.

I think it will probably be faster to add a GPU function that can downsample the data array (i.e., average adjacent time samples) by a factor of 2. Then the downsampled data can be re-run through the existing gridding routine. Do you see any problems with that approach?
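For reference, here is what that averaging step looks like in plain numpy (the array shape and time-axis ordering are assumptions for illustration; the real function runs on the GPU):

```python
import numpy as np

def downsample_time(data, axis=-1):
    """Average adjacent pairs of time samples (factor-of-2 downsample)."""
    n = data.shape[axis]
    assert n % 2 == 0, "need an even number of time samples"
    d = np.moveaxis(data, axis, -1)
    # Pair up adjacent samples and average each pair.
    d = d.reshape(d.shape[:-1] + (n // 2, 2)).mean(axis=-1)
    return np.moveaxis(d, -1, axis)

# dtarr = [1, 2, 4, 8] corresponds to successive applications:
vis = np.ones((351, 256, 64), dtype=np.complex64)  # (nbl, nchan, ntime)
vis_dt2 = downsample_time(vis)      # dt = 2
vis_dt4 = downsample_time(vis_dt2)  # dt = 4
```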

The downsample function was pretty easy, so I put it in. See the example script for usage, and let me know if you see any problems with it.
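A rough sketch of how this might sit in the search loop. The names below (`Grid`, `GPUArrayComplex`, `downsample`) are assumptions based on this thread, not a verified transcription of the example script:

```python
# Hypothetical usage sketch only; rfgpu names here are assumptions.
import rfgpu

nbl, nchan, ntime = 351, 256, 512                 # example sizes, made up
vis = rfgpu.GPUArrayComplex((nbl, nchan, ntime))  # assumed GPU array type
grid = rfgpu.Grid(nbl, nchan, ntime, 1024, 1024)  # assumed constructor

for dt in [1, 2, 4, 8]:
    if dt > 1:
        # Assumed semantics: in-place factor-of-2 average of adjacent
        # time samples, so each pass doubles dt relative to the last.
        grid.downsample(vis)
    run_search_at_this_dt(grid, vis, dt)  # placeholder for grid/image/threshold
```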

Ok, thanks. I can work with that.
Is it optimal to downsample before dedispersing? I have been dedispersing before downsampling, because I thought it reduced the inter-DM sensitivity loss. I haven't worked it out analytically, so my intuition may be off.

Working well now!
Using vyssim, I can make a 60-second test scan with 25ms samples (B array, 16 spw, dual pol) and find simulated transients in an FRB-like search. With 2 GPUs, it processes 60 seconds of data in about 20 minutes. That is just about what we claimed in the proposal!

That's great! How do those numbers compare to the rough GPU times I sent around a while back?

About the possible sensitivity loss, I was thinking this through. If we downsample first, we can't apply the dispersion shifts at as fine a time resolution as otherwise. I think this effectively results in ~0.5 samples of additional pulse broadening, so something like a factor of sqrt(1**2 + 0.5**2) ~ 1.1 worse sensitivity. Does that sound about right to you?
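A quick arithmetic check of that estimate:

```python
import math

# A nominally 1-sample-wide pulse picks up ~0.5 samples of extra
# broadening, added in quadrature:
factor = math.sqrt(1.0**2 + 0.5**2)
print(f"{factor:.3f}")  # 1.118, i.e. roughly a 10% sensitivity penalty
```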

Some of this might be recovered if we keep the same DM step size; that is, the optimal DM step for, say, 5 ms samples oversamples the DM axis for 10 ms samples (see the sketch below). I haven't really worked through this part, though. If we are worried, we should do some simulations, but I'd be surprised if it were more than a ~10% effect.
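To make that concrete, a small sketch using the standard dispersion delay formula; the band edges and sample times below are invented for illustration:

```python
# The DM step that smears a pulse by one sample across the band scales
# linearly with the sample time, so a grid tuned for 5 ms samples
# oversamples a 10 ms search by 2x. Band edges here are made up.
K = 4.149              # ms GHz^2 cm^3 / pc; dispersion constant
f_lo, f_hi = 1.0, 2.0  # GHz

def dm_step(dt_ms):
    """DM spacing giving one sample of differential delay across the band."""
    return dt_ms / (K * (f_lo**-2 - f_hi**-2))

print(dm_step(5.0))   # ~1.6 pc/cm^3
print(dm_step(10.0))  # ~3.2 pc/cm^3: the 5 ms grid is 2x finer than needed
```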

I ran nvprof to try to reproduce what you sent via email a while back. I can see similar function calls ("spRadix...") with values that are about 2x faster. Perhaps that is because you are now using a complex-to-real FFT? Anyway, they seem to be in the right ballpark and add up to about 0.1 ms per 1024^2 FFT. That is about 20-30x faster than my fftw code, which is consistent with the speedup I measured in an end-to-end test.
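As a rough CPU-side reference for those numbers, a quick numpy timing sketch (not fftw-tuned; the size mirrors the 1024^2 complex-to-real transform mentioned above):

```python
import time
import numpy as np

# Time a 1024x1024 complex-to-real inverse FFT to compare against
# the ~0.1 ms/FFT GPU number quoted above.
npix = 1024
shape = (npix, npix // 2 + 1)  # rfft-packed input for a c2r transform
spec = (np.random.standard_normal(shape)
        + 1j * np.random.standard_normal(shape))

niter = 20
t0 = time.perf_counter()
for _ in range(niter):
    img = np.fft.irfft2(spec, s=(npix, npix))
print(f"{(time.perf_counter() - t0) / niter * 1e3:.2f} ms per c2r FFT")
```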

Regarding resampling: I'll be doing a careful fftw vs cuda comparison using simulations and real pulses. That'll make it clear if there is something to worry about.

I don't think this is something we're concerned about anymore (?), so closing this one.

In fact, we do oversample the larger dt values, since we set the DM grid at the native integration time.