realfastvla / rfgpu

GPU-based gridding and imaging library for realfast


support resampling during gridding

caseyjlaw opened this issue

The rfpipe search iterates over a range of pulse widths. We've traditionally defined this as an array of resampling sizes, `dtarr=[1, 2, 4, 8]`.
It would be nice to have the rfgpu gridding function take a resampling width as an argument.
It should be fine to set it once per function call, as is done with `set_delay`.

I think it will probably be faster to add a GPU function that can downsample the data array (i.e., average adjacent time samples) by a factor of 2. Then the downsampled data can be re-run through the existing gridding routine. Do you see any problems with that approach?
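For reference, here is what that averaging step looks like in plain numpy (the array shape and time-axis ordering are assumptions for illustration; the real function runs on the GPU):

```python
import numpy as np

def downsample_time(data, axis=-1):
    """Average adjacent pairs of time samples (factor-of-2 downsample)."""
    n = data.shape[axis]
    assert n % 2 == 0, "need an even number of time samples"
    d = np.moveaxis(data, axis, -1)
    # Pair up adjacent samples and average each pair.
    d = d.reshape(d.shape[:-1] + (n // 2, 2)).mean(axis=-1)
    return np.moveaxis(d, -1, axis)

# dtarr = [1, 2, 4, 8] corresponds to successive applications:
vis = np.ones((351, 256, 64), dtype=np.complex64)  # (nbl, nchan, ntime)
vis_dt2 = downsample_time(vis)      # dt = 2
vis_dt4 = downsample_time(vis_dt2)  # dt = 4
```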

The downsample function was pretty easy, so I put it in. See the example script for usage, and let me know if you see any problems with it.
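A rough sketch of how this might sit in the search loop. The names below (`Grid`, `GPUArrayComplex`, `downsample`) are assumptions based on this thread, not a verified transcription of the example script:

```python
# Hypothetical usage sketch only; rfgpu names here are assumptions.
import rfgpu

nbl, nchan, ntime = 351, 256, 512                 # example sizes, made up
vis = rfgpu.GPUArrayComplex((nbl, nchan, ntime))  # assumed GPU array type
grid = rfgpu.Grid(nbl, nchan, ntime, 1024, 1024)  # assumed constructor

for dt in [1, 2, 4, 8]:
    if dt > 1:
        # Assumed semantics: in-place factor-of-2 average of adjacent
        # time samples, so each pass doubles dt relative to the last.
        grid.downsample(vis)
    run_search_at_this_dt(grid, vis, dt)  # placeholder for grid/image/threshold
```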

Ok, thanks. I can work with that.
Is it optimal to downsample before dedispersing? I have been dedispersing before downsampling, because I thought it reduced the inter-DM sensitivity loss. I haven't worked it out analytically, so my intuition may be off.

Working well now!
Using vyssim, I can make a 60-second test scan with 25ms samples (B array, 16 spw, dual pol) and find simulated transients in an FRB-like search. With 2 GPUs, it processes 60 seconds of data in about 20 minutes. That is just about what we claimed in the proposal!

That's great! How do those numbers compare to the rough GPU times I sent around a while back?

About the possible sensitivity loss, I was thinking this through. If we downsample first, we can't apply the dispersion shifts at as fine a time resolution as otherwise. I think this effectively results in ~0.5 samples of additional pulse broadening, so something like a factor of sqrt(1**2 + 0.5**2) ~ 1.1 worse sensitivity. Does that sound about right to you?
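A quick arithmetic check of that estimate:

```python
import math

# A nominally 1-sample-wide pulse picks up ~0.5 samples of extra
# broadening, added in quadrature:
factor = math.sqrt(1.0**2 + 0.5**2)
print(f"{factor:.3f}")  # 1.118, i.e. roughly a 10% sensitivity penalty
```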

Some of this might be recovered if we keep the same DM step size; that is, the optimal DM step for, say, 5 ms samples oversamples the DM axis for 10 ms samples (see the sketch below). I haven't really worked through this part, though. If we are worried, we should do some simulations, but I'd be surprised if it were more than a ~10% effect.
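To make that concrete, a small sketch using the standard dispersion delay formula; the band edges and sample times below are invented for illustration:

```python
# The DM step that smears a pulse by one sample across the band scales
# linearly with the sample time, so a grid tuned for 5 ms samples
# oversamples a 10 ms search by 2x. Band edges here are made up.
K = 4.149              # ms GHz^2 cm^3 / pc; dispersion constant
f_lo, f_hi = 1.0, 2.0  # GHz

def dm_step(dt_ms):
    """DM spacing giving one sample of differential delay across the band."""
    return dt_ms / (K * (f_lo**-2 - f_hi**-2))

print(dm_step(5.0))   # ~1.6 pc/cm^3
print(dm_step(10.0))  # ~3.2 pc/cm^3: the 5 ms grid is 2x finer than needed
```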

I ran nvprof to try to reproduce what you sent via email a while back. I can see similar function calls ("spRadix...") with values that are about 2x faster. Perhaps that is because you are now using a complex-to-real FFT? Anyway, they seem to be in the right ballpark and add up to about 0.1 ms per 1024^2 FFT. That is about 20-30x faster than my fftw code, which is consistent with the speedup I measured in an end-to-end test.
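As a rough CPU-side reference for those numbers, a quick numpy timing sketch (not fftw-tuned; the size mirrors the 1024^2 complex-to-real transform mentioned above):

```python
import time
import numpy as np

# Time a 1024x1024 complex-to-real inverse FFT to compare against
# the ~0.1 ms/FFT GPU number quoted above.
npix = 1024
shape = (npix, npix // 2 + 1)  # rfft-packed input for a c2r transform
spec = (np.random.standard_normal(shape)
        + 1j * np.random.standard_normal(shape))

niter = 20
t0 = time.perf_counter()
for _ in range(niter):
    img = np.fft.irfft2(spec, s=(npix, npix))
print(f"{(time.perf_counter() - t0) / niter * 1e3:.2f} ms per c2r FFT")
```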

Regarding resampling: I'll be doing a careful fftw vs cuda comparison using simulations and real pulses. That'll make it clear if there is something to worry about.

I don't think this is something we're concerned about anymore (?), so closing this one.

In fact, we do oversample the larger dt values, since we set the DM grid at the native integration time.