realfastvla / rfgpu

GPU-based gridding and imaging library for realfast

implement peak pixel calculation and fringe rotation

caseyjlaw opened this issue · comments

Two ideas for features to implement on the GPU (but not a high priority...):
There are two scenarios in which we are limited by the need to pull the rfgpu image back to the host. The basic pattern for now is:

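img_grid = rfgpu.GPUArrayReal((st.npixx, st.npixy))
<search over DM>
for i in peak_candidate_integration:
    img_grid.d2h()                 # copy the image from GPU to host (the slow step)
    l, m = <max of img_grid at i>  # find the peak pixel on the host, convert to (l, m)
    phase_shift(data, l, m)        # phase the visibilities to the candidate position
    ...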

So each candidate triggers pulling an image back, which can be slow. If peak pixel detection and phasing were implemented on the GPU, it would remove this bottleneck. It only slows the pipeline down when many candidates are detected, so this is more of a nice feature than a critical limit.
A second benefit is that if we could phase shift on the GPU, then we could actually improve our sensitivity to candidates with complex spectral structure. I recently implemented a Kalman significance calculation on the GPU side and can show that we gain up to 40% in sensitivity for spectra with structure. The input only requires a 1-d spectrum of a candidate. If we want to implement a Kalman calculation for each GPU-detected candidate, it would be much faster to phase on the GPU and pull the spectrum back for the Kalman significance calculation.

As requested, a bit more info on how this is implemented in rfpipe...

Peak pixel to l, m:
https://github.com/realfastvla/rfpipe/blob/master/rfpipe/state.py#L632

Phasing:
https://github.com/realfastvla/rfpipe/blob/master/rfpipe/util.py#L29
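
For readers who don't want to follow the links, here is a minimal NumPy sketch of the phasing step and of pulling out the 1-d spectrum mentioned above. It is not the rfpipe code: the function names, the array layout (nint, nbl, nchan, npol), and the sign of the exponent are all assumptions made for illustration.

```python
import numpy as np


def phase_shift(data, u, v, dl, dm):
    """Rotate visibilities so direction (dl, dm) becomes the phase center.

    data : complex array, shape (nint, nbl, nchan, npol)  (assumed layout)
    u, v : baseline coordinates in wavelengths, shape (nbl, nchan)
    dl, dm : direction cosines in radians
    The sign of the exponent is an assumed convention.
    """
    rot = np.exp(-2j * np.pi * (dl * u + dm * v))  # shape (nbl, nchan)
    # Broadcast the rotation over integrations and polarizations.
    return data * rot[None, :, :, None]


def candidate_spectrum(data, u, v, dl, dm, i):
    """Phase integration i to (dl, dm), then average over baselines and pols."""
    shifted = phase_shift(data[i:i + 1], u, v, dl, dm)
    # shifted[0] has shape (nbl, nchan, npol); the result has shape (nchan,).
    return shifted[0].mean(axis=(0, 2)).real
```

With this convention, a unit point source at (l0, m0) has visibilities exp(+2j*pi*(l0*u + m0*v)), so phasing to (l0, m0) gives a flat spectrum of ones. Only the (nchan,) spectrum would need to cross from the GPU to the host, rather than a full image.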

A quick update on a plan that Kshitij and I are implementing...
We're restructuring the search loop to make two passes. The first pass returns simple candidate properties like integration, DM, and peak pixel. We then run a clustering analysis on those to define a second set of candidates that need to be reanalyzed in depth.
To make this work on the GPU side, we would need to add two statistics. I'm imagining it in this way:

image.add_stat('peak_l')
image.add_stat('peak_m')
l = stats['peak_l']
m = stats['peak_m']

If you have any immediate concerns, let me know. We can roll out the GPU implementation later, but the structure will rely on this statistic being in place eventually.

That all sounds good to me. Do you think you are going to want peak (l,m) returned for every trial? Or only those that have a peak above some threshold? That might affect how I'd implement it in the GPU.

I was thinking of this as something calculated at the same time as the 'max' statistic, since they are associated. And currently there is no sense of a threshold in the rfgpu code, right?
I'm open to suggestions though.

Looks like there is an argmax function in the gpu library I'm using that could be easily used instead of the simple max I have there now. So should be an easy change. I don't know how the performance will compare, we'll have to test to see if there is much difference (I'd guess not but who knows, GPU stuff is weird sometimes). If this does cause a big performance loss, we could think about only retrieving max pixel for candidates above some threshold. Unless you need the sub-threshold values for the clustering?

I'm not sure what you mean by "no sense of a threshold" ... could you explain a bit more?

Yes, an argmax would be perfect. But the max pixel value is also still useful. Would it be ok to just add the argmax as another statistic?
Regarding thresholding, I just meant that rfgpu currently does not concern itself with a threshold condition that triggers further operations. That sounds like a pretty big redesign.

Ok, I added the peak pixel calculation. You enable it by running image.add_stat('pix'). This will result in two new values returned in the stats dict, with keys 'xpeak' and 'ypeak'. It will also compute the 'max' value "for free" so if you are using peak pixel you should not use image.add_stat('max') (nothing very bad will happen, other than the GPU spending time double-computing the max value). This usage is now illustrated in rfgpu_example.py also.

xpeak and ypeak are given as offsets from the image center, in pixels. So xpeak=ypeak=0 is the image center, and negative values are allowed. You'll need to convert these to (l,m) on your own (the Image class doesn't know anything about physical units).
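
A rough host-side sketch of what the peak-pixel statistic and the conversion to (l,m) could look like, for reference. This is a NumPy stand-in, not the rfgpu kernel. The helper names, the (npixx, npixy) array layout, the choice of npix // 2 as the center pixel, the `uvres` parameter (uv cell size in wavelengths), and the sign convention are all assumptions.

```python
import numpy as np


def peak_offsets(image):
    """Return (xpeak, ypeak, max) with offsets measured from the center pixel.

    Assumes image has shape (npixx, npixy) and that the center pixel is
    (npixx // 2, npixy // 2). Both are assumptions, not rfgpu's layout.
    """
    npixx, npixy = image.shape
    # argmax gives the peak location; the max value comes along "for free".
    ix, iy = np.unravel_index(np.argmax(image), image.shape)
    return int(ix) - npixx // 2, int(iy) - npixy // 2, float(image[ix, iy])


def offsets_to_lm(xpeak, ypeak, npixx, npixy, uvres):
    """Convert pixel offsets to direction cosines in radians.

    For a uv grid with cell size uvres (wavelengths), the image pixel size
    along x is 1 / (npixx * uvres). The sign convention is an assumption.
    """
    l = xpeak / (npixx * uvres)
    m = ypeak / (npixy * uvres)
    return l, m
```

Each axis is scaled by its own pixel count, which matters for non-square images.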

Also meant to mention that for small image sizes (512-by-512 and under), the peak pixel calculation is about 50% slower than the original max pixel calculation, but this is a pretty small fraction of the total compute time so probably not a big deal. As with other small-image steps I think this is dominated by function call or kernel launch overheads and could be improved if needed. For larger image sizes both versions take about the same amount of time.

Thanks, Paul. I agree that the extra compute time is probably ok.
I'll probably wait for the power to the CBE to return before implementing this.

OK. Although last I heard, it sounded like it wouldn't be until early next week due to some issues with the generator. The rfnodes do have power.

I'm working on phasing, will let you know when that's ready.

About the thresholding you mentioned earlier in the thread, I don't know that it would necessarily involve a big redesign (in the sense of re-writing existing functionality). Depends on what exactly you have in mind. If there is a feature that would be useful, let me know and we can consider how much work it would be.

I have the new peak pixel code implemented now. Looking good!
And don't worry about my comment about thresholding. I think it was a misunderstanding of what you were doing.

I have a confusing issue with the ypeak value. If I simulate a transient with l!=0, the image xpeak is correct, but ypeak is also nonzero. If I simulate a transient with m!=0, then image xpeak and ypeak are both correct (that is, xpeak is zero, as expected). I'm not sure why ypeak would be special.
More likely, there's some simple confusion of x/y that makes this happen. Can you remind me what rfgpu thinks x and y are in comparison to numpy?

I think the issue is with non-square images. If I simulate transients and search with square images, the issue goes away. Could squareness be an assumption of the peak pixel calc?
Here's my evidence:

  • Adding a transient with nonzero l and m=0.
  • Detected with xpeak correct at pixel 177, but incorrect (nonzero) ypeak at pixel 456.
  • The image size is 972x1024.
  • The x axis is 1024-972=52 pixels less than square.
  • np.mod(52*177, 972) = 456.

I can reproduce this for other simulated transients. For square images this kind of counting error would disappear.
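
The arithmetic above can be checked with a small sketch. It also shows one hypothetical way such an error could arise: decoding a row-major flat index with the wrong axis length in the modulus. This is only a guess that happens to match the numbers, not the actual rfgpu code, and it ignores the offset-from-center convention for simplicity.

```python
def decode(flat, npixx, npixy, buggy=False):
    """Decode a row-major flat index into (x, y) for an (npixx, npixy) image.

    With buggy=True the y modulus uses npixx instead of npixy
    (a hypothetical typo, chosen only for illustration).
    """
    x = flat // npixy
    y = flat % (npixx if buggy else npixy)
    return x, y


npixx, npixy = 972, 1024
flat = 177 * npixy + 0  # true peak at x=177, y=0

x_ok, y_ok = decode(flat, npixx, npixy)
x_bad, y_bad = decode(flat, npixx, npixy, buggy=True)

# Since 1024 = 972 + 52, (177 * 1024) % 972 equals (52 * 177) % 972.
check = (52 * 177) % 972
```

Under this guess x is still decoded correctly while y picks up the spurious value, and for npixx == npixy the two moduli are identical, so the error vanishes for square images.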

Ok, that should be fixed now. There was a typo in the ypeak calculation that, as you reported, had no effect in the xpix=ypix case.

I've confirmed that this bug is fixed. Thanks!

I'm going to close this one and open a new issue for phasing / dynamic spectra.