realfastvla / rfgpu

GPU-based gridding and imaging library for realfast


add median absolute deviation to image stats

caseyjlaw opened this issue

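MAD (median absolute deviation) is a nice outlier-resistant way to estimate the noise level. The 1.4826 factor scales it to match the standard deviation for Gaussian noise:
mad = 1.4826*np.median(np.abs(array-np.median(array)))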
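A minimal NumPy sketch of the formula above, run on a synthetic 1k-by-1k noise image with two bright outliers to show the robustness. The function name `mad_sigma` and the test image are hypothetical illustrations, not part of rfgpu:

```python
import numpy as np

def mad_sigma(array):
    """Robust noise estimate: median absolute deviation scaled by 1.4826.

    For Gaussian noise, 1.4826 * MAD approximates the standard deviation.
    """
    a = np.asarray(array, dtype=float).ravel()
    med = np.median(a)
    return 1.4826 * np.median(np.abs(a - med))

# Hypothetical test image: unit-variance Gaussian noise plus two strong outliers
rng = np.random.default_rng(0)
img = rng.normal(0.0, 1.0, size=(1024, 1024))
img[512, 512] = 1000.0
img[100, 200] = -800.0

print("std:", img.std())        # pulled well above 1 by the two outliers
print("mad:", mad_sigma(img))   # stays close to the true sigma of 1
```

With only two extreme pixels out of ~1M, the plain standard deviation rises to roughly 1.6 while the MAD estimate stays near 1.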

MAD is not added yet, but I reorganized the image statistics code somewhat. I think the only differences when calling from Python are: 1. you need to call Image.add_stat() first to set up the statistics you want computed, and 2. Image.stats() returns a dict rather than a list, so it's more obvious which value is which. The example script is updated to show the new usage.

That's nice. I've confirmed that it works for me.

A quick investigation into MAD is not looking very good so far. There seem to be surprisingly few implementations of median-finding in the standard GPU libraries. Sort routines are easily available, so I tried the simple "sort, then take the mid-point" approach. A single sort of 1M points (i.e., a 1k-by-1k image) takes over 0.5 ms, which is ~1.5x the time for everything else (gridding + FFT + misc) put together. MAD would technically take two sorts (one for the median, one for the median of the absolute deviations), so this would slow things down by a factor of ~4. Is that worth it?
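The factor of ~4 follows from the numbers quoted: if one sort (~0.5 ms) is ~1.5x everything else, the rest takes ~0.33 ms, and adding two sorts gives ~0.33 + 2 × 0.5 ≈ 1.33 ms, about 4x the original. The CPU sketch below mirrors the "sort, then take the mid-point" approach in NumPy to make the two-sort structure of MAD explicit. It is an illustration only, not the GPU code; `sorted_median` and `sorted_mad` are hypothetical names:

```python
import numpy as np

def sorted_median(values):
    """Median via a full sort, then take the mid-point (average of two if even)."""
    s = np.sort(np.asarray(values, dtype=float).ravel())
    n = s.size
    mid = n // 2
    if n % 2:
        return s[mid]
    return 0.5 * (s[mid - 1] + s[mid])

def sorted_mad(values):
    """Unscaled MAD: needs two full sorts."""
    a = np.asarray(values, dtype=float).ravel()
    med = sorted_median(a)                 # sort #1: median of the data
    return sorted_median(np.abs(a - med))  # sort #2: median of |x - median|

# Usage on a 1k-by-1k noise image; multiply by 1.4826 for a sigma estimate
rng = np.random.default_rng(0)
img = rng.normal(size=(1024, 1024))
print(1.4826 * sorted_mad(img))
```

The second sort cannot reuse the first, because the absolute deviations have a different ordering from the original values.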

There are also histogram routines available. I haven't tried these yet, but I think they are likely to be faster than sorting the whole image (which is probably overkill for our purposes). I'll post more results when I try them.
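A rough sketch of the histogram idea, assuming an approximate median is acceptable: bin the pixel values, find the bin where the cumulative count first reaches half the samples, and interpolate inside it. On a GPU this would map onto a device-wide histogram primitive (CUB, for example, provides one); the NumPy version below only shows the logic, and `histogram_median` is a hypothetical name:

```python
import numpy as np

def histogram_median(values, nbins=1024):
    """Approximate median from a histogram instead of a full sort.

    Bins span [min, max]; accuracy is limited by the bin width
    (max - min) / nbins, refined by linear interpolation within the bin.
    """
    a = np.asarray(values, dtype=float).ravel()
    counts, edges = np.histogram(a, bins=nbins)
    cum = np.cumsum(counts)
    half = 0.5 * a.size
    i = int(np.searchsorted(cum, half))    # first bin whose cumulative count >= half
    prev = cum[i - 1] if i > 0 else 0      # samples in all lower bins
    frac = (half - prev) / counts[i]       # position of the median inside bin i
    return edges[i] + frac * (edges[i + 1] - edges[i])

rng = np.random.default_rng(2)
img = rng.normal(size=(1024, 1024))
print(histogram_median(img), np.median(img))
```

One caveat: because the bin range comes from the min and max, a few very bright pixels stretch the range and coarsen the bins. Restricting the range (and counting the samples that fall below it) would help, at the cost of a little extra bookkeeping. MAD would still need a second pass over |x - median|, but each pass is a histogram rather than a sort.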

That is surprisingly bad. At that cost, I'd say we forget about it. The standard deviation is generally pretty good and I haven't seen any actual problems as a result of using it.
Thanks for checking into it.

FYI, I checked in my sort-based implementation of interquartile range (IQR), which is not exactly MAD but it's a similar idea. But as noted above it's really slow so you probably don't want to use it in the real pipeline.
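For reference, a sort-based IQR needs only one sort (both quartiles come from the same sorted array), and for Gaussian noise IQR ≈ 1.349σ, so IQR / 1.349 gives a robust sigma comparable to the scaled MAD. The sketch below is a NumPy illustration of that idea, not the checked-in rfgpu implementation; `iqr_sigma` is a hypothetical name, and quartiles are picked by truncated index rather than interpolation:

```python
import numpy as np

def iqr_sigma(values):
    """Sort-based interquartile range, scaled to a Gaussian-equivalent sigma."""
    s = np.sort(np.asarray(values, dtype=float).ravel())  # the only sort needed
    n = s.size
    q1 = s[int(0.25 * (n - 1))]   # lower quartile (truncated index)
    q3 = s[int(0.75 * (n - 1))]   # upper quartile (truncated index)
    return (q3 - q1) / 1.349      # IQR ~= 1.349 * sigma for Gaussian noise

rng = np.random.default_rng(3)
img = rng.normal(0.0, 2.0, size=(1024, 1024))
img[0, :10] = 1e6                 # hypothetical outliers barely move the quartiles
print(iqr_sigma(img))             # close to 2
```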