Coarse channel parameters in turbo_seti need better documentation.
texadactyl opened this issue · comments
There are 2 parameters related to coarse channels in the class DATAHandle init() function:
- n_coarse_chan : the number of coarse channels or None
- coarse_chans : a list of coarse channels or None
They are both override strategies in case the caller does not wish to use the blimpy calc_n_coarse_chan() function. Specifying both makes no sense.
Also, init() never validates that coarse_chans, when specified, is in fact a Python list object. Passing a non-list object would result in some weird type or index error instead of a message explaining what is actually wrong.
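A minimal sketch of the kind of up-front check meant here. The helper name `check_coarse_chans` is hypothetical and not part of turbo_seti; it only illustrates failing early with a readable message.

```python
def check_coarse_chans(coarse_chans):
    """Hypothetical helper: accept None or a list of ints, reject anything else."""
    if coarse_chans is None:
        return None  # caller did not request a subset
    if not isinstance(coarse_chans, list):
        raise TypeError(
            f"coarse_chans must be a list or None, got {type(coarse_chans).__name__}"
        )
    for chan in coarse_chans:
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(chan, bool) or not isinstance(chan, int):
            raise TypeError(f"coarse_chans entries must be integers, got {chan!r}")
    return coarse_chans
```

With a check like this, passing a string such as "1,2,3" would raise a TypeError naming the parameter, rather than failing later inside the search.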
I have never used the coarse channel selection, but can see it is useful for some applications (e.g. finding voyager telemetry at its known frequency, within a huge full-bandwidth filterbank at X-band).
My understanding is that the coarse_chans argument is to set which channels to search, and does not inform on how many channels there are. So the user will often still have to specify both: n_coarse_chan sets how many coarse channels there are in the file, and coarse_chans will set which of these channels to search.
As such I think this is predominantly a documentation issue. #222 seems like a bug though, and agreed that passing a non-list object would result in a weird error.
If one specifies coarse_chans=[1,2,3] (list of length 3), it does not make sense for the user to specify n_coarse_chan=42. Furthermore, downstream code uses both parameters in conflict. So, I have:
- Inserted some diagnostic code to make sure that you can specify neither of them (most common) or one of them, but not both of them.
- Made sure that n_coarse_chan is updated to reflect the length of the coarse_chans list when that parameter is used.
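Roughly, the rule in those two bullets could be sketched as follows. The function name `resolve_overrides` is hypothetical; this illustrates the described behaviour and is not the code that was committed.

```python
def resolve_overrides(n_coarse_chan=None, coarse_chans=None):
    """Hypothetical sketch: allow neither parameter or exactly one, never both."""
    if n_coarse_chan is not None and coarse_chans is not None:
        raise ValueError("specify n_coarse_chan or coarse_chans, not both")
    if coarse_chans is not None:
        # Under this rule the channel count follows the list length.
        n_coarse_chan = len(coarse_chans)
    return n_coarse_chan, coarse_chans
```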
Clearer?
The code needs to know n_coarse_chan in order to figure out how many channels there are in each coarse channel, and allocate memory accordingly.
For example:
- n_chan is the number of channels. Let's say there are 4200 channels in the file.
- n_coarse_chan specifies the number of coarse channels in the file. Let's say this is 42.
- coarse_chans specifies which channels to search. Let's say this is [1,2,3].
This would search channels 1, 2, and 3 out of 42 total channels (and I think this would actually skip channel 0). If n_coarse_chan is set to 3, then there will be 4200/3 = 1400 channels in each coarse channel, instead of the correct 4200/42 = 100.
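The arithmetic in that example, as a small sketch. `chans_per_coarse` is a hypothetical helper that assumes the fine channels divide evenly among the coarse channels.

```python
def chans_per_coarse(n_chan, n_coarse_chan):
    """Hypothetical helper: fine channels per coarse channel (even split assumed)."""
    return n_chan // n_coarse_chan

n_chan = 4200
coarse_chans = [1, 2, 3]

correct = chans_per_coarse(n_chan, 42)                 # true count in the file
derived = chans_per_coarse(n_chan, len(coarse_chans))  # count taken from the list length

print(correct, derived)  # 100 1400
```

So deriving the count from the list length would size each coarse channel 14 times too large in this case.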
Another example that's a bit clearer what can go wrong:
- n_coarse_chan = 100 -- actual number of coarse channels in the file
- coarse_chans=[1,5,88] -- length is 3, searching channels 1, 5 and 88.
Deriving n_coarse_chan from this would probably cause a crash when it tries to search channel 88.
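A sketch of why channel 88 becomes unreachable. Both `coarse_chan_bounds` and the value of 4200 fine channels (reused from the previous example) are assumptions for illustration; the actual failure mode inside turbo_seti may look different.

```python
N_CHAN = 4200  # assumed number of fine channels, reused from the earlier example

def coarse_chan_bounds(n_chan, n_coarse_chan, chan):
    """Hypothetical helper: (start, stop) fine-channel indices of one coarse channel."""
    if not 0 <= chan < n_coarse_chan:
        raise IndexError(
            f"coarse channel {chan} out of range for n_coarse_chan={n_coarse_chan}"
        )
    width = n_chan // n_coarse_chan
    return chan * width, (chan + 1) * width

# With the true count of 100, channel 88 maps to a valid slice.
good = coarse_chan_bounds(N_CHAN, 100, 88)

# With the count derived from len([1, 5, 88]) == 3, channel 88 does not exist.
try:
    coarse_chan_bounds(N_CHAN, len([1, 5, 88]), 88)
    failed = False
except IndexError:
    failed = True
```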
Hope that makes more sense?
So, you seem to be saying that coarse_chans, when specified, is a subset selection mechanism.
If the number of coarse channels (n_coarse_chan) is unspecified, then compute it in this order of preference:
- Waterfall header ['n_coarse_chan'] if present, although I have never seen one in a header.
- Waterfall calc_n_coarse_chan() otherwise.
Then, the coarse_chans list for searching forms a subset of the data_list. It is determined this way:
- Supplied during DATAHandle object instantiation.
- Computed as an evenly spread range from 0 to n_coarse_chan (most common).
The array of coarse channel objects (data_list) to return to the __split_h5() caller (DATAHandle instantiation) is computed from the coarse_chans list.
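The resolution order summarized above could be sketched like this. `resolve_coarse_chans` is a hypothetical function; `header` stands in for the Waterfall header dict and `calc_n_coarse_chan` for the blimpy method, passed in as a callable so the sketch runs on its own. It also assumes the default range excludes its upper end.

```python
def resolve_coarse_chans(header, calc_n_coarse_chan, n_coarse_chan=None, coarse_chans=None):
    """Hypothetical sketch of the order of preference described in this comment."""
    if n_coarse_chan is None:
        if "n_coarse_chan" in header:
            n_coarse_chan = header["n_coarse_chan"]  # 1st choice: header value
        else:
            n_coarse_chan = calc_n_coarse_chan()     # 2nd choice: computed value
    n_coarse_chan = int(n_coarse_chan)
    if coarse_chans is None:
        # Default: search every coarse channel, 0 .. n_coarse_chan - 1.
        coarse_chans = list(range(n_coarse_chan))
    return n_coarse_chan, coarse_chans
```

Under this reading, n_coarse_chan=100 together with coarse_chans=[1, 5, 88] is a valid combination: the first describes the file and the second picks the subset to search.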
This exercise of reviewing code occurred while updating the documentation. That necessitated reviewing the module and function comments.
Stretch goal: resolve tdwidth, NAXISn, shoulders, multiplication/addition with 0, etc.
I just checked one of Wael's interesting ATA files without and with coarse channel selection.
Without selection, it runs for nearly 8 minutes on my laptop, even with partitioning:
turboSETI -p 4 -l info -m 0.89 -n 256 ./guppi_59196_68468_000762_Unknown_0001.rawspec.0000.h5 | tee wael.log
He and I know that the interesting part is around coarse channel 133 (big SNR spike). So, I did this:
turboSETI -c 131,132,133,134,135 -l info -m 0.89 -n 256 ./guppi_59196_68468_000762_Unknown_0001.rawspec.0000.h5 | tee wael.log
That one ran in 12 seconds when focused and still got the results he was looking for.
Small documentation update in dcp-docs-patch221