UCBerkeleySETI / turbo_seti

turboSETI -- python based SETI search algorithm.

Home Page: http://turbo-seti.readthedocs.io

Coarse channel parameters in turbo_seti need better documentation.

texadactyl opened this issue · comments

There are 2 parameters related to coarse channels in the DATAHandle class __init__() function:

  • n_coarse_chan : the number of coarse channels or None
  • coarse_chans : a list of coarse channels or None

They are both override strategies in case the caller does not wish to use the blimpy calc_n_coarse_chan() function. Specifying both makes no sense.

Also, __init__() never validates that coarse_chans, when specified, is in fact a Python list object. Passing a non-list object would result in an obscure type or index error instead of a message explaining what is actually wrong.
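A minimal sketch of the kind of early check described above. The helper name `validate_coarse_chans` is hypothetical and not part of turbo_seti; it only illustrates rejecting a non-list argument with an explicit message instead of letting a later type or index error surface.

```python
def validate_coarse_chans(coarse_chans):
    """Hypothetical helper (not turbo_seti API): fail early with a clear message."""
    if coarse_chans is None:
        return None  # the caller did not select a subset of coarse channels
    if not isinstance(coarse_chans, list):
        raise TypeError(
            "coarse_chans must be a list of integers or None, "
            f"got {type(coarse_chans).__name__}"
        )
    for chan in coarse_chans:
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(chan, bool) or not isinstance(chan, int) or chan < 0:
            raise ValueError(f"invalid coarse channel index: {chan!r}")
    return coarse_chans
```

With this in place, passing a string such as "1,2,3" raises a TypeError that names the parameter, rather than failing somewhere downstream.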

I have never used the coarse channel selection, but can see it is useful for some applications (e.g. finding Voyager telemetry at its known frequency, within a huge full-bandwidth filterbank at X-band).

My understanding is that the coarse_chans argument is to set which channels to search, and does not inform on how many channels there are. So the user will often still have to specify both: n_coarse_chan sets how many coarse channels there are in the file, and coarse_chans will set which of these channels to search.

As such I think this is predominantly a documentation issue. #222 seems like a bug though, and agreed that passing a non-list object would result in a weird error.

If one specifies coarse_chans=[1,2,3] (list of length 3), it does not make sense for the user to specify n_coarse_chan=42. Furthermore, downstream code uses both parameters in conflict. So, I have:

  • Inserted some diagnostic code to make sure that you can specify neither of them (most common) or one of them, but not both of them.
  • Made sure that n_coarse_chan is updated to reflect the length of the coarse_chans list when that parameter is used.

Clearer?

The code needs to know n_coarse_chan in order to figure out how many channels there are in each coarse channel, and allocate memory accordingly.

For example:

  • n_chan is number of channels. Let's say there are 4200 channels in the file.
  • n_coarse_chan specifies the number of coarse channels in the file. Let's say this is 42.
  • coarse_chans specifies which channels to search. Let's say this is [1,2,3]

This would search channels 1, 2, and 3 out of 42 total channels (and I think this would actually skip channel 0), with 4200/42 = 100 channels in each coarse channel. If n_coarse_chan is instead set to 3, then there will be 4200/3 = 1400 channels in each coarse channel.
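The arithmetic in this example can be sketched as follows. Both function names are hypothetical, and the sketch assumes that coarse channels are equal-width, contiguous blocks of fine channels; the thread does not spell out the actual layout used by turbo_seti.

```python
def fine_chans_per_coarse(n_chan, n_coarse_chan):
    """Number of fine channels in each coarse channel (assumes an even split)."""
    if n_chan % n_coarse_chan != 0:
        raise ValueError("n_chan is not a multiple of n_coarse_chan")
    return n_chan // n_coarse_chan


def fine_chan_range(coarse_chan, n_chan, n_coarse_chan):
    """Half-open [start, stop) span of fine channels for one coarse channel."""
    width = fine_chans_per_coarse(n_chan, n_coarse_chan)
    return coarse_chan * width, (coarse_chan + 1) * width


print(fine_chans_per_coarse(4200, 42))  # 100
print(fine_chans_per_coarse(4200, 3))   # 1400
print(fine_chan_range(3, 4200, 42))     # (300, 400)
```

Under these assumptions, a wrong n_coarse_chan changes the width of every coarse channel, so the same coarse channel index maps to a different slice of the file.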

Another example that shows more clearly what can go wrong:

  • n_coarse_chan = 100 -- actual number of coarse channels in the file
  • coarse_chans=[1,5,88] -- length is 3, searching channels 1, 5 and 88. Deriving n_coarse_chan from the length of this list would probably cause a crash when it tries to search channel 88.
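A small sketch of why deriving the channel count from the list length fails for this example. The function `out_of_range_chans` is hypothetical and only checks which selected indices would not exist for a given n_coarse_chan.

```python
def out_of_range_chans(coarse_chans, n_coarse_chan):
    """Return the selected channels that do not exist among n_coarse_chan channels."""
    return [c for c in coarse_chans if not 0 <= c < n_coarse_chan]


selection = [1, 5, 88]

# With the true count from the file, every selected channel exists.
print(out_of_range_chans(selection, 100))             # []

# With the count derived from the list length, two of the three do not.
print(out_of_range_chans(selection, len(selection)))  # [5, 88]
```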

Hope that makes more sense?

@telegraphic

So, you seem to be saying that coarse_chans, when specified, is a subset selection mechanism.

If the number of coarse channels (n_coarse_chan) is unspecified, then compute it in this order of preference:

  1. Waterfall header ['n_coarse_chan'] if present, although I have never seen one in a header.
  2. Waterfall calc_n_coarse_chan() otherwise.

Then, the coarse_chans list for searching forms a subset of the data_list. It is determined this way:

  1. Supplied during DATAHandle object instantiation.
  2. Computed as an evenly spread range from 0 to n_coarse_chan (most common).

The array of coarse channel objects (data_list) to return to the __split_h5() caller (DATAHandle instantiation) is computed from the coarse_chans list.
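The resolution order summarized above can be sketched like this. The function names are hypothetical, `header` is assumed to be a plain dict, and `calc_n_coarse_chan` is a stand-in callable for the blimpy Waterfall method of the same name; the actual turbo_seti code may be organized differently.

```python
def resolve_n_coarse_chan(n_coarse_chan, header, calc_n_coarse_chan):
    """Pick the coarse channel count: explicit value, then header, then computed."""
    if n_coarse_chan is not None:
        return n_coarse_chan
    if "n_coarse_chan" in header:
        return int(header["n_coarse_chan"])
    return calc_n_coarse_chan()


def resolve_coarse_chans(coarse_chans, n_coarse_chan):
    """Pick the channels to search: the supplied subset, else every channel."""
    if coarse_chans is not None:
        return list(coarse_chans)
    return list(range(n_coarse_chan))


# Stand-in for the computed fallback; 42 matches the earlier example.
n = resolve_n_coarse_chan(None, {}, lambda: 42)
print(n)                                         # 42
print(resolve_coarse_chans([1, 2, 3], n))        # [1, 2, 3]
print(len(resolve_coarse_chans(None, n)))        # 42
```

Keeping the two decisions separate reflects the point made in the thread: the count describes the file, while the list describes what to search.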

This exercise of reviewing code occurred while updating the documentation. That necessitated reviewing the module and function comments.

Stretch goal: resolve tdwidth, NAXISn, shoulders, multiplication/addition with 0, etc.

I just checked one of Wael's interesting ATA files without and with coarse channel selection.

Without selection, it runs for nearly 8 minutes on my laptop, even with partitioning:
turboSETI -p 4 -l info -m 0.89 -n 256 ./guppi_59196_68468_000762_Unknown_0001.rawspec.0000.h5 | tee wael.log

He and I know that the interesting part is around coarse channel 133 (big SNR spike). So, I did this:
turboSETI -c 131,132,133,134,135 -l info -m 0.89 -n 256 ./guppi_59196_68468_000762_Unknown_0001.rawspec.0000.h5 | tee wael.log
That one ran in 12 seconds when focused and still got the results he was looking for.

Small documentation update in dcp-docs-patch221