plot_event plots the same event multiple times, with the red "guide" line in the wrong place

Question

stevecroft opened this issue 3 years ago

In cadences of six files, where a signal is present in the three "ON" observations (A1, A2, A3) and absent in the three "OFF" files (B, C, D), find_event has a tendency to report three separate events, matching A1 to A2 and A3, A2 to A1 and A3, and A3 to A1 and A2, even though this really only constitutes a single event.

For an example, see the output from find_event as run on the single-coarse-channel Voyager 1 files at https://github.com/elanlavie/VoyagerTutorialRepository/blob/master/VoyagerTutorial.ipynb (specifically in Cell 7).

Furthermore, when plotting these events (e.g. the output from plot_event in Cell 8 at the above link), the event associated with A1 is plotted correctly, with the hit from A1 centered on the frequency axis, and the red "guide" line, extrapolating the drift from that signal, overlaid.

In the second plot, where the "key" hit for the event is in observation A2, the relevant hit is centered in frequency in the third panel. However, the red guide line starts at the frequency of the A2 hit while being extrapolated from panel A1. The effect is that the red line is horizontally offset from its correct position for the hits associated with A2 and A3.
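To make the offset concrete, here is a minimal sketch of the extrapolation, not turboSETI's actual plotting code. The function name `guide_line_freq_mhz`, the 300 s scan spacing, the drift rate, and the frequency are all illustrative assumptions. It compares a guide line anchored at the key hit's own scan (A2) with one that uses the A2 frequency but starts the extrapolation at A1.

```python
def guide_line_freq_mhz(key_freq_mhz, drift_rate_hz_s, key_scan_start_s, panel_start_s):
    """Frequency (MHz) of the guide line at the start of a given panel.

    Hypothetical helper: extrapolates linearly from the key hit using its drift rate.
    """
    elapsed_s = panel_start_s - key_scan_start_s
    return key_freq_mhz + drift_rate_hz_s * elapsed_s / 1e6


# Assumed cadence: six scans starting 300 s apart (A1, B, A2, C, A3, D).
scan_starts_s = [0, 300, 600, 900, 1200, 1500]
drift_rate_hz_s = -0.4   # illustrative value
a1_freq_mhz = 8419.3     # illustrative value

# Frequency of the same drifting signal at the start of A2.
a2_freq_mhz = guide_line_freq_mhz(a1_freq_mhz, drift_rate_hz_s,
                                  scan_starts_s[0], scan_starts_s[2])

# Anchored correctly: the A2 frequency is paired with A2's start time.
correct = [guide_line_freq_mhz(a2_freq_mhz, drift_rate_hz_s, scan_starts_s[2], t)
           for t in scan_starts_s]

# Anchored as described in this issue: A2 frequency, but extrapolated from A1's start.
shifted = [guide_line_freq_mhz(a2_freq_mhz, drift_rate_hz_s, scan_starts_s[0], t)
           for t in scan_starts_s]

# The shifted line is displaced by a constant drift_rate * (t_A2 - t_A1) in every panel.
offsets_hz = [(s - c) * 1e6 for s, c in zip(shifted, correct)]
```

With these assumed numbers the correctly anchored line passes back through the A1 frequency in the first panel, while the mis-anchored line sits a constant distance away in every panel, which is consistent with the horizontal shift seen in the plots.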

Expected behavior is for turboSETI to return a single event (i.e. a hit can belong to no more than one event) and for this to be keyed on the hit in scan A1, i.e. the six events in Cell 8 in the above notebook would reduce to two, yielding only the first and fourth plots.
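As a rough illustration of that expected behavior (not the logic in find_event.py), the sketch below keeps a candidate event only if none of its hits has already been claimed by an earlier event. The function name `keep_first_event_per_hit`, the tuple layout, and the hit labels are hypothetical.

```python
def keep_first_event_per_hit(candidate_events):
    """Keep candidate events so that each hit belongs to at most one event.

    candidate_events: list of (scan_order, key_hit_id, member_hit_ids) tuples,
    where scan_order is 0 for A1, 1 for A2, 2 for A3 (assumed layout).
    """
    claimed = set()
    kept = []
    # Visit candidates keyed on A1 first, so the surviving event is keyed on A1.
    for scan_order, key_hit, members in sorted(candidate_events, key=lambda e: e[0]):
        hit_ids = {key_hit, *members}
        if hit_ids & claimed:
            continue  # at least one hit already belongs to a kept event
        claimed |= hit_ids
        kept.append((key_hit, sorted(hit_ids)))
    return kept


# Two signals, each reported three times (once per ON scan): six candidates.
candidates = [
    (1, "A2:sig1", ["A1:sig1", "A3:sig1"]),
    (0, "A1:sig1", ["A2:sig1", "A3:sig1"]),
    (2, "A3:sig1", ["A1:sig1", "A2:sig1"]),
    (0, "A1:sig2", ["A2:sig2", "A3:sig2"]),
    (1, "A2:sig2", ["A1:sig2", "A3:sig2"]),
    (2, "A3:sig2", ["A1:sig2", "A2:sig2"]),
]

events = keep_first_event_per_hit(candidates)
key_hits = [key for key, _ in events]
```

In this toy input the six candidates reduce to two events, both keyed on A1, mirroring the reduction from six plots to two described above.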

Kevin Lacker · Answer 1 · Wed Nov 10 2021 05:59:15 GMT+0800 (China Standard Time)

FWIW, I fixed the offset problem for the data pipeline in https://github.com/UCBerkeleySETI/obs_bin/blob/master/pipeline/plot.py, but I ended up changing the plotting logic so much along the way that it wasn't clear to me how to backport the fix into turboseti's plot_event. So the offset bug is purely an error in the plotting logic.

I think the triplicate thing is a bug in find_event_pipeline rather than in the plotting logic per se; the output csv has each hit included three times where you really want just one.

Richard Elkins · Answer 2 · Wed Nov 10 2021 07:01:36 GMT+0800 (China Standard Time)

The issue is that eyeballing works great for experienced folks like @stevecroft.
Without a find_event.py "map", I need to do a lot of digging.

Richard Elkins · Answer 3 · Thu Nov 11 2021 07:19:15 GMT+0800 (China Standard Time)

The issue is in find_event.py find_events(). Near the end of the filter threshold level 3 processing, it was constructing an event table that essentially consisted of all the top hits, multiplied by the number of ON dat files.

I have a fix at https://github.com/texadactyl/turbo_seti but I am not yet sure if it is correct, given how challenging the original source code is to read.

Richard Elkins · Answer 4 · Thu Nov 11 2021 15:32:38 GMT+0800 (China Standard Time)

@stevecroft

The tutorial at the main turbo_seti site is complete and allows for substituting a more recent version of turbo_seti. This was originally developed by Shane and subsequently refined.

Danny Price · Answer 5 · Mon Nov 15 2021 16:14:17 GMT+0800 (China Standard Time)

@stevecroft I certainly agree with that expected behaviour.

@texadactyl's fix seems reasonable to me -- just confirming this is a design flaw and not a newly-introduced bug?

Richard Elkins · Answer 6 · Mon Nov 15 2021 21:05:38 GMT+0800 (China Standard Time)

@telegraphic The extra events & plots have been there since the first time I saw them, sometime in 2020. It is true that I have fixed bugs in that code and I implemented our .dat file format change for #231.

So, when was the original code introduced that I just recently changed for the PR? I have no idea without doing an archaeological sift through all of the git changes.

Richard Elkins · Answer 7 · Mon Nov 15 2021 21:09:18 GMT+0800 (China Standard Time)

In hindsight, a more formal QA process could have caught many of the find_event.py bugs and pipeline source code bugs much sooner than they were noticed. Water under the bridge. My 2 cents.