spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope

Home Page: https://jwst-pipeline.readthedocs.io/en/latest/


Add new optional step to spec2 for self-calibrated bad pixel mask

stscijgbot-jp opened this issue

Issue JP-3581 was created on JIRA by David Law:

As part of the Build 11 focus on improving treatment of outliers and bad pixels, this ticket is to implement a new optional step in calwebb_spec2 to derive an updated bad pixel mask via self-calibration from a given program.

This essentially follows in the footsteps of the D2P workaround notebook https://github.com/STScI-MIRI/MRS-ExampleNB/blob/main/D2P_Notebooks/MRS_Flag_Badpix.ipynb, which has been used for MIRI MRS observations in one form or another for a couple of years.

In brief, this works on the philosophy that the majority of outliers seen in MIRI MRS data come from a gradually growing number of bad pixels.  Even regular bad pixel mask updates can never catch all of these, but by using dedicated background observations it's possible to create custom bad pixel masks on the fly via self-calibration from a given program.

It turns out that this can be generalized even further, and can often be successfully applied to dithered observations of point sources as well since the IFU field of view is largely empty in such cases.

This step would be called 'Bad Pixel Self Calibration' (badpix_selfcal) and would be located immediately after the assign_wcs step.  It would be OFF by default, but could be enabled by end users if they thought it could help their science observations.
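For illustration, assuming the step is adopted under the proposed name badpix_selfcal, enabling it for an offline rerun might look like the usual per-step override (a sketch only; the step and its final name are still to be decided):

```python
# Hypothetical usage sketch: turn on the proposed (off-by-default) step when
# rerunning calwebb_spec2 offline.  The step name 'badpix_selfcal' is the one
# proposed above and may change; the association file name is a placeholder.
from jwst.pipeline import Spec2Pipeline

Spec2Pipeline.call(
    "jw01234_spec2_asn.json",                   # placeholder association file
    steps={"badpix_selfcal": {"skip": False}},  # enable the optional step
    save_results=True,
)

# Command-line equivalent:
#   strun calwebb_spec2 jw01234_spec2_asn.json --steps.badpix_selfcal.skip=False
```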

The algorithm works by reading in dithered frames, taking the pixel-by-pixel MIN combination of the frames, subtracting a median-filtered copy of that image (kernel smoothing along the spectral dispersion direction), and flagging the top and bottom N% of pixels.  These pixels will be set to DQ=DO_NOT_USE and SCI=NaN in each of the frames.  This computation should be done on a per-detector and per-filter/band basis.  The routine would need to be passed a spec2 association file.  If the association file contains inputs identified as backgrounds, it should use only those backgrounds to define the mask, but apply the mask to all inputs.  If the association file contains only science images, it should use the science images to do the computation.
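A minimal numpy/scipy sketch of that flagging logic (function and parameter names are illustrative rather than a final API; NaN handling and per-band bookkeeping are omitted):

```python
# Sketch of the proposed flagging algorithm, assuming the dithered exposures
# for one detector and filter/band have already been loaded as 2D SCI arrays.
import numpy as np
from scipy.ndimage import median_filter


def flag_bad_pixels(frames, dispaxis=1, kernel_size=7, flag_frac=0.001):
    """Return a boolean mask of suspected bad pixels.

    frames      : list of 2D SCI arrays (dithered exposures, same detector/band)
    dispaxis    : spectral dispersion axis of the detector (0 or 1)
    kernel_size : length of the median filter along the dispersion axis
    flag_frac   : fraction of pixels to flag at each extreme (top and bottom)
    """
    # Pixel-by-pixel MIN combination of the dithered frames
    combined = np.nanmin(np.stack(frames), axis=0)

    # Smooth along the spectral direction only, and subtract to leave residuals
    size = (kernel_size, 1) if dispaxis == 0 else (1, kernel_size)
    residual = combined - median_filter(combined, size=size)

    # Flag the faintest and brightest flag_frac of pixels
    finite = residual[np.isfinite(residual)]
    lo, hi = np.percentile(finite, [100 * flag_frac, 100 * (1 - flag_frac)])
    return (residual < lo) | (residual > hi)
```

The resulting mask would then be applied to every input in the association by setting DQ=DO_NOT_USE and SCI=NaN at the flagged pixels.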

Example notebooks are attached implementing the core of this for MIRI MRS dedicated backgrounds, MIRI MRS point sources, and NIRSpec IFU point sources.  Performance generally looks reasonable for the cases I've tested: MIRI MRS (dedicated background or point-source observations) and NIRSpec IFU point sources.  Initial tests on some NIRISS WFSS and NIRSpec MOS data look potentially interesting.

Filing this now for reference, but to be discussed at a broader meeting.

@drlaw1558 I have a version of this I adapted for NIRSpec. It works for the program I am running it with, but the NIRSpec team needs to look it over and comment. I will post it on this ticket.

Comment by David Law on JIRA:

Adding presentation (Custom_badpix.pdf) with examples for MIRI MRS dedicated background data, NIRSpec IFU leakcal data, NIRSpec IFU dithered point source data, NIRSpec MOS data, and NIRISS WFSS data.

Discussed today with Boris Trahin, Bryan Hilbert, Howard Bushouse, James Muzerolle, Jo Taylor, Kevin Volk, Melanie Clarke, Rachel Plesha, and Russell Ryan.

The conclusion was to explore implementing this by adding a new 'exptype'='selfcal' (or some other phrase) option to Level 2 association files.  This could include all associated science and background/leakcal files by default, and users could customize it to include data from other programs if they wished and such data were available.  It would be up to users whether or not this makes sense to enable for their science case.  The new bad pixel self-calibration step (which needs a snappier name: update_badpix?) would then be located after assign_wcs, use as input everything with exptype=selfcal, and apply the resulting mask to the science data array.
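Purely as an illustration, a Level 2 association carrying the proposed exptype might look something like the sketch below; the program number, product name, and exposure file names are placeholders, and the 'selfcal' exptype value is the new element proposed here (the other member fields follow existing spec2 association conventions).

```python
# Hypothetical spec2 association with a 'selfcal' member, written out as JSON.
import json

asn = {
    "asn_type": "spec2",
    "program": "01234",                       # placeholder program number
    "products": [
        {
            "name": "jw01234-o001_t001_miri_ch1-short",
            "members": [
                {"expname": "jw01234001001_02101_00001_mirifushort_rate.fits",
                 "exptype": "science"},
                {"expname": "jw01234001001_02101_00002_mirifushort_rate.fits",
                 "exptype": "background"},
                {"expname": "jw01234001001_02101_00003_mirifushort_rate.fits",
                 "exptype": "selfcal"},        # proposed new exposure type
            ],
        }
    ],
}

with open("jw01234_spec2_selfcal_asn.json", "w") as f:
    json.dump(asn, f, indent=4)
```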

Other notes:

  • The algorithm above is simple and generic, but does have some tunable parameters.  It needs to know the spectral dispersion axis of the data, but this information is already a keyword in the headers.  The best smoothing length for removing any continuum remaining in the data may differ between instruments; in my examples it looks like I used 7 pixels (spectrally) for MIRI, and 15 for the NIR detectors.  The code could have defaults for these, changeable either by the user or via a parameter reference file (see the sketch after this list).
  • Saturated data could be problematic, and would be worth testing, as would large sources with bright emission lines.
  • It would be worth implementing this generically for all spectroscopic modes, and for imaging modes as well (with a 2D smoothing box).  It must be OFF by default, unless specifically enabled via a parameter reference file or by the user.  It is most immediately applicable to MIRI/NIRSpec IFU data, but this would allow other instruments/modes to test it out as well.  The algorithm is similar to an approach that Bryan Hilbert has been using for NIRCam imaging data (see also James Davies' snowblind package).
  • There was discussion of whether it would be worth setting an informational bit in addition to just DO_NOT_USE.  This would require repurposing bits, though, so it would be simpler to set just DO_NOT_USE.
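As referenced in the first bullet, here is a minimal sketch of how such defaults might be exposed through stpipe's parameter spec; the step name, parameter names, and default values are assumptions, not an adopted API.

```python
# Hypothetical step skeleton showing tunable parameters and off-by-default
# behaviour; only the stpipe Step machinery is real, the rest is a sketch.
from jwst.stpipe import Step


class BadpixSelfcalStep(Step):
    """Flag bad pixels via self-calibration from dithered/background exposures."""

    spec = """
        kernel_size = integer(default=15)  # median-filter length along the dispersion axis
        flag_frac = float(default=0.001)   # fraction of pixels to flag at each extreme
        skip = boolean(default=True)       # OFF by default; enable via user or parameter reference file
    """

    def process(self, input_data):
        # Core logic would go here: MIN-combine the selfcal/background exposures,
        # median-filter along the dispersion axis (or with a 2D box for imaging),
        # flag percentile outliers, and apply the mask to the science data
        # (see the algorithm sketch earlier in this ticket).
        return input_data
```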

Comment by Ned Molter on JIRA:

David Law Jane Morrison would you please point me to some test data, i.e., the sci and bkg .rate files that are sitting in the data/ folders to which all of these notebooks point?  I'd like to run those notebooks myself to better understand what is needed for this step.

Comment by Ned Molter on JIRA:

got it, thank you!

Comment by Ned Molter on JIRA:

I started implementing this, and I found a few small points of confusion about what implementation is desired, due to differences between MRS_Flag_Badpix.ipynb and mrsifu_bg.ipynb:

(1) The description in this ticket specifies combining two or more background dithers by taking the MIN pixel by pixel, but in the notebook MRS_Flag_Badpix.ipynb, the line that combines dithers appears to take the nanmedian, and the comment says "Collapse to create mean".  I don't think it will matter too much in practice, but which of these is actually desired?

(2) The way that MRS_Flag_Badpix.ipynb and mrsifu_bg.ipynb apply smoothing in the spectral direction appears to differ.  In MRS_Flag_Badpix.ipynb, the wavelengths at each pixel (from assign_wcs) and the fluxes are fit to a low-order polynomial, and that polynomial is subtracted from the data to find outliers.  In mrsifu_bg.ipynb, a median filter is applied in the spectral (Y-) direction, and the filtered data is subtracted from the data to find outliers.  Which is desired? (happy to implement both if needed and see what differences there are / which runs faster)

(3) MRS_Flag_Badpix.ipynb only flags bright outliers, while mrsifu_bg.ipynb flags both bright and faint outliers.  Which is desired?

Comment by David Law on JIRA:

Ned Molter MRS_Flag_Badpix.ipynb is an older but conceptually similar implementation; mrsifu_bg.ipynb is the right notebook to follow.  Essentially,

(1) I found that the median doesn't work for many IFU cases where you're dealing with just two exposures, each with a source in them.  As such using MIN was more reliable.

(2) The polynomial required some unpleasant juggling of the wavelength solution and slice boundaries, while the median filter in the spectral direction was both much simpler and more robust.

(3) Bad pixels can be bright or faint, so the updated notebook flags both.

Comment by Ned Molter on JIRA:

Sounds good.  A PR should be coming this week; I will be sure to add you as one of the reviewers.

Comment by Ned Molter on JIRA:

Just one more thing to confirm: is it true that there is no real difference in the implementation for NIRSpec, MIRI, and MIRI point source?

Comment by David Law on JIRA:

That's the plan for right now; the MIN combine should generally work in most cases (though it'll always do better if there's a genuine background/leakcal in the list of 'selfcal' exposures).  Since it's an optional step that's off by default, the idea was to code it in a way that makes sense for the modes/science examples that I've worked with and allow the other instruments/modes (imaging and spectroscopy) to test it as well.  If it works, they can consider recommending that users enable it for offline reprocessing in some cases, and if there are modifications that would make it more useful for them, we can consider making it more complicated later.

Comment by Jane Morrison on JIRA:

[^Flag_badpix_v2.ipynb]


Comment by Jane Morrison on JIRA:

Here is what I tried for some NIRSpec data; it seemed to work for the data I was working with: [^Flag_badpix_nrs.py]

Comment by Jane Morrison on JIRA:

David's original code only flagged the science data, but I found that if you also flag the "warm pixels" in the background exposures, then when the backgrounds are used in the pipeline the warm pixels are flagged and not used, which gave me better results.

I attached the modified version of David's notebook that I used. 
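A compact sketch of that variation, assuming mask is the boolean bad-pixel mask from the earlier sketch and each exposure is an open JWST datamodel with data and dq arrays (names here are illustrative):

```python
# Apply the self-calibration mask to the background exposures as well as the
# science exposures, so the flagged warm pixels are excluded wherever the
# backgrounds are used later in the pipeline.
import numpy as np

DO_NOT_USE = 1  # value of the DO_NOT_USE DQ bit; the pipeline's dqflags mapping would normally be used


def apply_mask(model, mask):
    """Flag masked pixels as DO_NOT_USE and blank them in SCI."""
    model.dq[mask] |= DO_NOT_USE
    model.data[mask] = np.nan
    return model


# science_models and background_models are assumed lists of open datamodels:
# for m in science_models + background_models:
#     apply_mask(m, mask)
```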


Comment by Rachel Plesha on JIRA:

David Law originally this was presented as a possibility for WFSS as well, but I notice that this ticket only lists IFU as an affected mode and instrument. Should those fields be updated to include WFSS (NIRISS & NIRCam) and MOS (NIRSpec), or did you want to keep that implementation and testing in a different ticket somewhere?

Comment by David Law on JIRA:

Rachel Plesha The mode indicated on the ticket metadata isn't particularly meaningful in this case.  It's true that the routine originated from and has been tested most on IFU data, but obviously we're also optimistic that it could (in some science cases) perform well for many of the other modes too.

Rather than checking every mode option I'm inclined to leave it as-is, but certainly this ticket can be used for feedback from any of the modes.

Comment by Rachel Plesha on JIRA:

David Law, we depend heavily on the metadata to determine the work that needs to be done, prioritized, and discussed. If our instrument isn't listed, then we will likely miss the ticket unless it is brought to our attention separately. It is much more useful to have the labels correct and later decide that we do not need to test a given mode than to miss the ticket because it did not have the label and then realize we needed to test it.

So, to phrase this a different way: based on our initial meeting, do you want this to be looked into further for WFSS? If so, we need to add labels for our workflow.

Comment by David Law on JIRA:

Rachel Plesha If it's helpful for your workflow we can certainly add WFSS.