magao-x / MagAOX

The MagAO-X Software System

Home Page: https://magao-x.org/

permanently null pointer

drbitboy opened this issue · comments

static shmimMonitor * m_selfMonitor; ///< Static pointer to this (set in constructor). Used for getting out of the static SIGSEGV handler.

shmimMonitor<derivedT, specificT> * shmimMonitor<derivedT, specificT>::m_selfMonitor = nullptr;

m_selfMonitor->handlerSigSegv(signum, siginf, ucont);

Hi Jared et al.,

First of all, any of these issues are primarily notes for myself and secondarily heads up for you, so feel free to close them as the whim strikes you.

The m_selfMonitor member (line 174, above) of the template<class derivedT, class specificT=shmimT> class shmimMonitor is never assigned anything other than the null pointer (line 251, above), and that null pointer is only ever dereferenced in the SIGSEGV handler (line 432, above).

There are no other occurrences of that variable name in the repo, as checked with e.g. find . | grep -v /\\.git/ | xargs grep m_selfMonitor -d skip

This is actually really f-ing nasty. shmimMonitor is designed to be used multiple times by the same class, as a CRTP parent. The point of m_selfMonitor is to do the referenced signal handling. But only one thread can handle a signal, so this is fundamentally broken even if we assign m_selfMonitor somewhere.

The only way I can see to handle this is to do signal handling way up in the main MagAOXApp class, and have a registration mechanism where each derived class that cares can register to have its signal handler called. Then we have to mask signals in all other threads when they are created. This may explain several different crashes that occur as a result of changing shmim sizes (which hopefully are going away anyway with ISIO updates).
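A minimal sketch of what such a registration mechanism could look like, assuming POSIX sigaction and pthread_sigmask. The class name signalRegistrySketch and all of its members are hypothetical and do not exist in the repo; in practice this would live in the top-level app class.

```cpp
#include <signal.h>
#include <pthread.h>
#include <cstddef>

// Hypothetical registry (not part of MagAOX): one process-wide handler
// that fans out to callbacks registered by the classes that care.
class signalRegistrySketch
{
  public:
    typedef void (*callbackT)(int);

    static constexpr std::size_t maxCallbacks = 8;

    // Register a callback. Call this during setup, before install();
    // callbacks themselves must only do async-signal-safe work.
    static bool registerCallback(callbackT cb)
    {
        if(m_count >= maxCallbacks) return false;
        m_callbacks[m_count] = cb;
        ++m_count;
        return true;
    }

    // Install the single dispatching handler for signum.
    static int install(int signum)
    {
        struct sigaction act = {};
        sigemptyset(&act.sa_mask);
        act.sa_flags = 0;
        act.sa_handler = dispatch;
        return sigaction(signum, &act, nullptr);
    }

    // Block signum in the calling thread. Threads created afterwards
    // inherit the mask of the thread that creates them.
    static int blockInThisThread(int signum)
    {
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, signum);
        return pthread_sigmask(SIG_BLOCK, &set, nullptr);
    }

  private:
    // The one real handler: call every registered callback in order.
    static void dispatch(int signum)
    {
        for(std::size_t n = 0; n < m_count; ++n) m_callbacks[n](signum);
    }

    static callbackT m_callbacks[maxCallbacks];
    static std::size_t m_count;
};

signalRegistrySketch::callbackT
    signalRegistrySketch::m_callbacks[signalRegistrySketch::maxCallbacks] = {};
std::size_t signalRegistrySketch::m_count = 0;
```

Each class would call registerCallback() during setup, the app would call install() once, and worker threads would call blockInThisThread() (or be created from a thread that already has the signal blocked). One caveat worth keeping in mind: per-thread masks steer asynchronous, process-directed signals; a SIGSEGV raised by an actual memory fault is delivered to the faulting thread, and POSIX leaves the behavior undefined if it is blocked there.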

I was reading this today, and after about five minutes realized that I was the one who opened the issue in the first place. LOL.

I also thought of a possible solution: make m_restart static. Then the static handler _handlerSigSegv no longer needs the also-static m_selfMonitor pointer to the actual lone instance in order to call the instance handler handlerSigSegv, which assigns true to the instance variable m_restart.

m_selfMonitor is already static, so there can be only one functional instance per derived type (specificT) of this class anyway; that being the case, there is no functional difference between static and per-instance members of this class. So we could:

  • eliminate the static wrapper function static void shmimMonitor<derivedT, specificT>::_handlerSigSegv(...)
  • eliminate the static pointer m_selfMonitor
  • declare m_restart as a static member of the class
  • declare void shmimMonitor<derivedT, specificT>::handlerSigSegv as a static member of the class, which makes it then able to assign true to the static member m_restart.
  • install void shmimMonitor<derivedT, specificT>::handlerSigSegv as the SIGSEGV handler in int shmimMonitor<derivedT, specificT>::setSigSegvHandler instead of the now eliminated <...>::_handlerSigSegv
  • with no other code changes, the instance of this class tests the now-static member m_restart in <...>::smThreadExec to know when to restart.
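The steps above could be sketched roughly as follows. This is a stripped-down stand-in, not the actual shmimMonitor: the class name monitorSketch is hypothetical, only m_restart, handlerSigSegv and setSigSegvHandler are taken from the discussion, and the choice of volatile sig_atomic_t for the flag is an assumption (it is the type that is safe to write from a signal handler).

```cpp
#include <signal.h>

// Hypothetical stand-in for shmimMonitor<derivedT, specificT>.
template <class derivedT>
class monitorSketch
{
  public:
    // Static restart flag, one per template specialization.
    static volatile sig_atomic_t m_restart;

    // Static, so it can be installed directly as the handler;
    // no m_selfMonitor pointer or _handlerSigSegv wrapper is needed.
    static void handlerSigSegv(int /*signum*/, siginfo_t * /*siginf*/, void * /*ucont*/)
    {
        m_restart = 1;
    }

    // Install the static handler for SIGSEGV; returns 0 on success.
    int setSigSegvHandler()
    {
        struct sigaction act = {};
        sigemptyset(&act.sa_mask);
        act.sa_flags = SA_SIGINFO;
        act.sa_sigaction = handlerSigSegv;
        return sigaction(SIGSEGV, &act, nullptr);
    }
};

// Definition of the static member, initially "no restart requested".
template <class derivedT>
volatile sig_atomic_t monitorSketch<derivedT>::m_restart = 0;
```

The thread loop would then test m_restart exactly as before. Note that sigaction is process-wide, so if several specializations install their handler, only the last one installed is called and only its m_restart gets set.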

Still haven't done anything about this. Hopefully the new ISIO will make this less urgent!

However, the problem with any shmimMonitor specific solution is just that -- it's specific. I want to set us up to be smart about signal handling in all apps, regardless of whether they are shmimMonitors or not. So I'm still leaning towards pushing all signal handling to the top-level MagAOXApp, incl. blocking signals in all threads, and having the base classes register their own signal handlers there.