mheily / jobd

A job management framework

job stuck in an infinite loop

mheily opened this issue

sysadm goes to 100% CPU when launched by launchd. At first I thought it was signal-related, but now I suspect a stray file descriptor. According to truss(1), it's in an infinite loop polling a set of file descriptors with a zero timeout. This line is repeated:

poll({ 7/POLLIN 11/POLLIN },2,0)                 = 0 (0x0)
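
For anyone reading along: the third argument in that truss line is a timeout of 0, so poll(2) returns immediately when nothing is ready, and calling it in a tight loop on idle descriptors burns CPU. Here is a minimal Python sketch of that behavior. It uses a fresh pipe as a stand-in for the idle descriptors, and the helper name `count_idle_polls` is hypothetical. This is an illustration only, not sysadm's actual code.

```python
import os
import select


def count_idle_polls(iterations=1000, timeout_ms=0):
    """Poll an idle pipe repeatedly and count the calls that report no events."""
    # Nothing is ever written to the pipe, so read_fd never becomes readable.
    read_fd, write_fd = os.pipe()
    poller = select.poll()
    poller.register(read_fd, select.POLLIN)
    empty = 0
    try:
        for _ in range(iterations):
            # A timeout of 0 means "do not block": an empty list comes back at once.
            if not poller.poll(timeout_ms):
                empty += 1
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return empty


if __name__ == "__main__":
    print(count_idle_polls())
```

Every iteration returns an empty result without sleeping. Passing a positive `timeout_ms` makes each call wait up to that many milliseconds instead of returning immediately.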

Here are the files it has open:

% sudo procstat -f 4839
  PID COMM                FD T V FLAGS    REF  OFFSET PRO NAME        
 4839 sysadm-binary     text v r r-------   -       - -   /usr/local/bin/sysadm-binary
 4839 sysadm-binary      cwd v d r-------   -       - -   /                 
 4839 sysadm-binary     root v d r-------   -       - -   /                 
 4839 sysadm-binary        0 v c r-------   1       0 -   /dev/null         
 4839 sysadm-binary        1 v c rw------   4       0 -   /dev/null         
 4839 sysadm-binary        2 v c rw------   4       0 -   /dev/null         
 4839 sysadm-binary        3 p - rw---n--   1       0 -   -                 
 4839 sysadm-binary        4 p - rw---n--   1       0 -   -                 
 4839 sysadm-binary        5 v r -wa-----   1  498659 -   -                 
 4839 sysadm-binary        6 s - rw---n--   1       0 TCP ::.12150 ::.0
 4839 sysadm-binary        7 p - rw---n--   1       0 -   -                 
 4839 sysadm-binary        8 p - rw---n--   1       0 -   -                 
 4839 sysadm-binary        9 p - rw---n--   1       0 -   -                 
 4839 sysadm-binary       10 p - rw---n--   1       0 -   -                 
 4839 sysadm-binary       11 k - rw------   1       0 -   -                 
 4839 sysadm-binary       12 v r r-------   2       0 -   /var/log/lpreserver/lpreserver.log
 4839 sysadm-binary       13 v r r-------   2       0 -   /var/log/lpreserver/lastrep-send-log

This is possibly related to bug #54.

The job launches sysadm-server, which then launches a child named sysadm-binary. It is the child process (sysadm-binary) that exhibits the bad behavior.

It appears that sysadm-server dies, and this causes sysadm-binary to spin trying to talk to it.

Here is how sysadm-server was being spawned by rc(8):

sudo -i daemon -r -P /var/run/sysadm-daemon.pid -p /var/run/sysadm.pid /usr/local/bin/sysadm-server

When spawned like this, it opens the following files:

  PID COMM                FD T V FLAGS    REF  OFFSET PRO NAME        
16719 daemon            text v r r-------   -       - -   /usr/sbin/daemon  
16719 daemon             cwd v d r-------   -       - -   /root             
16719 daemon            root v d r-------   -       - -   /                 
16719 daemon               0 v c rw------  10   19604 -   /dev/pts/0        
16719 daemon               1 v c rw------  10   19604 -   /dev/pts/0        
16719 daemon               2 v c rw------  10   19604 -   /dev/pts/0        
16719 daemon               3 v r -w---n-l   1       0 -   /var/run/sysadm.pid
16719 daemon               4 v r -w---n-l   1       0 -   /var/run/sysadm-daemon.pid

I suspect that it is unhappy that launchd opens stdin/out/err to /dev/null, rather than /dev/pts/0.

Here is the procstat output for volmand(8), which shows that it redirects its stdio descriptors to /dev/null:

  PID COMM                FD T V FLAGS    REF  OFFSET PRO NAME        
  949 daemon            text v r r-------   -       - -   /usr/sbin/daemon  
  949 daemon             cwd v d r-------   -       - -   /                 
  949 daemon            root v d r-------   -       - -   /                 
  949 daemon               0 v c rw------   8       0 -   /dev/null         
  949 daemon               1 v c rw------   8       0 -   /dev/null         
  949 daemon               2 v c rw------   8       0 -   /dev/null         
  949 daemon               3 v r -w---n-l   1       0 -   /var/run/volmand.pid

There is a "-f" flag to daemon(8) that will do the standard I/O redirection, so I think it is normal and proper for daemons to work this way.

Turns out /dev/null isn't the issue. I was able to run sysadm-server with stdio redirected to null via this command:

sudo -i daemon -f -r -P /var/run/sysadm-daemon.pid -p /var/run/sysadm.pid /usr/local/bin/sysadm-server

Note that "-f" was added to the original command.

Maybe it's environment variables?

Here are the environment variables when started under daemon(8):

PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/sbin PWD=/ HOME=/

If I modify the job manifest to set these variables, the job works fine.

{
        "Label": "org.pcbsd.sysadm-rest",
        "EnvironmentVariables": {
                "PATH": "/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/sbin",
                "PWD": "/",
                "HOME": "/"
        },
        "ProgramArguments": ["/usr/local/bin/sysadm-server"],
        "RunAtLoad": true
}
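
To check the environment theory without going through launchd at all, a command can be run with exactly these three variables and nothing inherited. The sketch below is one way to do that in Python. `run_with_env` and `DAEMON_ENV` are names made up for this example, and it says nothing about how launchd or daemon(8) actually set up the environment.

```python
import subprocess

# The variables observed under daemon(8) in this report.
DAEMON_ENV = {
    "PATH": "/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/sbin",
    "PWD": "/",
    "HOME": "/",
}


def run_with_env(argv, env):
    """Run argv with exactly `env` (nothing inherited) and return its stdout."""
    result = subprocess.run(
        argv,
        env=env,              # replaces the parent's environment entirely
        cwd="/",              # keep the working directory consistent with PWD=/
        capture_output=True,
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


if __name__ == "__main__":
    # Show what a child process sees for HOME and PATH.
    print(run_with_env(["/bin/sh", "-c", "echo $HOME; echo $PATH"], DAEMON_ENV))
```

Swapping the `/bin/sh` command for the program under test would show whether it behaves differently with this environment than with the launchd defaults.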

Here are the default variables currently set by launchd(8) for a different job:

2981 syscache-daemon  LOGNAME=root USER=root HOME=/root PATH=/usr/bin:/bin:/usr/local/bin SHELL=/bin/csh TMPDIR=/tmp
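
To make the difference between the two environments easier to see, here is a small Python sketch that diffs the daemon(8) variables quoted earlier against this launchd line. The helper names `parse_env` and `diff_env` are hypothetical, and the parsing assumes values contain no spaces, which holds for the values shown here.

```python
def parse_env(line):
    """Turn 'A=1 B=2' into {'A': '1', 'B': '2'} (values must not contain spaces)."""
    return dict(item.split("=", 1) for item in line.split())


def diff_env(expected, actual):
    """Return (missing, extra, changed) variable names relative to `expected`."""
    missing = sorted(set(expected) - set(actual))
    extra = sorted(set(actual) - set(expected))
    changed = sorted(
        name for name in expected.keys() & actual.keys()
        if expected[name] != actual[name]
    )
    return missing, extra, changed


# Environment seen under daemon(8), as quoted in this report.
DAEMON_ENV = parse_env(
    "PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/sbin PWD=/ HOME=/"
)
# Defaults set by launchd(8) for the syscache-daemon job.
LAUNCHD_ENV = parse_env(
    "LOGNAME=root USER=root HOME=/root PATH=/usr/bin:/bin:/usr/local/bin "
    "SHELL=/bin/csh TMPDIR=/tmp"
)

if __name__ == "__main__":
    missing, extra, changed = diff_env(DAEMON_ENV, LAUNCHD_ENV)
    print("missing under launchd:", missing)
    print("extra under launchd:", extra)
    print("different values:", changed)
```

For these two lines, PWD is missing under launchd, HOME and PATH have different values, and LOGNAME, SHELL, TMPDIR, and USER are extra.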

I think for daemons it would be wise to make launchd emit the same variables as daemon(8).