job stuck in an infinite loop
mheily opened this issue · comments
sysadm goes to 100% CPU when launched by launchd. At first I thought it was signal related, but now I suspect a stray file descriptor. According to truss(1) it's in an infinite loop polling a set of file descriptors. This line is repeated:
poll({ 7/POLLIN 11/POLLIN },2,0) = 0 (0x0)
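The last argument in that truss line is the poll timeout in milliseconds. With a timeout of 0, poll(2) returns immediately even when nothing is ready, so a loop around it never sleeps. That would match the 100% CPU. Here is a rough Python sketch of the effect. It uses a throwaway pipe as a stand-in for the real descriptors, and `count_polls` is a made-up helper name, not anything from sysadm:

```python
import os
import select
import time


def count_polls(timeout_ms, duration_s=0.2):
    """Count how often poll() returns on an idle pipe within duration_s."""
    read_fd, write_fd = os.pipe()  # stand-in for fds 7 and 11 (assumption)
    poller = select.poll()
    poller.register(read_fd, select.POLLIN)
    calls = 0
    deadline = time.monotonic() + duration_s
    try:
        while time.monotonic() < deadline:
            poller.poll(timeout_ms)  # returns [] when nothing is readable
            calls += 1
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return calls


if __name__ == "__main__":
    # A zero timeout spins; a non-zero timeout lets the process sleep.
    print("timeout 0 ms: ", count_polls(0), "calls")
    print("timeout 50 ms:", count_polls(50), "calls")
```

The zero-timeout run should report far more calls than the 50 ms run over the same interval, which is the busy loop seen in truss.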
Here are the files it has open:
% sudo procstat -f 4839
PID COMM FD T V FLAGS REF OFFSET PRO NAME
4839 sysadm-binary text v r r------- - - - /usr/local/bin/sysadm-binary
4839 sysadm-binary cwd v d r------- - - - /
4839 sysadm-binary root v d r------- - - - /
4839 sysadm-binary 0 v c r------- 1 0 - /dev/null
4839 sysadm-binary 1 v c rw------ 4 0 - /dev/null
4839 sysadm-binary 2 v c rw------ 4 0 - /dev/null
4839 sysadm-binary 3 p - rw---n-- 1 0 - -
4839 sysadm-binary 4 p - rw---n-- 1 0 - -
4839 sysadm-binary 5 v r -wa----- 1 498659 - -
4839 sysadm-binary 6 s - rw---n-- 1 0 TCP ::.12150 ::.0
4839 sysadm-binary 7 p - rw---n-- 1 0 - -
4839 sysadm-binary 8 p - rw---n-- 1 0 - -
4839 sysadm-binary 9 p - rw---n-- 1 0 - -
4839 sysadm-binary 10 p - rw---n-- 1 0 - -
4839 sysadm-binary 11 k - rw------ 1 0 - -
4839 sysadm-binary 12 v r r------- 2 0 - /var/log/lpreserver/lpreserver.log
4839 sysadm-binary 13 v r r------- 2 0 - /var/log/lpreserver/lastrep-send-log
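To cross-reference the two descriptors from the poll line (7 and 11) against this table, the T column can be decoded: in procstat(1), `p` is a pipe, `k` is a kqueue, `s` is a socket and `v` is a vnode. A small Python sketch of that lookup follows. The `fd_types` helper is hypothetical, and the sample text is just three rows copied from the output above:

```python
# Subset of the procstat(1) "T" column codes that appear in this output.
PROCSTAT_TYPES = {"v": "vnode", "p": "pipe", "s": "socket", "k": "kqueue"}

SAMPLE = """\
PID COMM FD T V FLAGS REF OFFSET PRO NAME
4839 sysadm-binary cwd v d r------- - - - /
4839 sysadm-binary 7 p - rw---n-- 1 0 - -
4839 sysadm-binary 11 k - rw------ 1 0 - -
4839 sysadm-binary 12 v r r------- 2 0 - /var/log/lpreserver/lpreserver.log
"""


def fd_types(procstat_text):
    """Map numeric file descriptors to a readable type name."""
    result = {}
    for line in procstat_text.splitlines():
        fields = line.split()
        # Skip the header and the text/cwd/root rows, which have no fd number.
        if len(fields) < 4 or not fields[2].isdigit():
            continue
        result[int(fields[2])] = PROCSTAT_TYPES.get(fields[3], fields[3])
    return result


if __name__ == "__main__":
    types = fd_types(SAMPLE)
    for fd in (7, 11):  # the descriptors named in the poll() call
        print(fd, types[fd])
```

So the process is polling one pipe and one kqueue descriptor, and neither ever becomes readable.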
This is possibly related to bug #54.
The job launches sysadm-server, which then launches a child named sysadm-binary. It is the child process (sysadm-binary) that exhibits the bad behavior.
It appears that sysadm-server dies, and this causes sysadm-binary to spin trying to talk to it.
Here is how sysadm-server was being spawned by rc(8):
sudo -i daemon -r -P /var/run/sysadm-daemon.pid -p /var/run/sysadm.pid /usr/local/bin/sysadm-server
When spawned like this, it opens the following files:
PID COMM FD T V FLAGS REF OFFSET PRO NAME
16719 daemon text v r r------- - - - /usr/sbin/daemon
16719 daemon cwd v d r------- - - - /root
16719 daemon root v d r------- - - - /
16719 daemon 0 v c rw------ 10 19604 - /dev/pts/0
16719 daemon 1 v c rw------ 10 19604 - /dev/pts/0
16719 daemon 2 v c rw------ 10 19604 - /dev/pts/0
16719 daemon 3 v r -w---n-l 1 0 - /var/run/sysadm.pid
16719 daemon 4 v r -w---n-l 1 0 - /var/run/sysadm-daemon.pid
I suspect that it is unhappy that launchd opens stdin/out/err to /dev/null, rather than /dev/pts/0.
Here is the procstat output for volmand(8), which shows that it redirects its stdio descriptors to /dev/null:
PID COMM FD T V FLAGS REF OFFSET PRO NAME
949 daemon text v r r------- - - - /usr/sbin/daemon
949 daemon cwd v d r------- - - - /
949 daemon root v d r------- - - - /
949 daemon 0 v c rw------ 8 0 - /dev/null
949 daemon 1 v c rw------ 8 0 - /dev/null
949 daemon 2 v c rw------ 8 0 - /dev/null
949 daemon 3 v r -w---n-l 1 0 - /var/run/volmand.pid
daemon(8) has a "-f" flag that redirects standard input, output, and error to /dev/null, so I think it is normal and proper for daemons to work this way.
It turns out /dev/null isn't the issue. I was able to run sysadm-server with stdio redirected to /dev/null using this command:
sudo -i daemon -f -r -P /var/run/sysadm-daemon.pid -p /var/run/sysadm.pid /usr/local/bin/sysadm-server
Note that "-f" was added to the original command.
Maybe it's environment variables?
Here are the environment variables when started under daemon(8):
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/sbin PWD=/ HOME=/
If I modify the job manifest to set these variables, the job works fine.
{
"Label": "org.pcbsd.sysadm-rest",
"EnvironmentVariables": {
"PATH": "/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/sbin",
"PWD": "/",
"HOME": "/"
},
"ProgramArguments": ["/usr/local/bin/sysadm-server"],
"RunAtLoad": true
}
Here are the default variables currently set by launchd(8) for a different job:
2981 syscache-daemon LOGNAME=root USER=root HOME=/root PATH=/usr/bin:/bin:/usr/local/bin SHELL=/bin/csh TMPDIR=/tmp
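To see exactly where the two environments differ, the variable lists quoted above can be compared mechanically. This is only a sketch: `parse_env`, `diff_env` and `missing_path_dirs` are made-up helper names, and it does not say which of the differences actually makes sysadm-server fail.

```python
def parse_env(text):
    """Turn 'A=1 B=2' into {'A': '1', 'B': '2'}."""
    return dict(item.split("=", 1) for item in text.split())


# Values copied from the daemon(8) and launchd(8) listings in this issue.
DAEMON_ENV = parse_env(
    "PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/sbin "
    "PWD=/ HOME=/"
)
LAUNCHD_ENV = parse_env(
    "LOGNAME=root USER=root HOME=/root PATH=/usr/bin:/bin:/usr/local/bin "
    "SHELL=/bin/csh TMPDIR=/tmp"
)


def diff_env(reference, actual):
    """Return (missing, extra, changed) variable names, each sorted."""
    missing = sorted(set(reference) - set(actual))
    extra = sorted(set(actual) - set(reference))
    changed = sorted(
        name for name in set(reference) & set(actual)
        if reference[name] != actual[name]
    )
    return missing, extra, changed


def missing_path_dirs(reference, actual):
    """List PATH directories present in reference but absent from actual."""
    have = actual.get("PATH", "").split(":")
    return [d for d in reference["PATH"].split(":") if d not in have]


if __name__ == "__main__":
    missing, extra, changed = diff_env(DAEMON_ENV, LAUNCHD_ENV)
    print("missing under launchd:", missing)
    print("extra under launchd:  ", extra)
    print("different values:     ", changed)
    print("PATH dirs not present:", missing_path_dirs(DAEMON_ENV, LAUNCHD_ENV))
```

The launchd environment lacks PWD, has a different HOME, and its PATH leaves out all three sbin directories, so any of those could be what the server depends on.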
I think for daemons it would be wise to make launchd emit the same variables as daemon(8).