Supervisor / supervisor

Supervisor process control system for Unix (supervisord)

Home Page: http://supervisord.org

supervisord Starts All Processes at the Same Time

fredpalmer opened this issue:

My config:

[program:redis-testapp]
command=/opt/bcs/bin/redis-server /apps/testapp/releases/current/environments/all/redis.conf
stdout_logfile=/var/log/redis_testapp_log
stderr_logfile=/var/log/redis_testapp_log
startsecs=30
priority=1
autostart=true
autorestart=true

[program:celerybeat-testapp]
command=python -O manage.py celerybeat --loglevel=INFO --schedule=/apps/testapp/db/celerybeat_schedule_db
stdout_logfile=/var/log/celerybeat_testapp_log
stderr_logfile=/var/log/celerybeat_testapp_log
priority=999
startsecs=5
autostart=true

[program:celery-testapp]
command=python -O manage.py celeryd --loglevel=INFO --events
stdout_logfile=/var/log/celeryd_testapp_log
stderr_logfile=/var/log/celeryd_testapp_log
priority=100
startsecs=10
autostart=true

[program:gunicorn-testapp]
command=gunicorn_django --workers=10 --log-level info --timeout 500 --bind=127.0.0.1:8004
stdout_logfile=/var/log/gunicorn_testapp_log
stderr_logfile=/var/log/gunicorn_testapp_log
priority=999
startsecs=10
autostart=true

[program:memcached-testapp]
command=/opt/bcs/bin/memcached -m 128 -l 127.0.0.1 -p 11212 -u nobody -P /apps/testapp/run/memcached.pid
stdout_logfile=/var/log/memcached_testapp_log
stderr_logfile=/var/log/memcached_testapp_log
priority=11
autostart=true
autorestart=true


My output =>

2012-05-30 22:37:33,181 INFO daemonizing the supervisord process
2012-05-30 22:37:33,182 INFO supervisord started with pid 16230
2012-05-30 22:37:34,195 INFO spawned: 'redis-testapp' with pid 16232
2012-05-30 22:37:34,206 INFO spawned: 'memcached-testapp' with pid 16233
2012-05-30 22:37:34,214 INFO spawned: 'celery-testapp' with pid 16234
2012-05-30 22:37:34,238 INFO spawned: 'celerybeat-testapp' with pid 16235
2012-05-30 22:37:34,477 INFO spawned: 'gunicorn-testapp' with pid 16241
2012-05-30 22:37:35,240 INFO success: memcached-testapp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2012-05-30 22:37:39,434 INFO success: celerybeat-testapp entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2012-05-30 22:37:44,434 INFO success: celery-testapp entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
2012-05-30 22:37:44,435 INFO success: gunicorn-testapp entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
2012-05-30 22:38:04,197 INFO success: redis-testapp entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)

What I expected to happen =>

redis would start and supervisord would wait 30 seconds before starting any lower priority processes.

I don't see the point of using supervisor if it does not get this right.

@mnaberez If there is an open issue for this, you should post the link so we can find it.

I also need the ability to start processes in a particular order.

An event-based approach could work. For example, if I have a [program:agent] and a [program:client] I could subscribe the client to start up when the agent emits a started event. By default, if a program does not subscribe to any event, it will be started when supervisord starts.
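The event-based idea can be approximated today with supervisord's existing event listener mechanism. Below is a minimal Bash sketch, assuming a program named `agent` and a dependent `client` configured with `autostart=false`. The script path, the `--listen` flag, and the `DEPENDENCY`/`DEPENDENT` variables are hypothetical. The listener follows the documented protocol: write `READY`, read one header line, read exactly `len` bytes of payload, then reply `RESULT 2\nOK`.

```bash
#!/usr/bin/env bash
# Sketch: supervisord event listener that starts DEPENDENT once DEPENDENCY
# enters RUNNING. Assumed config (names and path are hypothetical):
#   [program:client]
#   autostart=false
#   [eventlistener:dep-starter]
#   command=/usr/local/bin/dep-starter.sh --listen
#   events=PROCESS_STATE_RUNNING
DEPENDENCY=${DEPENDENCY:-agent}
DEPENDENT=${DEPENDENT:-client}

# Print the value of KEY from a space-separated "key:value" line.
field() {
  local key=$1 tok
  local -a toks
  read -r -a toks <<< "$2"
  for tok in "${toks[@]}"; do
    if [[ $tok == "$key":* ]]; then
      printf '%s\n' "${tok#"$key":}"
      return 0
    fi
  done
  return 1
}

# Print the program to start for this event, or nothing.
program_to_start() {
  local eventname=$1 payload=$2
  if [[ $eventname == PROCESS_STATE_RUNNING ]] &&
     [[ $(field processname "$payload") == "$DEPENDENCY" ]]; then
    printf '%s\n' "$DEPENDENT"
  fi
}

# Main protocol loop; ends cleanly when stdin is closed.
listen() {
  local header len payload eventname target
  while printf 'READY\n' && IFS= read -r header; do
    len=$(field len "$header") || len=0
    eventname=$(field eventname "$header")
    payload=
    # The payload is exactly len bytes and is not newline-terminated.
    if (( len > 0 )); then
      IFS= read -r -N "$len" payload
    fi
    target=$(program_to_start "$eventname" "$payload")
    if [[ -n $target ]]; then
      # stdout is reserved for the protocol, so log to stderr.
      echo "dep-starter: starting $target" >&2
      supervisorctl start "$target" >&2
    fi
    printf 'RESULT 2\nOK'
  done
}

if [[ ${1:-} == --listen ]]; then
  listen
fi
```

Only a single dependency edge is handled here; several agent/client pairs would need a lookup table instead of two variables.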

+1, this is a major issue.

+1

+1

must have

Is there dependency support in supervisor? If not, why not?

Not wanting to add to any perceived pressure, but I'd be +1 on the very same feature (process dependencies). :-)

This would be very useful and would mean fewer ugly hacks like sleep.

+1. Maybe an option to make process loading synchronous.

Does anyone have other implementation thoughts on this? I'm considering taking a run at it since no one else seems to care.

@tomislacker Well, IMHO the best solution is to build a dependency solver à la Puppet. If I see it correctly, a simple topological sort with DFS should be more than enough. Then add a "requiresstarted" field (or "waitforstarted", or something similar). That would fix this issue along the way.

(edit) Well, I'm not sure what should happen if, given A -> B -> C (A depends on B, etc.), someone tries to stop B. Should A be stopped as well? Well… it /depends/…
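A minimal sketch of the topological sort with DFS suggested above, written in Bash 4+ (it needs associative arrays). The `DEPS` map (program -> programs it requires) encodes the hypothetical A -> B -> C example. The output is a start order in which every program comes after its dependencies, and a cycle is reported as an error. This covers only the ordering step; it is not a supervisor feature and does not answer the stop-propagation question.

```bash
#!/usr/bin/env bash
# Requires Bash 4+ for associative arrays.
# Hypothetical dependency map: program -> space-separated programs it requires.
declare -A DEPS=(
  [A]="B"
  [B]="C"
  [C]=""
)
declare -A STATE=()   # unset = unvisited, "visiting", or "done"
ORDER=()

# Depth-first visit: append NODE only after all of its dependencies.
visit() {
  local node=$1 dep
  case ${STATE[$node]:-} in
    done) return 0 ;;
    visiting) echo "dependency cycle involving '$node'" >&2; return 1 ;;
  esac
  STATE[$node]=visiting
  for dep in ${DEPS[$node]:-}; do
    visit "$dep" || return 1
  done
  STATE[$node]=done
  ORDER+=("$node")
}

# Print a start order (dependencies first), or fail on a cycle.
topo_order() {
  local node
  STATE=()
  ORDER=()
  # Visit roots in sorted order so the output is deterministic.
  for node in $(printf '%s\n' "${!DEPS[@]}" | sort); do
    visit "$node" || return 1
  done
  echo "${ORDER[*]}"
}
```

For the example map, `topo_order` prints `C B A`: C starts first because B requires it, and B in turn must precede A.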

👍

Since everyone seems to need/want to do this in a slightly different way, how about a different tack? Instead of attempting to express dependencies directly in a configuration directive, how about adding a directive for a 'helper' script, plus a couple of example helpers that suit 2-3 general use cases with differing requirements, as @kgadek pointed out?

Here's a good example to demonstrate what I mean:
http://www.serfdom.io/docs/recipes/event-handler-router.html

There's some stuff out there for doing this sort of thing easily in bash,
https://github.com/progrium/pluginhook

And you certainly can't discount the nicety of "I felt like writing it in python" (Or any other language of admin's choice)
https://github.com/garethr/serf-master

It seems to me like this would be a reasonable win for everyone without consuming a lot of developer time on supervisor itself.

Hey,

The serf model looks very promising. Having a collection of scripts instead of just one handler seems appropriate (e.g. checking for 2 deps before starting)

So, you would define a handler to check whether process/group:x "can_be_started", and if all scripts return 0, then it's all good. Does that sound right?

IMHO this makes "priority" obsolete, am I wrong?
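The "can_be_started" hook idea can be sketched as a small gate script: run every executable in a per-program hook directory, and start the program only if all of them exit 0. The directory layout and the `gate.sh PROGRAM HOOK_DIR` calling convention are assumptions for illustration. Supervisor provides neither.

```bash
#!/usr/bin/env bash
# Sketch: gate a program's start on a directory of hook scripts.
# Hypothetical layout: one directory per program (e.g. hooks/celery-testapp/)
# holding executables that exit 0 when their precondition holds.

# Return 0 only if every executable hook in DIR exits 0 (an empty DIR passes).
can_be_started() {
  local dir=$1 hook
  for hook in "$dir"/*; do
    [[ -f $hook && -x $hook ]] || continue
    if ! "$hook"; then
      echo "hook failed: $hook" >&2
      return 1
    fi
  done
  return 0
}

# Hypothetical CLI: gate.sh PROGRAM HOOK_DIR
if [[ $# -eq 2 ]]; then
  if can_be_started "$2"; then
    supervisorctl start "$1"
  else
    exit 1
  fi
fi
```

Hooks run in glob (alphabetical) order and stop at the first failure, so a cheap check can be placed first by giving it a lower numeric prefix.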

For my needs, simply having a per-process delay from the time the supervisor daemon starts would be enough.

yet another +1 for this

+1. A per-process delay would be enough, though.

I've been able to work around this a bit. Quick and dirty.

# Deal with updating our repositories.
supervisorctl start source-code-deploy

# Check for the oneshot process to complete.
while ! supervisorctl status source-code-deploy | grep -q 'EXITED'; do sleep 1; done
# Wait for the while loop to break out signalling success.

# Start the late boot process, now that the deployment is complete.
supervisorctl start system-boot-late

# Check for the oneshot process to complete.
while ! supervisorctl status system-boot-late | grep -q 'EXITED'; do sleep 1; done
# And now we should become EXITED to supervisord and any other tasks relying on the above.
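The polling loops above wait forever if a one-shot program ends up FATAL instead of EXITED. Below is a slightly more defensive sketch. It assumes `supervisorctl status NAME` prints the program name followed by its state in the second column. The 60-second default timeout is an arbitrary choice.

```bash
#!/usr/bin/env bash
# wait_for_state PROGRAM STATE [TIMEOUT_SECONDS]
# Poll supervisorctl once per second until PROGRAM reports STATE.
wait_for_state() {
  local program=$1 wanted=$2 timeout=${3:-60}
  local start=$SECONDS name state rest
  while :; do
    # Status line: "<name> <STATE> <description...>"
    read -r name state rest <<< "$(supervisorctl status "$program" 2>/dev/null)"
    [[ $state == "$wanted" ]] && return 0
    # FATAL means supervisord could not start it; it will never reach $wanted.
    if [[ $state == FATAL ]]; then
      echo "$program is FATAL" >&2
      return 1
    fi
    if (( SECONDS - start >= timeout )); then
      echo "timed out waiting for $program to reach $wanted (last: ${state:-unknown})" >&2
      return 1
    fi
    sleep 1
  done
}
```

With this, each step of the workaround becomes `supervisorctl start source-code-deploy && wait_for_state source-code-deploy EXITED 600 || exit 1`, and likewise for `system-boot-late`. The 600-second value is only an example.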

Hello,
I'd like to have a go at this in the coming weeks; let me know if anyone else is working on it.

Attempting to start many processes at exactly the same time tends to push the machines to the limit and sometimes even beyond.

Having a parameter like "startdelay" would be very useful for me especially since some processes tend to use a lot of resources when first starting, resources which are released after a short while.

Unfortunately there are quite a few ways to implement such a feature and finding the best one could take some time.

@vladfr Perhaps we could collaborate on this.

My first "dirty" crack at it:
https://github.com/liutec/supervisor/commit/eab7cc1e04ad49768593183e8134298604459827
(I really don't like having to use sleep this way.)

Hi @liutec, look above at what @kamilion is saying about a hook model - I think this is useful to implement custom checks, and could be used to just sleep for a simple case.
You could run the hook scripts in subprocesses and they can sleep all they want.

In my situation, the goal is to be able to have supervisor manage around 240 distinct programs some of which may require more than one instance.

The start delay is only useful the first time a program is started, simply to avoid having all programs start at exactly the same time.

Most of the programs are consumers and will only run for a short amount of time before they shutdown and are restarted by the supervisor -- in this case having a start delay will not do any good.

I've considered both @kamilion's solution and the solution given by @mnaberez (autostart/autorestart), but unfortunately neither produces an easily configurable "startdelay".

+1. If supervisord had "sequence control" that would be great; for example, Kafka depends on Zookeeper, and supervisor cannot be used the way we would like.

I am using supervisord to start a bunch of exotic qemu instances (for a compilers course) and some of these boot up quickly by themselves but take quite a long time if started all at once. A simple "startdelay" thing would work just fine: Sleep for the given number of seconds after starting this process before moving on to the next one in the list. I don't know exactly how you decide to order these (the priority keyword doesn't seem to order strictly as far as I can tell from the created pids) but whatever it is, being able to delay for a few seconds after each process would help a whole bunch in my case. So 👍 for sure. Twice, actually. :-)
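The behaviour described above (start one process, wait a fixed number of seconds, move on to the next) can be approximated with a one-shot starter. This sketch assumes the qemu programs are configured with `autostart=false` and that the starter runs once, for example as a program with `autorestart=false` and `startsecs=0`. Program names are hypothetical. As far as I know, `supervisorctl start` already blocks until the program leaves the STARTING state, so the delay is added on top of each program's `startsecs`.

```bash
#!/usr/bin/env bash
# staggered_start DELAY PROGRAM...
# Start each program in the given order, sleeping DELAY seconds after each.
staggered_start() {
  local delay=$1 program
  shift
  for program in "$@"; do
    supervisorctl start "$program" || echo "failed to start $program" >&2
    sleep "$delay"
  done
}

# Hypothetical CLI: staggered-start.sh 10 qemu-1 qemu-2 qemu-3
if [[ $# -ge 2 ]]; then
  staggered_start "$@"
fi
```

The order is exactly the argument order, which avoids depending on how `priority` ties are resolved.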

I think a startdelay option would be very useful. I use supervisor to manage an AMQP server and some consumers. Some consumers implement a sleep before starting; others loop over a try-to-connect / error / sleep sequence (different implementations). I know the maximum time my AMQP server needs to start, so adding a startdelay option to the consumers would avoid those ugly hacks.
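One way to replace the per-consumer sleep and retry loops is a single wrapper that waits until the broker's port accepts TCP connections and then `exec`s the consumer. This sketch relies on Bash's `/dev/tcp` redirection, which most Linux builds of Bash support. The script name, argument order, and timeout are assumptions. An open port only shows that the broker is listening, not that it is fully ready.

```bash
#!/usr/bin/env bash
# wait_for_port HOST PORT TIMEOUT_SECONDS
# Retry a TCP connect once per second; fail after TIMEOUT_SECONDS.
wait_for_port() {
  local host=$1 port=$2 timeout=$3 start=$SECONDS
  # The subshell opens the connection on fd 3 and closes it on exit.
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    if (( SECONDS - start >= timeout )); then
      return 1
    fi
    sleep 1
  done
}

# Hypothetical CLI: wait-for-port.sh HOST PORT TIMEOUT COMMAND [ARGS...]
if [[ $# -ge 4 ]]; then
  host=$1 port=$2 timeout=$3
  shift 3
  wait_for_port "$host" "$port" "$timeout" || {
    echo "gave up waiting for $host:$port" >&2
    exit 1
  }
  # exec so supervisord signals the consumer itself, not this wrapper.
  exec "$@"
fi
```

An example program line might be `command=/usr/local/bin/wait-for-port.sh 127.0.0.1 5672 60 /usr/local/bin/consumer`. Port 5672 is the default AMQP port, and the paths are placeholders.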

I think this may be very helpful in many situations.

After checking the code, I don't see any backward-compatibility (BC) issues, and the supervisor behaviour is unchanged when the default value is used.
I can't wait for this to be merged.

+1 for dependency support

+1

+1

This issue needs to be renamed, because I don't think that startsecs is relevant here. I feel like the OP might've misinterpreted what startsecs is, thinking that it's a delay to wait before starting the process.

startsecs

The total number of seconds which the program needs to stay running after a startup to consider the start successful. If the program does not stay up for this many seconds after it has started, even if it exits with an “expected” exit code (see exitcodes), the startup will be considered a failure. Set to 0 to indicate that the program needn’t stay running for any particular amount of time.

+1

important feature
+1 as well

+1

Simple workaround I use:

[program:uwsgi]
command=bash -c 'sleep 5 && exec uwsgi /etc/uwsgi.ini'
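The `sleep` workaround can be wrapped in a small reusable script. One detail matters: the sleep should be followed by `exec`, so the shell is replaced and supervisord's stop signal reaches the program rather than an intermediate shell. The script name is hypothetical. Unlike the first-start-only "startdelay" requested earlier, this delay applies to every restart. It also counts toward `startsecs`, so `startsecs` should exceed the delay if a crash right after the sleep is to be treated as a failed start.

```bash
#!/usr/bin/env bash
# delayed_exec SECONDS COMMAND [ARGS...]
delayed_exec() {
  local delay=$1
  shift
  sleep "$delay"
  # Replace this shell so SIGTERM from supervisord goes to the program itself.
  exec "$@"
}

# Hypothetical CLI: delayed-exec.sh 5 uwsgi /etc/uwsgi.ini
if [[ $# -ge 2 ]]; then
  delayed_exec "$@"
fi
```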

Anyone have thoughts on how we may be able to solve this issue in a more responsible fashion? What if there were to be a command that could be invoked to validate the "online" state of a process? Such as:

[program:myapp]
command=/usr/local/bin/myapp-microservice
checkcommand=curl -s http://localhost:54321/v1/_ping
checkfreq=1
checktimeout=3
startsecs=5

The above example would imply:

  1. The command would be run as normal

  2. The checkcommand would be executed every checkfreq seconds after the above invocation until startsecs, possibly in conjunction with checktimeout, causes an abort or the command returns 0.

    Example: If each call to checkcommand takes <=3 seconds, with checkfreq=1, checktimeout=3, and startsecs=5, checkcommand runs at +1.0s and, if that fails, again at +5.0s; the service is marked failed when the second checkcommand invocation concludes.

  3. If any invocation of checkcommand returns an exit status of 0, then service is considered online.

  4. If checkcommand does not return an exit status of 0 within startsecs, or the last possible invocation of checkcommand hangs for >=checktimeout seconds, the service start is considered a failure and the "normal" logic that is already in place applies. (Mark the service as a failure, output the same log content, etc.)
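The four steps can be prototyped outside supervisor. This sketch implements the proposed loop using the proposal's own parameters as positional arguments. `checkcommand`, `checkfreq`, and `checktimeout` are not existing supervisor options, and enforcing `checktimeout` with coreutils `timeout` is an assumption.

```bash
#!/usr/bin/env bash
# check_online CHECKFREQ CHECKTIMEOUT STARTSECS CHECK_COMMAND [ARGS...]
# Returns 0 as soon as one check succeeds, 1 once the startsecs window closes.
check_online() {
  local freq=$1 tmo=$2 startsecs=$3
  shift 3
  local start=$SECONDS
  while :; do
    sleep "$freq"
    # coreutils timeout kills a hung check after $tmo seconds (exit 124).
    if timeout "$tmo" "$@"; then
      return 0
    fi
    # Give up if the next check would begin after the startsecs window.
    if (( SECONDS - start + freq > startsecs )); then
      return 1
    fi
  done
}
```

For instance, `check_online 1 3 5 curl -sf http://localhost:54321/v1/_ping` mirrors the worked example. If each check hangs until its 3-second timeout, checks run at about +1s and +5s before the function gives up. Fast failures are simply retried every second until the window closes.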

It just seems to me that we want the problem solved but haven't described it well enough in this issue. The low-hanging-fruit answer is certainly to make sure that services with different priorities are not invoked until the max(startsecs) of all higher-priority services has elapsed. But aren't we only asking for that in hopes of loosely choreographing our intended result, instead of being able to validate it on the way there?
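To make the low-hanging-fruit rule concrete, here is a sketch that computes when each program would start if every priority group waited for the largest `startsecs` of the group before it. Input is one `priority startsecs name` line per program. This is the proposed rule, not current supervisord behaviour: the log in the original report shows every program spawned within the same second. Fed the original report's config, where memcached sets no `startsecs` and so gets the default of 1, it yields redis at 0s, memcached at 30s, celery at 31s, and celerybeat and gunicorn at 41s.

```bash
#!/usr/bin/env bash
# start_schedule: read "priority startsecs name" lines on stdin and print
# "name offset_seconds", where each priority group starts once the previous
# group has had its max(startsecs) to come up. Lower priority numbers go first.
start_schedule() {
  # -s keeps input order within the same priority.
  sort -s -n -k1,1 | awk '
    BEGIN { t = 0; groupmax = 0 }
    NR == 1 { prio = $1 }
    $1 != prio { t += groupmax; prio = $1; groupmax = 0 }
    { if ($2 + 0 > groupmax) groupmax = $2 + 0; print $3, t }
  '
}
```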

@tomislacker I think for many uses loose choreography is an acceptable 90% case even if full validation is the 100% case.

Once you get into true dependency management you start looking a lot more like a real init and it starts becoming a much more complicated problem. But there are definitely low-hanging-fruit scenarios.

+1 I need this

@tomislacker this is a good idea, and easy to use because you just need the current config file.
But how do you handle an actual dependency? If my_program depends on Postgres, I check for it in checkcommand for a couple of times and hope it will start?

To express a dependency, I think you want to run checkcommand before the actual command.

@tomislacker I would prefer there be simple dependencies between the programs

[program:A]
command=/usr/local/bin/A

[program:C]
command=/usr/local/bin/C
dependson=A

and let the user decide if the programs are going to check for more complicated conditions. For example: We can insert [program:B] that will perform a sophisticated check to ensure A is running properly; and fail if not. So, A starts. B depends on A so will only start when Supervisor considers A is RUNNING. When B starts, it will perform its analysis and fail if A is not running properly. C only starts when B ran (or is running successfully).

My suggestion does not cover the extra features you suggest, like number of checks, the timing of those checks, and when to give up; but I also advocate giving Supervisor some Cron-like features so programs like B are easy to define:

[program:A]
command=/usr/local/bin/A

[program:B]
command=curl -s http://localhost:54321/v1/_ping
dependson=A
startretries=3
restartintervalsec=5

[program:C]
command=/usr/local/bin/C
dependson=B

I also advocate giving Supervisor some Cron-like features

Please see response in #635. This has been considered at length by the Supervisor developers in the past and it was decided that cron-like functionality is out of scope for this project. Sorry.

👍 for a way to deliver sequential execution in a feasible fashion.

Using something like startuppriority would be a good robust way to do this IMO.

+1

@klahnakoski in your first example, C depending on A means that supervisor needs to know A is ready. Right now, it only knows it's started.

@vladfr Yes, you are correct. That is why I also suggested B, which is a more sophisticated check to determine whether A is ready. Ideally B is triggered periodically, and will fail if A is not ready.

+1

+1?? This has been open since May 2012; I think supervisor will never have dependencies.

Could someone suggest alternatives with dependency management?

+1, this would make things much more elegant

+1

+1, it's sad that this is still not doable after 3 years; I'm using systemd units for now.

+1

Or please, add it.

+1

+1

+10086