supervisord Starts All Processes at the Same Time

Question

fredpalmer opened this issue 12 years ago

My config:

[program:redis-testapp]
command=/opt/bcs/bin/redis-server /apps/testapp/releases/current/environments/all/redis.conf
stdout_logfile=/var/log/redis_testapp_log
stderr_logfile=/var/log/redis_testapp_log
startsecs=30
priority=1
autostart=true
autorestart=true

[program:celerybeat-testapp]
command=python -O manage.py celerybeat --loglevel=INFO --schedule=/apps/testapp/db/celerybeat_schedule_db
stdout_logfile=/var/log/celerybeat_testapp_log
stderr_logfile=/var/log/celerybeat_testapp_log
priority=999
startsecs=5
autostart=true

[program:celery-testapp]
command=python -O manage.py celeryd --loglevel=INFO --events
stdout_logfile=/var/log/celeryd_testapp_log
stderr_logfile=/var/log/celeryd_testapp_log
priority=100
startsecs=10
autostart=true

[program:gunicorn-testapp]
command=gunicorn_django --workers=10 --log-level info --timeout 500 --bind=127.0.0.1:8004
stdout_logfile=/var/log/gunicorn_testapp_log
stderr_logfile=/var/log/gunicorn_testapp_log
priority=999
startsecs=10
autostart=true

[program:memcached-testapp]
command=/opt/bcs/bin/memcached -m 128 -l 127.0.0.1 -p 11212 -u nobody -P /apps/testapp/run/memcached.pid
stdout_logfile=/var/log/memcached_testapp_log
stderr_logfile=/var/log/memcached_testapp_log
priority=11
autostart=true
autorestart=true

My output =>

2012-05-30 22:37:33,181 INFO daemonizing the supervisord process
2012-05-30 22:37:33,182 INFO supervisord started with pid 16230
2012-05-30 22:37:34,195 INFO spawned: 'redis-testapp' with pid 16232
2012-05-30 22:37:34,206 INFO spawned: 'memcached-testapp' with pid 16233
2012-05-30 22:37:34,214 INFO spawned: 'celery-testapp' with pid 16234
2012-05-30 22:37:34,238 INFO spawned: 'celerybeat-testapp' with pid 16235
2012-05-30 22:37:34,477 INFO spawned: 'gunicorn-testapp' with pid 16241
2012-05-30 22:37:35,240 INFO success: memcached-testapp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2012-05-30 22:37:39,434 INFO success: celerybeat-testapp entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2012-05-30 22:37:44,434 INFO success: celery-testapp entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
2012-05-30 22:37:44,435 INFO success: gunicorn-testapp entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
2012-05-30 22:38:04,197 INFO success: redis-testapp entered RUNNING state, process has stayed up for > than 30 seconds (startsecs)

What I expected to happen =>

redis would start and supervisord would wait 30 seconds before starting any lower priority processes.

sevastos commented 10 years ago

+1

maximilize commented 10 years ago

+1

Ben Tomasik commented 10 years ago

+1

Jeff Mitchell commented 10 years ago

+1

Hunter Loftis commented 10 years ago

+1

Or Arbel commented 10 years ago

+1

Allen commented 10 years ago

+1

Roberto Bampi commented 10 years ago

+1

Karol Woźniak commented 10 years ago

👍

Karan Lyons commented 10 years ago

+1

Helje commented 10 years ago

+1

Hannes Tiede commented 10 years ago

+1

harmy commented 10 years ago

must have

Mathias Bogaert commented 10 years ago

+1

Devrim Yasar commented 10 years ago

+1

Alper Ortac commented 10 years ago

+1

Mike Trienis commented 10 years ago

+1

Simon Dittlmann commented 10 years ago

+1

Joseph Wenninger commented 10 years ago

+1

Felicioli Claudio commented 10 years ago

+1

Brian Holcomb commented 10 years ago

👍

Clint Ecker commented 10 years ago

+1

Konrad Gądek commented 10 years ago

👍

Vlad Fratila commented 10 years ago

+1

Bill Ataras commented 10 years ago

+1

lostsnow commented 10 years ago

+1

sinchb commented 10 years ago

👍

Nicholas Turner commented 9 years ago

+1

Michael Sierks commented 9 years ago

+1

Jack Wilsdon commented 9 years ago

+1

Sukrit Khera commented 9 years ago

+1

Dan Ros commented 9 years ago

+1

Marc-Antoine Parent commented 9 years ago

+1

Di Wu commented 9 years ago

+1

karl-forner-quartz-bio commented 9 years ago

+10 ;)

Anton Bessonov commented 9 years ago

+1.

Arinto Murdopo commented 9 years ago

+1

J commented 9 years ago

+1

Sean McLaughlin commented 9 years ago

+1

ahosie commented 9 years ago

+1

Piotr Piatkowski commented 9 years ago

+1

Max Schaefer commented 9 years ago

+1

Artur Rodrigues commented 9 years ago

+1

Masaki Yoshida commented 9 years ago

+1

Morgan Antonsson commented 9 years ago

+1

Vasily Ostanin commented 9 years ago

👍

Vincent van Scherpenseel commented 9 years ago

👍

SimaWB commented 9 years ago

+1

TalonOne commented 9 years ago

+1

zhu jianjiang commented 9 years ago

+1

Rimvydas Naktinis commented 9 years ago

+1

Alex Mootassem commented 9 years ago

+1

cocasema commented 9 years ago

+1

Mike Vella commented 9 years ago

+1

cutewalker commented 9 years ago

+1

wojss commented 9 years ago

+1

Carlos Crespo commented 9 years ago

+1

ddzialak commented 9 years ago

+1

ethan commented 8 years ago

+10086

Ben Jones commented 8 years ago

+1

disposable-ksa98 · Answer 1 · Tue Jul 02 2013 23:08:31 GMT+0800 (China Standard Time)

I don't see what's the point of using supervisor if it does not get this right.

@mnaberez If there is an open issue for this, you should post the link so we can find it.

Roberto Aguilar · Answer 2 · Tue Jul 09 2013 09:19:55 GMT+0800 (China Standard Time)

I also need the ability to start processes in a particular order.

An event-based approach could work. For example, if I have a [program:agent] and a [program:client] I could subscribe the client to start up when the agent emits a started event. By default, if a program does not subscribe to any event, it will be started when supervisord starts.
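Supervisor's existing event listener interface can approximate this idea without new config directives. The following is only a minimal sketch under several assumptions: the script is registered in a hypothetical [eventlistener:starter] section with events=PROCESS_STATE_RUNNING, [program:client] has autostart=false, and the XML-RPC interface is reachable over an [inet_http_server] at the made-up address below. The DEPENDENTS mapping is illustrative and not a supervisor feature.

```python
import sys
import xmlrpc.client

# Hypothetical mapping: when the key enters RUNNING, start the listed programs.
DEPENDENTS = {"agent": ["client"]}
# Assumed address of an [inet_http_server] section; adjust to your setup.
RPC_URL = "http://127.0.0.1:9001/RPC2"


def parse_tokens(line):
    """Turn a 'key:value key:value' header or payload line into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def programs_to_start(event_name, payload, dependents=DEPENDENTS):
    """Return the programs that should be started in response to one event."""
    if event_name != "PROCESS_STATE_RUNNING":
        return []
    return dependents.get(parse_tokens(payload)["processname"], [])


def run_listener(stdin=sys.stdin, stdout=sys.stdout):
    """Speak the event listener protocol until supervisord closes stdin."""
    rpc = xmlrpc.client.ServerProxy(RPC_URL)
    while True:
        stdout.write("READY\n")  # tell supervisord we can accept an event
        stdout.flush()
        header_line = stdin.readline()
        if not header_line:  # EOF: supervisord went away
            return
        headers = parse_tokens(header_line)
        # 'len' counts bytes; the state-change payloads used here are plain ASCII.
        payload = stdin.read(int(headers["len"]))
        for name in programs_to_start(headers["eventname"], payload):
            try:
                rpc.supervisor.startProcess(name)
            except xmlrpc.client.Fault:
                pass  # e.g. already started; a real listener would log this
        stdout.write("RESULT 2\nOK")  # acknowledge the event
        stdout.flush()
```

A listener script built from this would end with a call to run_listener(). Note that RUNNING only means the agent has stayed up for startsecs, not that it is ready to serve the client.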

Dmitriy Kiriyenko · Answer 3 · Mon Apr 21 2014 16:52:58 GMT+0800 (China Standard Time)

+1, this is a major issue.

Sam Xiao · Answer 4 · Tue Jun 24 2014 03:29:49 GMT+0800 (China Standard Time)

Is there dependency support in supervisor? If not, why not?

miguno · Answer 5 · Wed Sep 03 2014 16:21:27 GMT+0800 (China Standard Time)

Not wanting to add to any perceived pressure, but I'd be +1 on the very same feature (process dependencies). :-)

nando · Answer 6 · Thu Sep 04 2014 23:11:09 GMT+0800 (China Standard Time)

This would be very useful and would mean fewer ugly hacks like sleep.

Ricardo Almeida Silva · Answer 7 · Mon Sep 15 2014 19:48:16 GMT+0800 (China Standard Time)

+1. Maybe an option to make process loading synchronous.

Ben Tomasik · Answer 8 · Fri Sep 26 2014 13:07:15 GMT+0800 (China Standard Time)

Anyone got any other implementation thoughts on this? I'm considering making a run at it since no one else seems to care.

Konrad Gądek · Answer 9 · Mon Sep 29 2014 22:32:47 GMT+0800 (China Standard Time)

@tomislacker well, IMHO the best solution is to make a dependency solver à la Puppet. As far as I can see, a simple topological sort with DFS should be more than enough. Then add a "requiresstarted" field (or "waitforstarted", or something similar). This issue would be fixed along the way.

(edit) Well, I'm not sure what should happen if, given A -> B -> C (A depends on B, etc.), someone tried to stop B. Should A be stopped as well? Well… /depends/…
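For reference, the DFS-based topological sort mentioned here fits in a few lines. This is only a sketch, not supervisor code: the `requires` dict stands in for the hypothetical "requiresstarted" field, and dependents_of() just lists which programs would be affected by stopping one, leaving open whether they should be stopped.

```python
def start_order(requires):
    """Return program names ordered so every dependency precedes its dependents.

    `requires` maps a program name to the names it needs started first.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    state = {}
    order = []

    def visit(name, path):
        colour = state.get(name, WHITE)
        if colour == BLACK:
            return
        if colour == GREY:
            raise ValueError("dependency cycle: " + " -> ".join(path + [name]))
        state[name] = GREY
        for dep in requires.get(name, ()):
            visit(dep, path + [name])
        state[name] = BLACK
        order.append(name)  # post-order: all dependencies are already listed

    for name in sorted(requires):
        visit(name, [])
    return order


def dependents_of(requires, target):
    """Programs that directly or transitively need `target`."""
    found = set()
    changed = True
    while changed:
        changed = False
        for name, deps in requires.items():
            if name not in found and (target in deps or found & set(deps)):
                found.add(name)
                changed = True
    return found


# A depends on B, B depends on C, as in the comment above.
requires = {"A": ["B"], "B": ["C"], "C": []}
print(start_order(requires))         # ['C', 'B', 'A']
print(dependents_of(requires, "B"))  # {'A'}
```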

Sterling Windmill · Answer 10 · Wed Nov 05 2014 21:19:14 GMT+0800 (China Standard Time)

Sterling Windmill commented 10 years ago

Kamilion · Answer 11 · Sun Nov 16 2014 05:01:53 GMT+0800 (China Standard Time)

Since everyone seems to need/want to do this a slightly different way -- how about a slightly different tack? Instead of attempting to express dependencies directly in a configuration directive, how about instead adding a directive for a 'helper' script, and a couple example helpers that suit 2-3 general use cases with differing requirements as @kgadek pointed out?

Here's a good example to demonstrate what I mean:
http://www.serfdom.io/docs/recipes/event-handler-router.html

There's some stuff out there for doing this sort of thing easily in bash,
https://github.com/progrium/pluginhook

And you certainly can't discount the nicety of "I felt like writing it in python" (Or any other language of admin's choice)
https://github.com/garethr/serf-master

It seems to me like this would be a reasonable win for everyone without consuming a lot of developer time on supervisor itself.

Vlad Fratila · Answer 12 · Tue Nov 18 2014 20:23:19 GMT+0800 (China Standard Time)

Hey,

The serf model looks very promising. Having a collection of scripts instead of just one handler seems appropriate (e.g. checking for 2 deps before starting)

So, you would define a handler to see if process/group:x "can_be_started", and if all scripts return 0, then it's all good. Sounds right?

Imho this makes "priority" obsolete, am I wrong?

Jeff Mitchell · Answer 13 · Sat Nov 22 2014 00:47:52 GMT+0800 (China Standard Time)

For my needs, simply having a per-process delay from the time the supervisor daemon starts would be enough.

binhex · Answer 14 · Mon Dec 01 2014 18:15:34 GMT+0800 (China Standard Time)

yet another +1 for this

Samet · Answer 15 · Thu Dec 11 2014 22:55:53 GMT+0800 (China Standard Time)

+1. A per-process delay would be enough, though.

Kamilion · Answer 16 · Thu Dec 11 2014 22:59:26 GMT+0800 (China Standard Time)

I've been able to work around this a bit. Quick and dirty.

# Deal with updating our repositories.
supervisorctl start source-code-deploy

# Check for the oneshot process to complete.
while ! supervisorctl status source-code-deploy | grep -q 'EXITED'; do sleep 1; done
# Wait for the while loop to break out signalling success.

# Start the late boot process, now that the deployment is complete.
supervisorctl start system-boot-late

# Check for the oneshot process to complete.
while ! supervisorctl status system-boot-late | grep -q 'EXITED'; do sleep 1; done
# And now we should become EXITED to supervisord and any other tasks relying on the above.
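
The same polling idea can be written in Python with a timeout, so the wrapper does not spin forever if the oneshot process ends up FATAL. This is a rough sketch: it assumes the state is the second whitespace-separated column of `supervisorctl status <name>` output, and the function names are made up for illustration.

```python
import subprocess
import time


def supervisorctl_state(name):
    """Return the state column of `supervisorctl status <name>`, e.g. 'EXITED'."""
    result = subprocess.run(
        ["supervisorctl", "status", name],
        capture_output=True, text=True, check=False,  # only the text is parsed
    )
    fields = result.stdout.split()
    return fields[1] if len(fields) > 1 else None


def wait_for_state(name, wanted, timeout=60.0, interval=1.0,
                   get_state=supervisorctl_state):
    """Poll until `name` reaches `wanted`; give up on FATAL or after `timeout`."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_state(name)
        if state == wanted:
            return True
        if state == "FATAL":
            return False
        time.sleep(interval)
    return False
```

Usage would mirror the shell version: start source-code-deploy, call wait_for_state("source-code-deploy", "EXITED"), then start system-boot-late only if it returned True.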

Vlad Fratila · Answer 17 · Mon Jan 05 2015 17:41:11 GMT+0800 (China Standard Time)

Hello,
I'd like to have a go at this in the coming weeks, let me know if anyone else is working on it.

Andrei Liutec · Answer 18 · Tue Jan 06 2015 06:35:14 GMT+0800 (China Standard Time)

Attempting to start many processes at exactly the same time tends to push the machines to the limit and sometimes even beyond.

Having a parameter like "startdelay" would be very useful for me especially since some processes tend to use a lot of resources when first starting, resources which are released after a short while.

Unfortunately there are quite a few ways to implement such a feature and finding the best one could take some time.

@vladfr Perhaps we could collaborate on this.

My first "dirty" crack at it:
https://github.com/liutec/supervisor/commit/eab7cc1e04ad49768593183e8134298604459827
(I really don't like having to use sleep this way.)

Vlad Fratila · Answer 19 · Tue Jan 06 2015 07:49:38 GMT+0800 (China Standard Time)

Hi @liutec, look above at what @kamilion is saying about a hook model - I think this is useful to implement custom checks, and could be used to just sleep for a simple case.
You could run the hook scripts in subprocesses and they can sleep all they want.

Andrei Liutec · Answer 20 · Tue Jan 06 2015 17:11:59 GMT+0800 (China Standard Time)

In my situation, the goal is to be able to have supervisor manage around 240 distinct programs some of which may require more than one instance.

The start delay is only useful for the first time a program is started, simply to avoid the otherwise unlikely situation when all programs start at the exact same time.

Most of the programs are consumers and will only run for a short amount of time before they shutdown and are restarted by the supervisor -- in this case having a start delay will not do any good.

I've considered both @kamilion's solution and the solution given by @mnaberez (autostart autorestart), but unfortunately neither actually produces an easily configurable "startdelay".

ripasapa · Answer 21 · Tue Jan 13 2015 04:47:06 GMT+0800 (China Standard Time)

+1. If supervisord had "sequence control" that would be great. As it stands, Kafka depends on Zookeeper etc. and supervisor cannot be used the way we would like.

Peter H. Fröhlich · Answer 22 · Thu Jan 29 2015 11:29:39 GMT+0800 (China Standard Time)

I am using supervisord to start a bunch of exotic qemu instances (for a compilers course) and some of these boot up quickly by themselves but take quite a long time if started all at once. A simple "startdelay" thing would work just fine: Sleep for the given number of seconds after starting this process before moving on to the next one in the list. I don't know exactly how you decide to order these (the priority keyword doesn't seem to order strictly as far as I can tell from the created pids) but whatever it is, being able to delay for a few seconds after each process would help a whole bunch in my case. So 👍 for sure. Twice, actually. :-)

ilyes kooli · Answer 23 · Sun Feb 01 2015 16:52:27 GMT+0800 (China Standard Time)

I think a startdelay option would be very useful. I have a supervisor to manage an AMQP server and some consumers. Some consumers implement a sleep before starting, some loop over a try to connect / error / sleep sequence (different implementations). I know the maximum time my AMQP server needs to start, so adding a startdelay option to the consumers would avoid those ugly hacks.

I think this may be very helpful in many situations.

After checking the code, I don't see any BC issues; also, the supervisor behaviour is the same when using the default value.
I can't wait for this to be merged.

Karl Forner · Answer 24 · Mon Feb 09 2015 19:47:39 GMT+0800 (China Standard Time)

+1 for dependency support

Marc Abramowitz · Answer 25 · Tue Mar 10 2015 01:44:07 GMT+0800 (China Standard Time)

This issue needs to be renamed, because I don't think that startsecs is relevant here. I feel like the OP might've misinterpreted what startsecs is, thinking that it's a delay to wait before starting the process.

startsecs

The total number of seconds which the program needs to stay running after a startup to consider
the start successful. If the program does not stay up for this many seconds after it has
started, even if it exits with an “expected” exit code (see exitcodes), the startup will be
considered a failure. Set to 0 to indicate that the program needn’t stay running for any
particular amount of time.
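
The log in the original post fits this reading. A quick check, using the spawn times from that log and the startsecs values from the config (memcached-testapp sets none, so supervisor's default of 1 applies), shows each RUNNING transition landing roughly startsecs after that program's own spawn, independent of priority:

```python
from datetime import datetime, timedelta

LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

# (program, spawn time copied from the log, startsecs from the config)
SPAWNED = [
    ("redis-testapp", "2012-05-30 22:37:34,195", 30),
    ("memcached-testapp", "2012-05-30 22:37:34,206", 1),  # default startsecs
    ("celery-testapp", "2012-05-30 22:37:34,214", 10),
    ("celerybeat-testapp", "2012-05-30 22:37:34,238", 5),
    ("gunicorn-testapp", "2012-05-30 22:37:34,477", 10),
]


def expected_running(spawned_at, startsecs):
    """Spawn time plus startsecs: roughly when RUNNING should be logged."""
    return datetime.strptime(spawned_at, LOG_FORMAT) + timedelta(seconds=startsecs)


for name, spawned_at, startsecs in SPAWNED:
    when = expected_running(spawned_at, startsecs)
    print(f"{name:20s} {when.strftime('%H:%M:%S.%f')[:-3]}")
```

Comparing the printed times with the "entered RUNNING state" lines shows they agree to within a few hundred milliseconds, so startsecs affected only when each start was declared successful, not when the other programs were spawned.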

Andus Lim · Answer 26 · Mon Apr 06 2015 11:10:28 GMT+0800 (China Standard Time)

important feature
+1 as well

dmytro · Answer 27 · Tue May 12 2015 06:25:10 GMT+0800 (China Standard Time)

Simple workaround I use:

[program:uwsgi]
command=bash -c 'sleep 5 && uwsgi /etc/uwsgi.ini'
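
A fixed sleep guesses how long the dependency needs. If the dependency listens on a TCP port, a small wrapper can wait for the port and then exec the real command. This is a sketch only: the script name wait_then_exec.py and the host and port in the usage line are hypothetical, not something supervisor provides.

```python
import os
import socket
import sys
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def main(argv):
    """Usage: wait_then_exec.py HOST PORT COMMAND [ARG...]"""
    host, port, command = argv[1], int(argv[2]), argv[3:]
    if not wait_for_port(host, port):
        sys.exit(f"{host}:{port} did not come up in time")
    # Replace this process so supervisor tracks the real program's pid.
    os.execvp(command[0], command)


if __name__ == "__main__" and len(sys.argv) > 3:
    main(sys.argv)
```

The program section would then use something like command=python /path/to/wait_then_exec.py 127.0.0.1 5672 uwsgi /etc/uwsgi.ini, with startsecs set high enough to cover the wait.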

Ben Tomasik · Answer 28 · Fri Jun 05 2015 12:40:59 GMT+0800 (China Standard Time)

Anyone have thoughts on how we may be able to solve this issue in a more responsible fashion? What if there were to be a command that could be invoked to validate the "online" state of a process? Such as:

[program:myapp]
command=/usr/local/bin/myapp-microservice
checkcommand=curl -s http://localhost:54321/v1/_ping
checkfreq=1
checktimeout=3
startsecs=5

The above example would imply:

The command would be run as normal
The checkcommand would be executed every checkfreq seconds after the above invocation until startsecs, possibly in conjunction with checktimeout, causes an abort or the command returns 0.

Example: If each call to checkcommand takes <=3 seconds, checkfreq=1, checktimeout=3, and startsecs=5, then checkcommand gets run at +1.0s and, if that fails, again at +5.0s; checking then stops and the service is marked failed on the conclusion of the second checkcommand invocation.
If any invocation of checkcommand returns an exit status of 0, then service is considered online.
If checkcommand does not return an exit status of 0 after startsecs, or the last possible invocation of checkcommand hangs for >=checktimeout seconds, the service start is considered a failure and the "normal" logic that is already in place is assumed. (Mark the service as a failure, output the same log content, etc.)
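
The semantics described above can be sketched as a small polling loop. None of this exists in supervisor: checkcommand, checkfreq and checktimeout are the proposed (hypothetical) settings, and the function name is made up for illustration.

```python
import shlex
import subprocess
import time


def wait_until_online(checkcommand, checkfreq=1.0, checktimeout=3.0, startsecs=5.0):
    """Run checkcommand every checkfreq seconds until it exits 0 or startsecs elapse."""
    argv = shlex.split(checkcommand)
    deadline = time.monotonic() + startsecs
    while True:
        time.sleep(checkfreq)  # the first check runs checkfreq after the spawn
        try:
            done = subprocess.run(
                argv,
                timeout=checktimeout,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            if done.returncode == 0:
                return True  # service is considered online
        except subprocess.TimeoutExpired:
            pass  # a hung check counts as a failed check
        if time.monotonic() >= deadline:
            return False  # startsecs used up: treat the start as failed
```

With the example values and a check that always takes the full 3 seconds, the loop checks at about +1.0s and +5.0s and then gives up, matching the timeline described above.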

It just seems to me that we want the problem solved but we haven't described it well enough in this issue. The low-hanging-fruit answer is certainly to make sure that services with different priorities are not invoked until the max(startsecs) of all higher priority services has elapsed. But aren't we only asking that in hopes of being able to loosely choreograph our intended result instead of being able to validate it on the way there?

Jeff Mitchell · Answer 29 · Tue Jun 16 2015 22:08:20 GMT+0800 (China Standard Time)

@tomislacker I think for many uses loose choreography is an acceptable 90% case even if full validation is the 100% case.

Once you get into true dependency management you start looking a lot more like a real init and it starts becoming a much more complicated problem. But there are definitely low-hanging-fruit scenarios.

Deleted user · Answer 30 · Fri Jul 17 2015 21:37:55 GMT+0800 (China Standard Time)

+1 I need this

Vlad Fratila · Answer 31 · Wed Jul 22 2015 01:44:49 GMT+0800 (China Standard Time)

@tomislacker this is a good idea, and easy to use because you just need the current config file.
But how do you handle an actual dependency? If my_program depends on Postgres, I check for it in checkcommand a couple of times and hope it will start?

To express a dependency, I think you want to run checkcommand before the actual command.

Kyle Lahnakoski · Answer 32 · Wed Jul 22 2015 03:24:45 GMT+0800 (China Standard Time)

@tomislacker I would prefer there be simple dependencies between the programs

[program:A]
command=/usr/local/bin/A

[program:C]
command=/usr/local/bin/C
dependson=A

and let the user decide if the programs are going to check for more complicated conditions. For example: We can insert [program:B] that will perform a sophisticated check to ensure A is running properly, and fail if not. So, A starts. B depends on A, so it will only start when Supervisor considers A to be RUNNING. When B starts, it will perform its analysis and fail if A is not running properly. C only starts once B has run (or is running successfully).

My suggestion does not cover the extra features you suggest, like number of checks, the timing of those checks, and when to give up; but I also advocate giving Supervisor some Cron-like features so programs like B are easy to define:

[program:A]
command=/usr/local/bin/A

[program:B]
command=curl -s http://localhost:54321/v1/_ping
dependson=A
startretries=3
restartintervalsec=5

[program:C]
command=/usr/local/bin/C
dependson=B
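
To make the proposal concrete, here is a sketch of how a start order could be derived from such a config. It is not supervisor code: dependson is the hypothetical option from the example above, comma-separated multiple dependencies are an extra assumption, and the ordering step uses Python's standard graphlib module.

```python
import configparser
from graphlib import TopologicalSorter  # Python 3.9+

CONFIG = """
[program:A]
command=/usr/local/bin/A

[program:B]
command=curl -s http://localhost:54321/v1/_ping
dependson=A

[program:C]
command=/usr/local/bin/C
dependson=B
"""


def start_order_from_config(text):
    """Read [program:x] sections and order them by the hypothetical dependson key."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    graph = {}
    for section in parser.sections():
        if not section.startswith("program:"):
            continue
        name = section.split(":", 1)[1]
        deps = parser.get(section, "dependson", fallback="")
        # Assumption: several dependencies would be written comma-separated.
        graph[name] = {d.strip() for d in deps.split(",") if d.strip()}
    # static_order() yields each program after everything it depends on.
    return list(TopologicalSorter(graph).static_order())


print(start_order_from_config(CONFIG))  # ['A', 'B', 'C']
```

A circular dependson chain makes static_order() raise graphlib.CycleError, which would be a natural place to reject the config.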

Mike Naberezny · Answer 33 · Wed Jul 22 2015 03:29:33 GMT+0800 (China Standard Time)

"I also advocate giving Supervisor some Cron-like features"

Please see response in #635. This has been considered at length by the Supervisor developers in the past and it was decided that cron-like functionality is out of scope for this project. Sorry.

Ain Tohvri · Answer 34 · Wed Jul 22 2015 16:37:38 GMT+0800 (China Standard Time)

👍 for a way to deliver sequential execution in a feasible fashion.

Using something like startuppriority would be a good robust way to do this IMO.

Vlad Fratila · Answer 35 · Thu Jul 30 2015 22:39:29 GMT+0800 (China Standard Time)

@klahnakoski in your first example, C depending on A means that supervisor needs to know A is ready. Right now, it only knows it's started.

Kyle Lahnakoski · Answer 36 · Thu Jul 30 2015 23:36:09 GMT+0800 (China Standard Time)

@vladfr yes, you are correct. That is why I also suggested B, which is a more sophisticated check to determine if A is ready. Ideally B is triggered periodically, and will fail if A is not ready.

Carlo Pires · Answer 37 · Tue Sep 22 2015 22:14:43 GMT+0800 (China Standard Time)

+1 ?? This is from May, 2012, I think that supervisor will never have dependencies.

Could someone suggest alternatives with dependency management?

Andrew Leinung · Answer 38 · Thu Oct 01 2015 10:04:39 GMT+0800 (China Standard Time)

+1, this would make things much more elegant

Tristan F. · Answer 39 · Wed Oct 28 2015 13:01:27 GMT+0800 (China Standard Time)

+1, it's sad that this is not doable after 3 years, using systemd units for now

Nikolai Golub · Answer 40 · Tue Nov 10 2015 20:55:03 GMT+0800 (China Standard Time)

 .----------------.  .----------------. 
| .--------------. || .--------------. |
| |      _       | || |     __       | |
| |     | |      | || |    /  |      | |
| |  ___| |___   | || |    `| |      | |
| | |___   ___|  | || |     | |      | |
| |     | |      | || |    _| |_     | |
| |     |_|      | || |   |_____|    | |
| |              | || |              | |
| '--------------' || '--------------' |
 '----------------'  '----------------'

Or please, add it.