docker / compose

Define and run multi-container applications with Docker

Home Page: https://docs.docker.com/compose/


Is there a way to delay container startup to support dependent services with a longer startup time?

dancrumb opened this issue · comments

I have a MySQL container that takes a little time to start up as it needs to import data.

I have an Alfresco container that depends upon the MySQL container.

At the moment, when I use fig, the Alfresco service inside the Alfresco container fails when it attempts to connect to the MySQL container... ostensibly because the MySQL service is not yet listening.

Is there a way to handle this kind of issue in Fig?

At work we wrap our dependent services in a script that checks whether the link is up yet. I know one of my colleagues would be interested in this too! Personally I feel it's a container-level concern to wait for services to be available, but I may be wrong :)

It'd be handy to have an entrypoint script that loops over all of the links and waits until they're working before starting the command passed to it.

This should be built in to Docker itself, but the solution is a way off. A container shouldn't be considered started until the link it exposes has opened.

@bfirsh that's more than I was imagining, but would be excellent.

A container shouldn't be considered started until the link it exposes has opened.

I think that's exactly what people need.

For now, I'll be using a variation on https://github.com/aanand/docker-wait

Yeah, I'd be interested in something like this - meant to post about it earlier.

The smallest-impact pattern I can think of that would fix this use case for us would be the following:

Add "wait" as a new key in fig.yml, with similar value semantics as link. Docker would treat this as a pre-requisite and wait until this container has exited prior to carrying on.

So, my fig.yml would look something like:

db:
  image: tutum/mysql:5.6

initdb:
  build: /path/to/db
  link:
    - db:db
  command: /usr/local/bin/init_db

app:
  link:
    - db:db
  wait:
    - initdb

On running app, it will start up all the linked containers, then run the wait container and only progress to the actual app container once the wait container (initdb) has exited. initdb would run a script that waits for the database to be available, then runs any initialisations/migrations/whatever, then exits.
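A sketch of what that init_db script could look like, purely illustrative: it assumes the mysql client is installed in the initdb image, the DB_PORT_3306_TCP_* variables are the ones docker injects for the db link, and DB_PASS and /migrations.sql are stand-ins for however credentials and setup SQL reach the container:

#!/bin/sh
# Hypothetical init_db: poll the linked db until MySQL accepts
# connections, run one-off setup, then exit so dependants can start.
until mysql -h "$DB_PORT_3306_TCP_ADDR" -P "$DB_PORT_3306_TCP_PORT" \
      -u root -p"$DB_PASS" -e 'SELECT 1' >/dev/null 2>&1; do
    echo "waiting for mysql..."
    sleep 1
done

# run any initialisations/migrations, then exit 0
mysql -h "$DB_PORT_3306_TCP_ADDR" -P "$DB_PORT_3306_TCP_PORT" \
      -u root -p"$DB_PASS" < /migrations.sql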

That's my thoughts, anyway.

(revised, see below)

+1 here too. It's not very appealing to have to do this in the commands themselves.


+1 as well. Just ran into this issue. Great tool btw, makes my life so much easier!

+1 would be great to have this.

+1 also. Recently run into the same set of problems

+1 also. Any statement from the Docker guys?

I am writing wrapper scripts as entrypoints to synchronise at the moment; I'm not sure having a mechanism in fig is wise if you have other targets for your containers that perform orchestration a different way. Seems very application-specific to me, and as such the responsibility of the containers doing the work.

After some thought and experimentation I do kind of agree with this.

As such, an application I'm building basically has a synchronous waitfor(host, port) function that lets me wait for services the application depends on (either detected via the environment or explicitly configured via CLI options).

cheers
James


Yes, some basic "depends on" is needed here...
so if you have 20 containers, you just want to run fig up and have everything start in the correct order...
However, it should also have some timeout option or other failure-catching mechanism.

Another +1 here. I have Postgres taking longer than Django to start so the DB isn't there for the migration command without hackery.

@ahknight interesting, why is migration running during run?

Don't you want to actually run migrate during the build phase? That way you can start up fresh images much faster.

There's a larger startup script for the application in question, alas. For now, we're doing non-DB work first, using nc -w 1 in a loop to wait for the DB, then doing DB actions. It works, but it makes me feel dirty(er).

I've had a lot of success doing this work during the fig build phase. I have one example of this with a django project (still a work in progress, though): https://github.com/dnephin/readthedocs.org/blob/fig-demo/dockerfiles/database/Dockerfile#L21

No need to poll for startup. Although I've done something similar with mysql, where I did have to poll for startup because the mysqld init script wasn't doing it already. This postgres init script seems to be much better.

Here is what I was thinking:

Using the idea of moby/moby#7445 we could implement this "wait_for_health_check" attribute in fig?
That way it would be a fig issue, not a Docker issue?

Is there any way of making fig check the TCP status on the linked container? If so, then I think this is the way to go. =)

@dnephin can you explain a bit more what you're doing in Dockerfiles to help with this?
Isn't the build phase unable to influence the runtime?

@docteurklein I can. I fixed the link from above (https://github.com/dnephin/readthedocs.org/blob/fig-demo/dockerfiles/database/Dockerfile#L21)

The idea is that you do all the slower "setup" operations during the build, so you don't have to wait for anything during container startup. In the case of a database or search index, you would:

  1. start the service
  2. create the users, databases, tables, and fixture data
  3. shutdown the service

all as a single build step. Later when you fig up the database container it's ready to go basically immediately, and you also get to take advantage of the docker build cache for these slower operations.
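As a rough sketch of that single build step (not the linked Dockerfile itself; the postgres paths, user names, and fixtures file are all assumptions), the Dockerfile would RUN a script along these lines:

#!/bin/bash
# build_init.sh -- runs once during `docker build`; the resulting state
# is baked into the image and cached by docker's build cache.
set -e

# 1. start the service (pg_ctl -w blocks until it accepts connections)
su postgres -c 'pg_ctl start -w -D "$PGDATA"'

# 2. create the users, databases, tables, and fixture data
su postgres -c 'createuser app && createdb -O app app'
su postgres -c 'psql -d app -f /fixtures.sql'

# 3. shut the service down again before the build step ends
su postgres -c 'pg_ctl stop -m fast -D "$PGDATA"'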

nice! thanks :)

@dnephin nice, hadn't thought of that.

+1 This is definitely needed.
An ugly time delay hack would be enough in most cases, but a real solution would be welcome.

Could you give an example of why/when it's needed?

In the use case I have, I have an Elasticsearch server and then an application server that's connecting to Elasticsearch. Elasticsearch takes a few seconds to spin up, so I can't simply do a fig up -d because the application server will fail immediately when connecting to the Elasticsearch server.

Say one container starts MySQL and the other starts an app that needs MySQL, and it turns out the app starts faster. We have transient fig up failures because of that.

crane has a way around this by letting you create groups that can be started individually. So you can start the MySQL group, wait 5 secs and then start the other stuff that depends on it.
Works on a small scale, but not a real solution.

@oskarhane not sure this "wait 5 secs" helps; in some cases it might need to wait longer (or you just can't be sure it won't go over the 5 secs)... it isn't very safe to rely on waiting a fixed time.
Also you would have to do this waiting and loading of the other group manually, and that's kind of lame; fig should do that for you =/

@oskarhane, @dacort, @ddossot: Keep in mind that, in the real world, things crash and restart, network connections come and go, etc. Whether or not Fig introduces a convenience for waiting on a TCP socket, your containers should be resilient to connection failures. That way they'll work properly everywhere.

You are right, but until we fix all pre-existing apps to do things like gracefully recovering from the absence of their critical resources (like the DB) on start (which is a Great Thing™ but unfortunately seldom supported by frameworks), we should use fig start to start individual containers in a certain order, with delays, instead of fig up.

I can see a shell script coming to control fig to control docker 😉

I am OK with this not being built into fig, but some advice on best practice for waiting on readiness would be good.

I saw in some code linked from an earlier comment this was done:

while ! exec 6<>/dev/tcp/${MONGO_1_PORT_27017_TCP_ADDR}/${MONGO_1_PORT_27017_TCP_PORT}; do
    echo "$(date) - still trying to connect to mongo at ${TESTING_MONGO_URL}"
    sleep 1
done

In my case there is no /dev/tcp path though, maybe it's a different Linux distro(?) - I'm on Ubuntu

I found instead this method which seems to work ok:

until nc -z postgres 5432; do
    echo "$(date) - waiting for postgres..."
    sleep 1
done

This seems to work, but I don't know enough about such things to know if it's robust... does anyone know if there's a possible race condition between the port showing up to nc and the postgres server really being able to accept commands?
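For postgres specifically, one way to avoid guessing is to poll with a client that completes a real protocol handshake rather than a bare TCP connect, e.g. pg_isready from the postgres client tools (the host and port here are illustrative):

until pg_isready -h postgres -p 5432 -q; do
    echo "$(date) - waiting for postgres to accept connections..."
    sleep 1
done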

I'd be happier if it was possible to invert the check: instead of polling from the dependent containers, is it possible instead to send a signal from the target (i.e. postgres server) container to all the dependents?

Maybe it's a silly idea, anyone have any thoughts?

@anentropic Docker links are one-way, so polling from the downstream container is currently the only way to do it.

does anyone know if there's a possible race condition between the port showing up to nc and the postgres server really being able to accept commands?

There's no way to know in the general case - it might be true for postgres, it might be false for other services - which is another argument for not doing it in Fig.

@aanand I tried using your docker/wait image approach but I am not sure what is happening. Basically I have this "orientdb" container which a lot of other NodeJS app containers link to. This orientdb container takes some time to start listening on the TCP port, and this makes the other containers get a "Connection refused" error.

I hoped that by linking the wait container to orientdb I would not see this error, but unfortunately I am still getting it randomly. Here is my setup (Docker version 1.4.1, fig 1.0.1 on an Ubuntu 14.04 box):

orientdb:
    build: ./Docker/orientdb
    ports:
        -   "2424:2424"
        -   "2480:2480"
wait:
    build: ./Docker/wait
    links:
        - orientdb:orientdb
....
core:
    build:  ./Docker/core
    ports:
        -   "3000:3000"
    links:
        -   orientdb:orientdb
        -   nsqd:nsqd

Any help is appreciated. Thanks.

@mindnuts the wait image is more of a demonstration; it's not suitable for use in a fig.yml. You should use the same technique (repeated polling) in your core container to wait for the orientdb container to start before kicking off the main process.

+1, just started running into this as I am pulling custom-built images vs building them in the fig.yml. Node app failing because mongodb is not ready yet...

I just spent hours debugging why MySQL was reachable when starting WordPress manually with Docker, and why it was offline when starting with Fig. Only now did I realize that Fig always restarts the MySQL container whenever I start the application, so the WordPress entrypoint.sh dies because it is not yet able to connect to MySQL.

I added my own overridden entrypoint.sh that waits for 5 seconds before executing the real entrypoint.sh. But clearly this is a use case that needs a general solution, if it's supposed to be easy to launch a MySQL+WordPress container combination with Docker/Fig.

so the WordPress entrypoint.sh dies because it is not yet able to connect to MySQL.

I think this is an issue with the WordPress container.

While I was initially a fan of this idea, after reading moby/moby#7445 (comment), I think such a feature would be the wrong approach, and actually encourages bad practices.

There seem to be two cases which this issue aims to address:

A dependency service needs to be available to perform some initialization.

Any container initialization should really be done during build. That way it is cached, and the work doesn't need to be repeated by every user of the image.

A dependency service needs to be available so that a connection can be opened

The application should really be resilient to connection failures and retry the connection.

I suppose the root of the problem is that there are no ground rules as to whose responsibility it is to wait for services to become ready. But even if there were, I think it's a bit unrealistic to expect that developers would add database connection retrying to every single initialization script. Such scripts are often needed to prepare empty data volumes that have just been mounted (e.g. create the database).

The problem would actually be much less obtrusive if Fig didn't always restart linked containers (i.e. the database server) when restarting the application container. I don't really know why it does that.

The problem would actually be much less obtrusive if Fig didn't always restart linked containers (i.e. the database server) when restarting the application container. I don't really know why it does that.

Actually it doesn't just restart containers, it destroys and recreates them, because it's the simplest way to make sure changes to fig.yml are picked up. We should eventually implement a smarter solution that can compare "current config" with "desired config" and only recreate what has changed.

Getting back to the original issue, I really don't think it's unrealistic to expect containers to have connection retry logic - it's fundamental to designing a distributed system that works. If different scripts need to share it, it should be factored out into an executable (or language-specific module if you're not using shell), so each script can just invoke waitfor db at the top.
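A minimal waitfor sketch along those lines, using bash's /dev/tcp (the 30-second default timeout is an arbitrary choice):

#!/bin/bash
# usage: waitfor HOST PORT [TIMEOUT] -- e.g. `waitfor db 5432` at the
# top of a script; exits 0 once the port accepts connections.
host="$1"; port="$2"; timeout="${3:-30}"

for ((i = 0; i < timeout; i++)); do
    # open (and immediately close) a TCP probe in a subshell
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
        exit 0
    fi
    sleep 1
done

echo "timed out waiting for $host:$port" >&2
exit 1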

@kennu what about --no-recreate ? /cc @aanand

@aanand I meant the unrealism comment from the point of view that the Docker Hub is already full of published images that probably don't handle connection retrying in their initialization scripts, and that it would be quite an undertaking to get everybody to add it. But I guess it could be done if Docker Inc published some kind of official guidelines/requirements.

Personally I'd rather keep containers/images simple though and let the underlying system worry about resolving dependencies. In fact, Docker's restart policy might already solve everything (if the application container fails to connect to the database, it will restart and try again until the database is available).

But relying on the restart policy means that it should be enabled by default, or otherwise people spend hours debugging the problem (like I just did). E.g. Kubernetes defaults to RestartPolicyAlways for pods.
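For reference, that policy can be opted into per container today with docker run (the image name is illustrative):

# keep restarting the app while it exits non-zero -- e.g. while it
# still cannot reach the database -- giving up after 10 attempts
docker run --restart=on-failure:10 myorg/webapp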

Any progress on this? I would like to echo that expecting all docker images to change and the entire community to implement connection retry practices is not reasonable. Fig is a Docker orchestration tool, and the problem lies in the order it does things, so the change needs to be made in Fig, not Docker or the community.

expecting all docker images to change and the entire community to implement connection retry practices is not reasonable

It's not that an application should need to retry because of docker or fig. Applications should be resilient to dropped connections because the network is not reliable. Any application should already be built this way.

I personally haven't had to implement retries in any of my containers, and I also haven't needed any delay or waiting on startup. I believe most cases of this problem fall into these two categories (my use of "retry" is probably not great here, I meant more that it would re-establish a connection if the connection was closed, not necessarily poll for some period attempting multiple times).

If you make sure that all initialization happens during the "build" phase, and that connections are re-established on the next request you won't need to retry (or wait on other containers to start). If connections are opened lazily (when the first request is made), instead of eagerly (during startup), I suspect you won't need to retry at all.

the problem lies in the order [fig] does things

I don't see any mention of that in this discussion so far. Fig orders startup based on the links specified in the config, so it should always start containers in the right order. Can you provide a test case where the order is incorrect?

I have to agree with @dnephin here. Sure, it would be convenient if compose/fig was able to do some magic and check availability of services, however, what would the expected behavior be if a service doesn't respond? That really depends on the requirements of your application/stack. In some cases, the entire stack should be destroyed and replaced with a new one, in other cases a failover stack should be used. Many other scenarios can be thought of.

Compose/Fig cannot make these decisions, and monitoring services should be the responsibility of the applications running inside the container.

I would like to suggest that @dnephin has merely been lucky. If you fork two processes in parallel, one of which will connect to a port that the other will listen to, you are essentially introducing a race condition; a lottery to see which process happens to initialize faster.

I would also like to repeat the WordPress initialization example: it runs a startup shell script that creates a new database if the MySQL container doesn't yet have it (this can't be done when building the Docker image, since it depends on the externally mounted data volume). Such a script becomes significantly more complex if it has to distinguish generic database errors from "database is not yet ready" errors and implement some sane retry logic within the shell script. I consider it highly likely that the author of the image will never actually test the startup script against said race condition.

Still, Docker's built-in restart policy provides a workaround for this, if you're ready to accept that containers sporadically fail to start and regularly print errors in logs. (And if you remember to turn it on.)

Personally, I would make Things Just Work, by making Fig autodetect which container ports are exposed to a linked container, ping them before starting the linked container (with a sane timeout), and ultimately provide a configuration setting to override/disable this functionality.

this can't be done when building the Docker image, since it's dependent on the externally mounted data volume

True. An approach here is to start just the database container once (if needed, with a different entrypoint/command), to initialise the database, or use a data-only container for the database, created from the same image as the database container itself.

Such a script becomes significantly more complex if it has to distinguish generic database errors from "database is not yet ready" errors

Compose/Fig will run into the same issue there: how to check if MySQL is up and accepting connections? (And PostgreSQL, and (insert your service here).) Also, where should the "ping" be executed from? Inside the container you're starting, or from the host?

As far as I can tell, the official WordPress image includes a check to see if MySQL is accepting connections in the docker-entrypoint.sh

@thaJeztah "Add some simple retry logic in PHP for MySQL connection errors" authored by tianon 2 days ago - Nice. :-) Who knows, maybe this will become a standard approach after all, but I still have my doubts, especially about this kind of retry implementations actually having being tested by all image authors.

About the port pinging - I can't say offhand what the optimal implementation would be. I guess maybe simple connection checking from a temporary linked container and retrying while getting ECONNREFUSED. Whatever solves 80% (or possibly 99%) of the problems, so users don't have to solve them by themselves again and again every time.

@kennu Ah! Thanks, wasn't aware it was just added recently, just checked the script now because of this discussion.

To be clear, I understand the problems you're having, but I'm not sure Compose/Fig would be able to solve them in a clean way that works for everyone (and reliably). I understand many images on the registry don't have "safeguards" in place to handle these issues, but I doubt it's Compose/Fig's responsibility to fix that.

Having said the above; I do think it would be a good thing to document this in the Dockerfile best practices section.

People should be made aware of this and some examples should be added to illustrate how to handle service "outage". Including a link to the Wikipedia article that @dnephin mentioned (and possibly other sources) for reference.


I ran into the same problem and like this idea from @kennu

Personally, I would make Things Just Work, by making Fig autodetect which container ports are exposed to a linked container, ping them before starting the linked container (with a sane timeout), and ultimately provide a configuration setting to override/disable this functionality.

I think this would solve a lot of typical use cases, like for me when depending on the official mongodb container.

I agree with @soupdiver. I am also having trouble in conjunction with a mongo container, and although I have it working with a start.sh script, the script is not very dynamic and adds another file I need to keep in my repo (I would like to just have a Dockerfile and docker-compose.yml in my node repo). It would be nice if there were some way to just Make It Work, but I think something simple like a wait timer won't cut it in most cases.

IMO pinging is not enough, because the basic network connection may be available but the service itself still not ready.
This is the case with the MySQL image, for example; using curl or telnet for the connection check on the exposed ports would be safer, although I don't know if it would be enough. But most containers don't have these tools installed by default.

Could docker or fig handle these checks?

Could docker or fig handle these checks?

In short: no, for various reasons:

  • Performing a "ping" from within a container would mean running a second process. Fig/Compose cannot automatically start such a process, and I don't think you'd want Fig/Compose to modify your container by installing software (such as curl or telnet) in it.
  • (As I mentioned in a previous comment), each service requires a different way to check if it is accepting connections / ready for use. Some services may need credentials or certificates to establish a connection. Fig/Compose cannot automatically invent how to do that.

and I don't think you'd want Fig/Compose to modify your container by installing software (such as curl or telnet) in it.

No, for sure not.

Fig/Compose cannot automatically invent how to do that.

Not invent. I was thinking more of an instruction for fig or docker on how to check it, e.g.:

web:
    image: nginx
    link: db
db:
    is_available: "curl DB_TCP_ADDR:DB_TCP_PORT"

The telnet command would be executed on the docker host, not in the container.
But I am just thinking out loud; I know that this is not the perfect solution. But the current way of using custom check scripts for the containers could be improved.

The telnet command would be executed on the docker host, not in the container.

Then curl or <name a tool that's needed> would have to be installed on the host. This could even have huge security issues (e.g. someone wants to be funny and uses is_available: "rm -rf /"). Apart from that, being able to access the database from the host is no guarantee that it's also accessible from inside the container.

But I am just thinking out loud, ...

I know, and I appreciate it. I just think there's no reliable way to automate this that would serve most use cases. In many cases you'd end up with something complex (take the curl example: how long should it try to connect? Retry?). Such complexity is better moved inside the container, which would also be useful if the container was started with Docker, not Fig/Compose.

@thaJeztah I totally agree with you. And it's very likely that there will be no 100% solution.

I’m going to repeat a suggestion I made earlier: It would be sufficient for me if I could state in the fig.yml “wait for this container to exit before running this other container”.

This would allow me to craft a container that knows how to wait for all its dependencies - check ports, initialise databases, whatever - and would require fig to know as little as possible.

I would see it configured as something like:

“”"
app:
links:
- db:db
prereqs:
- runthisfirst

runthisfirst:
links:
- db:db
“””

runthisfirst has a link that means the database starts up so it can check access. app will only run once runthisfirst has exited (bonus points if runthisfirst has to exit successfully).

Is this feasible as an answer?

KJL


I've just tried migrating my shell script launchers and ran into this issue. It would be nice even just to have a simple sleep/wait key that sleeps for that number of seconds before launching the next container.

db:
  image: tutum/mysql:5.6
  sleep: 10
app:
  link:
    - db:db

I really don't like this, for a number of reasons:

a) I think it's the wrong place for this
b) How long do you sleep for?
c) What if the timeout is not long enough?

Aside from the obvious issues, I really don't think infrastructure should care about what the application is, and vice versa. IMHO the app should be written to be more tolerant and/or smarter about its own requirements.

That being said, existing applications and legacy applications will need something -- but it should probably be more along the lines of:

a docker-compose.yml:

db:
  image: tutum/mysql:5.6
app:
  wait: db
  link:
    - db:db

Where wait waits for "exposed" services on db to become available.

The problem is: how do you determine that?

In the simplest cases, you wait until you can successfully open a TCP or UDP connection to the exposed services.

This might be overkill for this problem, but a nice solution would be if docker provided an event-triggering system where you could initiate a trigger from one container that resulted in some sort of callback in another container. In the case of waiting on importing data into a MySQL database before starting another service, just monitoring whether the port is available isn't enough.

Having an entrypoint script set an alert to Docker from inside the container (setting a pre-defined environment variable, for example) that triggered an event in another container (perhaps setting the same synchronized environment variable) would enable scripts on both sides to know when certain tasks are complete.

Of course we could set up our own socket server or other means, but that's a tedious way to solve a container orchestration issue.

@aanand I almost have something working using your wait approach as the starting point. However, there is something else happening between docker-compose run and docker run, where the former appears to hang whilst the latter works a charm.

example docker-compose.yml:

db:
  image: postgres
  ports:
    - "5432"
es:
  image: dockerfile/elasticsearch
  ports:
    - "9200"
wait:
  image: n3llyb0y/wait
  environment:
    PORTS: "5432 9200"
  links:
    - es
    - db

then using...

docker-compose run wait

However, this is not to be. The linked services start and it looks like we are about to wait, only for it to choke (at least within my VirtualBox env; I get to the nc loop and we get a single dot, then... nothing).

However, with the linked services running I can use this method (which is essentially what I have been doing for our CI builds)

docker run -e PORTS="5432 9200" --link service_db_1:wait1 --link service_es_1:wait2 n3llyb0y/wait

It feels like docker-compose run should work in the same way. The difference is that when using docker-compose run with the detach flag -d you get no wait benefit, as the wait container backgrounds; and I think (at this moment in time) that not using the flag causes the wait to choke on the other non-backgrounded services. I am going to take a closer look.

After a bit of trial and error it seems the above approach does work! It's just that the busybox base doesn't have a netcat util that works very well. My modified version of @aanand's wait utility does work against docker-compose 1.1.0 when using docker-compose run <util label> instead of docker-compose up. Example of usage in the link.

Not sure if it can handle chaining situations as per the original question though. Probably not.

Let me know what you think.

This is a very interesting issue. I think it would be really interesting to have a way for one container to wait until another one is ready. But, as everybody says, what does ready mean?

In my case I have a container for MySQL, another one that manages its backups and is also in charge of importing an initial database, and then the containers for each app that needs the database. It's obvious that waiting for the ports to be exposed is not enough. First the mysql container must be started, and then the rest should wait until the mysql service is ready to use, not before. To get that, I have had to implement a simple script to be executed on reboot that uses the docker exec functionality. Basically, the pseudo-code would be like:

run mysql
waitUntil "docker exec -t mysql mysql -u root -prootpass database -e \"show tables\""
run mysql-backup
waitUntil "docker exec -t mysql mysql -u root -prootpass database -e \"describe my_table\""
run web1
waitUntil "dexec web1 curl localhost:9000 | grep '<h1>Home</h1>'"
run web2
waitUntil "dexec web2 curl localhost:9000 | grep '<h1>Home</h1>'"
run nginx

Where the waitUntil function has a loop with a timeout that evals the docker exec … command and checks if the exit code is 0.

With that I ensure that every container waits until its dependencies are ready to use.
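For completeness, that waitUntil helper can be a few lines of shell (the 60-attempt timeout is an assumption):

# waitUntil CMD -- re-run CMD every second until it exits 0,
# giving up (and returning 1) after 60 attempts
waitUntil() {
    local attempts=60
    until eval "$1" >/dev/null 2>&1; do
        attempts=$((attempts - 1))
        if [ "$attempts" -le 0 ]; then
            echo "timed out waiting for: $1" >&2
            return 1
        fi
        sleep 1
    done
}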

So I think it could be an option to integrate within the compose utility. Maybe something like this, where wait_until declares a list of other dependencies (containers) and waits for each one until they respond OK to the corresponding command (or maybe with an optional pattern or regex to check if the result matches something you expect, even though using the grep command could be enough).

mysql:
  image: mysql
  ...
mysql-backup:
  links:
   - mysql
  wait_until:
   - mysql: mysql -u root -prootpass database -e "show tables"
  ...
web1:
  links:
   - mysql
  wait_until:
   - mysql: mysql -u root -prootpass database -e "describe my_table"
  ...
web2:
  links:
   - mysql
  wait_until:
   - mysql: mysql -u root -prootpass database -e "describe my_table"
  ...
nginx:
  links:
   - web1
   - web2
  wait_until:
   - web1: curl localhost:9000 | grep '<h1>Home</h1>'
   - web2: curl localhost:9000 | grep '<h1>Home</h1>'
  ...

Why not a simple wait for the port, like this?
http://docs.azk.io/en/azkfilejs/wait.html#

@robsonpeixoto: Waiting for the port isn't sufficient for a lot of use cases. For example, let's say you are seeding a database with data on creation and don't want the web server to start and connect to it until the data operation has completed. The port will be open the whole time so that wouldn't block the web server from starting.

+1 I'm having the same issue when using Docker for testing my Rails apps which depend on MySQL

+1 I have this issue too. I like @adrianhurt's idea, where you actually supply the condition to be evaluated to determine if the wait is complete. That way you still have a nice declarative yml, and you don't have to have an arbitrary definition of "ready".

I've had this tab open for a while: http://crosbymichael.com/docker-events.html ...seems relevant

+1 for simple timeout

+1 for a ready condition

I have been solving this very reliably at the application level for a while now, as was recommended in this thread.

Just to give you an idea of how this can be implemented for MySQL + PHP, here's my code.

From igorw/retry :)

Since the network is reliable, things should always work. Am I right? For those cases when they don't, there is retry.
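(The shell analogue of that idea fits in a few lines; this sketch is illustrative, including the attempt count and the sample command:)

# retry N CMD... -- run CMD up to N times, one second apart,
# failing only if every attempt fails
retry() {
    local n="$1"; shift
    for ((i = 1; i <= n; i++)); do
        "$@" && return 0
        sleep 1
    done
    return 1
}

retry 5 mysql -h db -u root -e 'SELECT 1'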


+1

@schmunk42 Nice stuff - I like that it's a good example of both establishing the connection and performing an idempotent database setup operation.

Might be good to create a (some) basic example(s) for inclusion in the docs, for different cases, e.g. NodeJS, Ruby, PHP.

+1, at the least it should provide some option to add a delay before the container starts.

How do you solve the problem when you're trying to connect to services that aren't your code?
For example, if I have the service Service and the database InfluxDB, Service requires InfluxDB, and InfluxDB has a slow startup.

How can docker-compose wait for InfluxDB to be ready?

If the code is mine, I can solve it by adding a retry. But for a third-party app I can't change the code.

@robsonpeixoto there are some examples in this ticket using netcat or similar approaches. You can take a look at my MySQL example in another ticket: moby/moby#7445 (comment)

That's the reason I think each container should have the optional ability to indicate its own readiness. For a DB, for example, I want to wait until the service is completely ready, not just until the process is created. I solve this with customized checks using docker exec, checking whether it can answer a simple query, for example.

Some optional flag for docker run to indicate an internal check command would be great; you could then reference it from another container using a special flag for the link.

Something like:

$ sudo docker run -d --name db training/postgres --readiness-check /bin/sh -c "is_ready.sh"
$ sudo docker run -d -P --name web --link db:db --wait-for-readiness db training/webapp python app.py

Where is_ready.sh is a simple boolean test which is in charge of deciding when the container is considered ready.

@schmunk42 nice quote!

+1

commented

+1

+1

Actually I changed my mind about this, so -1

It makes more sense for your container to check whether the 3rd-party service is available, and this is easily done with a little bash wrapper script that uses nc, for example.

Relying on a delay is tempting, but it's a poor solution because:

  • Your container will always wait X seconds before being ready.
  • X seconds might still not be enough in some cases (e.g. heavy I/O or CPU on the host), so your container is still not failsafe.
  • No fail strategy.

Relying on writing a wrapper bash script is better (see the sketch after this list) because:

  • Your container is ready as soon as it possibly can be.
  • You can implement any fail strategy, e.g. try 10 times then fail, try forever, etc. You can even implement the delay yourself by sleeping before trying!
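A wrapper along those lines, as an entrypoint (the service name, port, and retry count are all illustrative):

#!/bin/sh
# entrypoint wrapper: the "try 10 times then fail" strategy
tries=10
until nc -z db 5432; do
    tries=$((tries - 1))
    if [ "$tries" -le 0 ]; then
        echo "db never became available" >&2
        exit 1
    fi
    sleep 1
done

exec "$@"    # hand off to the real command once the dependency is up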

Reading through this threadnought I see that no one mentions secrets. I'm trying to use a data-only container which requests secrets once it is run. The problem I have: if my secrets take too long to transmit/decrypt, then my dependent container fails because the data it's expecting isn't there. I can't really use the method of "well, put everything in the container before you run it" because they are secrets.

I know there is some ambiguity around data-only containers in compose due to the context of the return code, but is there a better way to do this?

Similarly, changing my mind on this: -1. @dnephin's approach is entirely correct. If your application depends on a service, the application itself should be able to handle unavailability of that service gracefully (e.g. by re-establishing a connection). It shouldn't be some bash wrapper script or some logic in Compose or Docker; it's the responsibility of the application itself. Anything not at the application level will also only work on initialization; if the service goes down afterwards, a wrapper script or similar will not be executed.

Now if we could get application/library/framework developers to realize and support this responsibility, that would be fantastic.

Tough to do, considering the approaches you would take involve sideloading other daemons, which isn't recommended. Take my example, where I have a rails app attempting to connect to a MySQL database while another rails app is currently migrating and seeding the DB on initial startup. For me to make the rails app know not to attempt to use the DB, I would either have to mod the ActiveRecord library (not going to happen) or run a script that keeps checking to see if the DB has been migrated and seeded. But how do I know that for sure without knowing what data is supposed to be in there, and/or without having some script running on the system that's seeding the DB to inform the rest to connect to it?

Maybe I'm missing the obvious solution, but your answer of "developers should be able to deal with this in their own code" breaks down when you're using off-the-shelf libraries and when you're not "supposed" to sideload daemons into a container.