docker / compose

Define and run multi-container applications with Docker

Home Page: https://docs.docker.com/compose/


Is there a way to delay container startup to support dependent services with a longer startup time?

dancrumb opened this issue · comments

I have a MySQL container that takes a little time to start up as it needs to import data.

I have an Alfresco container that depends upon the MySQL container.

At the moment, when I use fig, the Alfresco service inside the Alfresco container fails when it attempts to connect to the MySQL container... ostensibly because the MySQL service is not yet listening.

Is there a way to handle this kind of issue in Fig?

At work we wrap our dependent services in a script that checks whether the link is up yet. I know one of my colleagues would be interested in this too! Personally I feel it's a container-level concern to wait for services to be available, but I may be wrong :)

It'd be handy to have an entrypoint script that loops over all of the links and waits until they're working before starting the command passed to it.

This should be built in to Docker itself, but the solution is a way off. A container shouldn't be considered started until the link it exposes has opened.

@bfirsh that's more than I was imagining, but would be excellent.

A container shouldn't be considered started until the link it exposes has opened.

I think that's exactly what people need.

For now, I'll be using a variation on https://github.com/aanand/docker-wait

Yeah, I'd be interested in something like this - meant to post about it earlier.

The smallest-impact pattern I can think of that would fix this use case for us would be the following:

Add "wait" as a new key in fig.yml, with similar value semantics as link. Docker would treat this as a pre-requisite and wait until this container has exited prior to carrying on.

So, my fig.yml would look something like:

db:
  image: tutum/mysql:5.6

initdb:
  build: /path/to/db
  link:
    - db:db
  command: /usr/local/bin/init_db

app:
  link:
    - db:db
  wait:
    - initdb

On running app, it will start up all the linked containers, then run the wait container and only progress to the actual app container once the wait container (initdb) has exited. initdb would run a script that waits for the database to be available, then runs any initialisations/migrations/whatever, then exits.
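A sketch of what that init_db script could look like, purely illustrative: it assumes the mysql client is installed in the initdb image, the DB_PORT_3306_TCP_* variables are the ones docker injects for the db link, and DB_PASS and /migrations.sql are stand-ins for however credentials and setup SQL reach the container:

#!/bin/sh
# Hypothetical init_db: poll the linked db until MySQL accepts
# connections, run one-off setup, then exit so dependants can start.
until mysql -h "$DB_PORT_3306_TCP_ADDR" -P "$DB_PORT_3306_TCP_PORT" \
      -u root -p"$DB_PASS" -e 'SELECT 1' >/dev/null 2>&1; do
    echo "waiting for mysql..."
    sleep 1
done

# run any initialisations/migrations, then exit 0
mysql -h "$DB_PORT_3306_TCP_ADDR" -P "$DB_PORT_3306_TCP_PORT" \
      -u root -p"$DB_PASS" < /migrations.sql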

That's my thoughts, anyway.

(revised, see below)

+1 here too. It's not very appealing to have to do this in the commands themselves.


+1 as well. Just ran into this issue. Great tool btw, makes my life so much easier!

+1 would be great to have this.

+1 also. Recently run into the same set of problems

+1 also. Any statement from the Docker guys?

I am writing wrapper scripts as entrypoints to synchronise at the moment; I'm not sure having a mechanism in fig is wise if you have other targets for your containers that perform orchestration a different way. Seems very application-specific to me, and as such the responsibility of the containers doing the work.

After some thought and experimentation I do kind of agree with this.

As such, an application I'm building basically has a synchronous waitfor(host, port) function that lets me wait for services the application depends on (either detected via the environment or explicitly configured via CLI options).

cheers
James


Yes, some basic "depends on" is needed here...
so if you have 20 containers, you just want to run fig up and have everything start in the correct order...
However, it should also have some timeout option or other failure-catching mechanism.

Another +1 here. I have Postgres taking longer than Django to start so the DB isn't there for the migration command without hackery.

@ahknight interesting, why is migration running during run?

Don't you want to actually run migrate during the build phase? That way you can start up fresh images much faster.

There's a larger startup script for the application in question, alas. For now, we're doing non-DB work first, using nc -w 1 in a loop to wait for the DB, then doing DB actions. It works, but it makes me feel dirty(er).

I've had a lot of success doing this work during the fig build phase. I have one example of this with a django project (still a work in progress, though): https://github.com/dnephin/readthedocs.org/blob/fig-demo/dockerfiles/database/Dockerfile#L21

No need to poll for startup. Although I've done something similar with mysql, where I did have to poll for startup because the mysqld init script wasn't doing it already. This postgres init script seems to be much better.

Here is what I was thinking:

Using the idea of moby/moby#7445 we could implement this "wait_for_health_check" attribute in fig?
That way it would be a fig issue, not a Docker issue?

Is there any way of making fig check the TCP status on the linked container? If so, then I think this is the way to go. =)

@dnephin can you explain a bit more what you're doing in Dockerfiles to help with this?
Isn't the build phase unable to influence the runtime?

@docteurklein I can. I fixed the link from above (https://github.com/dnephin/readthedocs.org/blob/fig-demo/dockerfiles/database/Dockerfile#L21)

The idea is that you do all the slower "setup" operations during the build, so you don't have to wait for anything during container startup. In the case of a database or search index, you would:

  1. start the service
  2. create the users, databases, tables, and fixture data
  3. shutdown the service

all as a single build step. Later when you fig up the database container it's ready to go basically immediately, and you also get to take advantage of the docker build cache for these slower operations.
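As a rough sketch of that single build step (not the linked Dockerfile itself; the postgres paths, user names, and fixtures file are all assumptions), the Dockerfile would RUN a script along these lines:

#!/bin/bash
# build_init.sh -- runs once during `docker build`; the resulting state
# is baked into the image and cached by docker's build cache.
set -e

# 1. start the service (pg_ctl -w blocks until it accepts connections)
su postgres -c 'pg_ctl start -w -D "$PGDATA"'

# 2. create the users, databases, tables, and fixture data
su postgres -c 'createuser app && createdb -O app app'
su postgres -c 'psql -d app -f /fixtures.sql'

# 3. shut the service down again before the build step ends
su postgres -c 'pg_ctl stop -m fast -D "$PGDATA"'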

nice! thanks :)

@dnephin nice, hadn't thought of that.

+1 This is definitely needed.
An ugly time delay hack would be enough in most cases, but a real solution would be welcome.

Could you give an example of why/when it's needed?

In the use case I have, I have an Elasticsearch server and then an application server that's connecting to Elasticsearch. Elasticsearch takes a few seconds to spin up, so I can't simply do a fig up -d because the application server will fail immediately when connecting to the Elasticsearch server.

Say one container starts MySQL and the other starts an app that needs MySQL, and it turns out the app starts faster. We have transient fig up failures because of that.

crane has a way around this by letting you create groups that can be started individually. So you can start the MySQL group, wait 5 secs and then start the other stuff that depends on it.
Works on a small scale, but not a real solution.

@oskarhane not sure this "wait 5 secs" helps; in some cases it might need to wait longer (or you just can't be sure it won't go over the 5 secs)... it isn't very safe to rely on waiting a fixed time.
Also you would have to do this waiting and loading of the other group manually, and that's kind of lame; fig should do that for you =/

@oskarhane, @dacort, @ddossot: Keep in mind that, in the real world, things crash and restart, network connections come and go, etc. Whether or not Fig introduces a convenience for waiting on a TCP socket, your containers should be resilient to connection failures. That way they'll work properly everywhere.

You are right, but until we fix all pre-existing apps to do things like gracefully recovering from the absence of their critical resources (like the DB) on start (which is a Great Thing™ but unfortunately seldom supported by frameworks), we should use fig start to start individual containers in a certain order, with delays, instead of fig up.

I can see a shell script coming to control fig to control docker 😉

I am OK with this not being built into fig, but some advice on best practice for waiting on readiness would be good.

I saw in some code linked from an earlier comment this was done:

while ! exec 6<>/dev/tcp/${MONGO_1_PORT_27017_TCP_ADDR}/${MONGO_1_PORT_27017_TCP_PORT}; do
    echo "$(date) - still trying to connect to mongo at ${TESTING_MONGO_URL}"
    sleep 1
done

In my case there is no /dev/tcp path though, maybe it's a different Linux distro(?) - I'm on Ubuntu

I found instead this method which seems to work ok:

until nc -z postgres 5432; do
    echo "$(date) - waiting for postgres..."
    sleep 1
done

This seems to work, but I don't know enough about such things to know if it's robust... does anyone know if there's a possible race condition between the port showing up to nc and the postgres server really being able to accept commands?
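For postgres specifically, one way to avoid guessing is to poll with a client that completes a real protocol handshake rather than a bare TCP connect, e.g. pg_isready from the postgres client tools (the host and port here are illustrative):

until pg_isready -h postgres -p 5432 -q; do
    echo "$(date) - waiting for postgres to accept connections..."
    sleep 1
done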

I'd be happier if it was possible to invert the check: instead of polling from the dependent containers, is it possible instead to send a signal from the target (i.e. postgres server) container to all the dependents?

Maybe it's a silly idea, anyone have any thoughts?

@anentropic Docker links are one-way, so polling from the downstream container is currently the only way to do it.

does anyone know if there's a possible race condition between the port showing up to nc and the postgres server really being able to accept commands?

There's no way to know in the general case - it might be true for postgres, it might be false for other services - which is another argument for not doing it in Fig.

@aanand I tried using your docker/wait image approach but I am not sure what is happening. Basically I have this "orientdb" container which a lot of other NodeJS app containers link to. This orientdb container takes some time to start listening on the TCP port, and this makes the other containers get a "Connection refused" error.

I hoped that by linking the wait container to orientdb I would not see this error, but unfortunately I am still getting it randomly. Here is my setup (Docker version 1.4.1, fig 1.0.1 on an Ubuntu 14.04 box):

orientdb:
    build: ./Docker/orientdb
    ports:
        -   "2424:2424"
        -   "2480:2480"
wait:
    build: ./Docker/wait
    links:
        - orientdb:orientdb
....
core:
    build:  ./Docker/core
    ports:
        -   "3000:3000"
    links:
        -   orientdb:orientdb
        -   nsqd:nsqd

Any help is appreciated. Thanks.

@mindnuts the wait image is more of a demonstration; it's not suitable for use in a fig.yml. You should use the same technique (repeated polling) in your core container to wait for the orientdb container to start before kicking off the main process.

+1, just started running into this as I am pulling custom-built images vs building them in the fig.yml. Node app failing because mongodb is not ready yet...

I just spent hours debugging why MySQL was reachable when starting WordPress manually with Docker, and why it was offline when starting with Fig. Only now did I realize that Fig always restarts the MySQL container whenever I start the application, so the WordPress entrypoint.sh dies because it is not yet able to connect to MySQL.

I added my own overridden entrypoint.sh that waits for 5 seconds before executing the real entrypoint.sh. But clearly this is a use case that needs a general solution, if it's supposed to be easy to launch a MySQL+WordPress container combination with Docker/Fig.

so the WordPress entrypoint.sh dies because it is not yet able to connect to MySQL.

I think this is an issue with the WordPress container.

While I was initially a fan of this idea, after reading moby/moby#7445 (comment), I think such a feature would be the wrong approach, and actually encourages bad practices.

There seem to be two cases which this issue aims to address:

A dependency service needs to be available to perform some initialization.

Any container initialization should really be done during build. That way it is cached, and the work doesn't need to be repeated by every user of the image.

A dependency service needs to be available so that a connection can be opened

The application should really be resilient to connection failures and retry the connection.

I suppose the root of the problem is that there are no ground rules as to whose responsibility it is to wait for services to become ready. But even if there were, I think it's a bit unrealistic to expect that developers would add database connection retrying to every single initialization script. Such scripts are often needed to prepare empty data volumes that have just been mounted (e.g. create the database).

The problem would actually be much less obtrusive if Fig didn't always restart linked containers (i.e. the database server) when restarting the application container. I don't really know why it does that.

The problem would actually be much less obtrusive if Fig didn't always restart linked containers (i.e. the database server) when restarting the application container. I don't really know why it does that.

Actually it doesn't just restart containers, it destroys and recreates them, because it's the simplest way to make sure changes to fig.yml are picked up. We should eventually implement a smarter solution that can compare "current config" with "desired config" and only recreate what has changed.

Getting back to the original issue, I really don't think it's unrealistic to expect containers to have connection retry logic - it's fundamental to designing a distributed system that works. If different scripts need to share it, it should be factored out into an executable (or language-specific module if you're not using shell), so each script can just invoke waitfor db at the top.
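A minimal waitfor sketch along those lines, using bash's /dev/tcp (the 30-second default timeout is an arbitrary choice):

#!/bin/bash
# usage: waitfor HOST PORT [TIMEOUT] -- e.g. `waitfor db 5432` at the
# top of a script; exits 0 once the port accepts connections.
host="$1"; port="$2"; timeout="${3:-30}"

for ((i = 0; i < timeout; i++)); do
    # open (and immediately close) a TCP probe in a subshell
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
        exit 0
    fi
    sleep 1
done

echo "timed out waiting for $host:$port" >&2
exit 1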

@kennu what about --no-recreate ? /cc @aanand

@aanand I meant the unrealism comment from the point of view that the Docker Hub is already full of published images that probably don't handle connection retrying in their initialization scripts, and that it would be quite an undertaking to get everybody to add it. But I guess it could be done if Docker Inc published some kind of official guidelines/requirements.

Personally I'd rather keep containers/images simple though and let the underlying system worry about resolving dependencies. In fact, Docker's restart policy might already solve everything (if the application container fails to connect to the database, it will restart and try again until the database is available).

But relying on the restart policy means that it should be enabled by default, or otherwise people spend hours debugging the problem (like I just did). E.g. Kubernetes defaults to RestartPolicyAlways for pods.
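For reference, that policy can be opted into per container today with docker run (the image name is illustrative):

# keep restarting the app while it exits non-zero -- e.g. while it
# still cannot reach the database -- giving up after 10 attempts
docker run --restart=on-failure:10 myorg/webapp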

Any progress on this? I would like to echo that expecting all docker images to change and the entire community to implement connection retry practices is not reasonable. Fig is a Docker orchestration tool, and the problem lies in the order it does things, so the change needs to be made in Fig, not Docker or the community.

expecting all docker images to change and the entire community to implement connection retry practices is not reasonable

It's not that an application should need to retry because of docker or fig. Applications should be resilient to dropped connections because the network is not reliable. Any application should already be built this way.

I personally haven't had to implement retries in any of my containers, and I also haven't needed any delay or waiting on startup. I believe most cases of this problem fall into these two categories (my use of "retry" is probably not great here, I meant more that it would re-establish a connection if the connection was closed, not necessarily poll for some period attempting multiple times).

If you make sure that all initialization happens during the "build" phase, and that connections are re-established on the next request you won't need to retry (or wait on other containers to start). If connections are opened lazily (when the first request is made), instead of eagerly (during startup), I suspect you won't need to retry at all.

the problem lies in the order [fig] does things

I don't see any mention of that in this discussion so far. Fig orders startup based on the links specified in the config, so it should always start containers in the right order. Can you provide a test case where the order is incorrect?

I have to agree with @dnephin here. Sure, it would be convenient if compose/fig was able to do some magic and check availability of services, however, what would the expected behavior be if a service doesn't respond? That really depends on the requirements of your application/stack. In some cases, the entire stack should be destroyed and replaced with a new one, in other cases a failover stack should be used. Many other scenarios can be thought of.

Compose/Fig cannot make these decisions, and monitoring services should be the responsibility of the applications running inside the container.

I would like to suggest that @dnephin has merely been lucky. If you fork two processes in parallel, one of which will connect to a port that the other will listen to, you are essentially introducing a race condition; a lottery to see which process happens to initialize faster.

I would also like to repeat the WordPress initialization example: it runs a startup shell script that creates a new database if the MySQL container doesn't yet have it (this can't be done when building the Docker image, since it depends on the externally mounted data volume). Such a script becomes significantly more complex if it has to distinguish generic database errors from "database is not yet ready" errors and implement some sane retry logic within the shell script. I consider it highly likely that the author of the image will never actually test the startup script against said race condition.

Still, Docker's built-in restart policy provides a workaround for this, if you're ready to accept that containers sporadically fail to start and regularly print errors in logs. (And if you remember to turn it on.)

Personally, I would make Things Just Work, by making Fig autodetect which container ports are exposed to a linked container, ping them before starting the linked container (with a sane timeout), and ultimately provide a configuration setting to override/disable this functionality.

this can't be done when building the Docker image, since it's dependent on the externally mounted data volume

True. An approach here is to start just the database container once (if needed, with a different entrypoint/command), to initialise the database, or use a data-only container for the database, created from the same image as the database container itself.

Such a script becomes significantly more complex if it has to distinguish generic database errors from "database is not yet ready" errors

Compose/Fig will run into the same issue there: how to check if MySQL is up and accepting connections? (And PostgreSQL, and (insert your service here).) Also, where should the "ping" be executed from? Inside the container you're starting, or from the host?

As far as I can tell, the official WordPress image includes a check to see if MySQL is accepting connections in the docker-entrypoint.sh

@thaJeztah "Add some simple retry logic in PHP for MySQL connection errors" authored by tianon 2 days ago - Nice. :-) Who knows, maybe this will become a standard approach after all, but I still have my doubts, especially about this kind of retry implementations actually having being tested by all image authors.

About the port pinging - I can't say offhand what the optimal implementation would be. I guess maybe simple connection checking from a temporary linked container and retrying while getting ECONNREFUSED. Whatever solves 80% (or possibly 99%) of the problems, so users don't have to solve them by themselves again and again every time.

@kennu Ah! Thanks, wasn't aware it was just added recently, just checked the script now because of this discussion.

To be clear, I understand the problems you're having, but I'm not sure Compose/Fig would be able to solve them in a clean way that works for everyone (and reliably). I understand many images on the registry don't have "safeguards" in place to handle these issues, but I doubt it's Compose/Fig's responsibility to fix that.

Having said the above; I do think it would be a good thing to document this in the Dockerfile best practices section.

People should be made aware of this and some examples should be added to illustrate how to handle service "outage". Including a link to the Wikipedia article that @dnephin mentioned (and possibly other sources) for reference.


I ran into the same problem and like this idea from @kennu

Personally, I would make Things Just Work, by making Fig autodetect which container ports are exposed to a linked container, ping them before starting the linked container (with a sane timeout), and ultimately provide a configuration setting to override/disable this functionality.

I think this would solve a lot of typical use cases, like for me when depending on the official mongodb container.

I agree with @soupdiver. I am also having trouble in conjunction with a mongo container, and although I have it working with a start.sh script, the script is not very dynamic and adds another file I need to keep in my repo (I would like to just have a Dockerfile and docker-compose.yml in my node repo). It would be nice if there were some way to just Make It Work, but I think something simple like a wait timer won't cut it in most cases.

IMO pinging is not enough, because the basic network connection may be available but the service itself still not ready.
This is the case with the MySQL image, for example; using curl or telnet for the connection check on the exposed ports would be safer, although I don't know if it would be enough. But most containers don't have these tools installed by default.

Could docker or fig handle these checks?

Could docker or fig handle these checks?

In short: no, for various reasons:

  • Performing a "ping" from within a container would mean running a second process. Fig/Compose cannot automatically start such a process, and I don't think you'd want Fig/Compose to modify your container by installing software (such as curl or telnet) in it.
  • (As I mentioned in a previous comment), each service requires a different way to check if it is accepting connections / ready for use. Some services may need credentials or certificates to establish a connection. Fig/Compose cannot automatically invent how to do that.

and I don't think you'd want Fig/Compose to modify your container by installing software (such as curl or telnet) in it.

No, for sure not.

Fig/Compose cannot automatically invent how to do that.

Not invent. I was thinking more of an instruction for fig or docker on how to check it, e.g.:

web:
    image: nginx
    link: db
db:
    is_available: "curl DB_TCP_ADDR:DB_TCP_PORT"

The telnet command would be executed on the docker host, not in the container.
But I am just thinking out loud; I know that this is not the perfect solution. But the current way of using custom check scripts for the containers could be improved.

The telnet command would be executed on the docker host, not in the container.

Then curl or <name a tool that's needed> would have to be installed on the host. This could even have huge security issues (e.g. someone wants to be funny and uses is_available: "rm -rf /"). Apart from that, being able to access the database from the host is no guarantee that it's also accessible from inside the container.

But I am just thinking out loud, ...

I know, and I appreciate it. I just think there's no reliable way to automate this that would serve most use cases. In many cases you'd end up with something complex (take the curl example: how long should it try to connect? Retry?). Such complexity is better moved inside the container, which would also be useful if the container was started with Docker, not Fig/Compose.

@thaJeztah I totally agree with you. And it's very likely that there will be no 100% solution.

I’m going to repeat a suggestion I made earlier: It would be sufficient for me if I could state in the fig.yml “wait for this container to exit before running this other container”.

This would allow me to craft a container that knows how to wait for all its dependencies - check ports, initialise databases, whatever - and would require fig to know as little as possible.

I would see it configured as something like:

“”"
app:
links:
- db:db
prereqs:
- runthisfirst

runthisfirst:
links:
- db:db
“””

runthisfirst has a link that means the database starts up so it can check access. app will only run once runthisfirst has exited (bonus points if runthisfirst has to exit successfully).

Is this feasible as an answer?

KJL


I've just tried migrating my shell script launchers and ran into this issue. It would be nice even just to have a simple sleep/wait key that sleeps for that number of seconds before launching the next container.

db:
  image: tutum/mysql:5.6
  sleep: 10
app:
  link:
    - db:db

I really don't like this, for a number of reasons:

a) I think it's the wrong place for this
b) How long do you sleep for?
c) What if the timeout is not long enough?

Aside from the obvious issues, I really don't think infrastructure should care about what the application is, and vice versa. IMHO the app should be written to be more tolerant and/or smarter about its own requirements.

That being said, existing applications and legacy applications will need something -- but it should probably be more along the lines of:

a docker-compose.yml:

db:
  image: tutum/mysql:5.6
app:
  wait: db
  link:
    - db:db

Where wait waits for "exposed" services on db to become available.

The problem is: how do you determine that?

In the simplest cases, you wait until you can successfully open a TCP or UDP connection to the exposed services.

This might be overkill for this problem, but a nice solution would be if docker provided an event-triggering system where you could initiate a trigger from one container that resulted in some sort of callback in another container. In the case of waiting on importing data into a MySQL database before starting another service, just monitoring whether the port is available isn't enough.

Having an entrypoint script set an alert to Docker from inside the container (setting a pre-defined environment variable, for example) that triggered an event in another container (perhaps setting the same synchronized environment variable) would enable scripts on both sides to know when certain tasks are complete.

Of course we could set up our own socket server or other means, but that's a tedious way to solve a container orchestration issue.

@aanand I almost have something working using your wait approach as the starting point. However, there is something else happening between docker-compose run and docker run, where the former appears to hang whilst the latter works a charm.

example docker-compose.yml:

db:
  image: postgres
  ports:
    - "5432"
es:
  image: dockerfile/elasticsearch
  ports:
    - "9200"
wait:
  image: n3llyb0y/wait
  environment:
    PORTS: "5432 9200"
  links:
    - es
    - db

then using...

docker-compose run wait

However, this is not to be. The linked services start and it looks like we are about to wait, only for it to choke (at least within my VirtualBox env; I get to the nc loop and we get a single dot, then... nothing).

However, with the linked services running I can use this method (which is essentially what I have been doing for our CI builds)

docker run -e PORTS="5432 9200" --link service_db_1:wait1 --link service_es_1:wait2 n3llyb0y/wait

It feels like docker-compose run should work in the same way. The difference is that when using docker-compose run with the detach flag -d you get no wait benefit, as the wait container backgrounds; and I think (at this moment in time) that not using the flag causes the wait to choke on the other non-backgrounded services. I am going to take a closer look.

After a bit of trial and error it seems the above approach does work! It's just that the busybox base doesn't have a netcat util that works very well. My modified version of @aanand's wait utility does work against docker-compose 1.1.0 when using docker-compose run <util label> instead of docker-compose up. Example of usage in the link.

Not sure if it can handle chaining situations as per the original question though. Probably not.

Let me know what you think.

This is a very interesting issue. I think it would be really interesting to have a way for one container to wait until another one is ready. But, as everybody says, what does ready mean?

In my case I have a container for MySQL, another one that manages its backups and is also in charge of importing an initial database, and then the containers for each app that needs the database. It's obvious that waiting for the ports to be exposed is not enough. First the mysql container must be started, and then the rest should wait until the mysql service is ready to use, not before. To get that, I have had to implement a simple script to be executed on reboot that uses the docker exec functionality. Basically, the pseudo-code would be like:

run mysql
waitUntil "docker exec -t mysql mysql -u root -prootpass database -e \"show tables\""
run mysql-backup
waitUntil "docker exec -t mysql mysql -u root -prootpass database -e \"describe my_table\""
run web1
waitUntil "dexec web1 curl localhost:9000 | grep '<h1>Home</h1>'"
run web2
waitUntil "dexec web2 curl localhost:9000 | grep '<h1>Home</h1>'"
run nginx

Where the waitUntil function has a loop with a timeout that evals the docker exec … command and checks if the exit code is 0.

With that I ensure that every container waits until its dependencies are ready to use.
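For completeness, that waitUntil helper can be a few lines of shell (the 60-attempt timeout is an assumption):

# waitUntil CMD -- re-run CMD every second until it exits 0,
# giving up (and returning 1) after 60 attempts
waitUntil() {
    local attempts=60
    until eval "$1" >/dev/null 2>&1; do
        attempts=$((attempts - 1))
        if [ "$attempts" -le 0 ]; then
            echo "timed out waiting for: $1" >&2
            return 1
        fi
        sleep 1
    done
}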

So I think it could be an option to integrate within the compose utility. Maybe something like this, where wait_until declares a list of other dependencies (containers) and waits for each one until they respond OK to the corresponding command (or maybe with an optional pattern or regex to check if the result matches something you expect, even though using the grep command could be enough).

mysql:
  image: mysql
  ...
mysql-backup:
  links:
   - mysql
  wait_until:
   - mysql: mysql -u root -prootpass database -e "show tables"
  ...
web1:
  links:
   - mysql
  wait_until:
   - mysql: mysql -u root -prootpass database -e "describe my_table"
  ...
web2:
  links:
   - mysql
  wait_until:
   - mysql: mysql -u root -prootpass database -e "describe my_table"
  ...
nginx:
  links:
   - web1
   - web2
  wait_until:
   - web1: curl localhost:9000 | grep '<h1>Home</h1>'
   - web2: curl localhost:9000 | grep '<h1>Home</h1>'
  ...

Why not a simple wait for the port, like this?
http://docs.azk.io/en/azkfilejs/wait.html#

@robsonpeixoto: Waiting for the port isn't sufficient for a lot of use cases. For example, let's say you are seeding a database with data on creation and don't want the web server to start and connect to it until the data operation has completed. The port will be open the whole time so that wouldn't block the web server from starting.

+1 I'm having the same issue when using Docker for testing my Rails apps which depend on MySQL

+1 I have this issue too. I like @adrianhurt's idea, where you actually supply the condition to be evaluated to determine if the wait is complete. That way you still have a nice declarative yml, and you don't have to have an arbitrary definition of "ready".

I've had this tab open for a while: http://crosbymichael.com/docker-events.html ...seems relevant

+1 for simple timeout

+1 for a ready condition

I have been solving this very reliably at the application level for a while now, as was recommended in this thread.

Just to give you an idea of how this can be implemented for MySQL + PHP, here's my code.

From igorw/retry :)

Since the network is reliable, things should always work. Am I right? For those cases when they don't, there is retry.
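(The shell analogue of that idea fits in a few lines; this sketch is illustrative, including the attempt count and the sample command:)

# retry N CMD... -- run CMD up to N times, one second apart,
# failing only if every attempt fails
retry() {
    local n="$1"; shift
    for ((i = 1; i <= n; i++)); do
        "$@" && return 0
        sleep 1
    done
    return 1
}

retry 5 mysql -h db -u root -e 'SELECT 1'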


+1

@schmunk42 Nice stuff - I like that it's a good example of both establishing the connection and performing an idempotent database setup operation.

Might be good to create a (some) basic example(s) for inclusion in the docs, for different cases, e.g. NodeJS, Ruby, PHP.

+1, at the least it should provide some option to add a delay before the container starts.

How do you solve the problem when you're trying to connect to services that aren't your code?
For example, if I have the service Service and the database InfluxDB, Service requires InfluxDB, and InfluxDB has a slow startup.

How can docker-compose wait for InfluxDB to be ready?

If the code is mine, I can solve it by adding a retry. But for a third-party app I can't change the code.

@robsonpeixoto there are some examples in this ticket using netcat or similar approaches. You can take a look at my MySQL example in another ticket: moby/moby#7445 (comment)

That's the reason I think each container should have the optional ability to indicate its own readiness. For a DB, for example, I want to wait until the service is completely ready, not just until the process is created. I solve this with customized checks using docker exec, checking whether it can answer a simple query, for example.

Some optional flag for docker run to indicate an internal check command would be great; you could then reference it from another container using a special flag for the link.

Something like:

$ sudo docker run -d --name db training/postgres --readiness-check /bin/sh -c "is_ready.sh"
$ sudo docker run -d -P --name web --link db:db --wait-for-readiness db training/webapp python app.py

Where is_ready.sh is a simple boolean test which is in charge of deciding when the container is considered ready.

@schmunk42 nice quote!

+1

commented

+1

+1

Actually I changed my mind about this, so -1

It makes more sense for your container to check whether the 3rd-party service is available, and this is easily done with a little bash wrapper script that uses nc, for example.

Relying on a delay is tempting, but it's a poor solution because:

  • Your container will always wait X seconds before being ready.
  • X seconds might still not be enough in some cases (e.g. heavy I/O or CPU on the host), so your container is still not failsafe.
  • No fail strategy.

Relying on writing a wrapper bash script is better (see the sketch after this list) because:

  • Your container is ready as soon as it possibly can be.
  • You can implement any fail strategy, e.g. try 10 times then fail, try forever, etc. You can even implement the delay yourself by sleeping before trying!
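A wrapper along those lines, as an entrypoint (the service name, port, and retry count are all illustrative):

#!/bin/sh
# entrypoint wrapper: the "try 10 times then fail" strategy
tries=10
until nc -z db 5432; do
    tries=$((tries - 1))
    if [ "$tries" -le 0 ]; then
        echo "db never became available" >&2
        exit 1
    fi
    sleep 1
done

exec "$@"    # hand off to the real command once the dependency is up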

Reading through this threadnought I see that no one mentions secrets. I'm trying to use a data-only container which requests secrets once it is run. The problem I have: if my secrets take too long to transmit/decrypt, then my dependent container fails because the data it's expecting isn't there. I can't really use the method of "well, put everything in the container before you run it" because they are secrets.

I know there is some ambiguity around data-only containers in compose due to the context of the return code, but is there a better way to do this?

Similarly, changing my mind on this: -1. @dnephin's approach is entirely correct. If your application depends on a service, the application itself should be able to handle unavailability of that service gracefully (e.g. by re-establishing a connection). It shouldn't be some bash wrapper script or some logic in Compose or Docker; it's the responsibility of the application itself. Anything not at the application level will also only work on initialization; if the service goes down afterwards, a wrapper script or similar will not be executed.

Now if we could get application/library/framework developers to realize and support this responsibility, that would be fantastic.

Tough to do, considering the approaches you would take involve sideloading other daemons, which isn't recommended. Take my example, where I have a rails app attempting to connect to a MySQL database while another rails app is currently migrating and seeding the DB on initial startup. For me to make the rails app know not to attempt to use the DB, I would either have to mod the ActiveRecord library (not going to happen) or run a script that keeps checking to see if the DB has been migrated and seeded. But how do I know that for sure without knowing what data is supposed to be in there, and/or without having some script running on the system that's seeding the DB to inform the rest to connect to it?

Maybe I'm missing the obvious solution, but your answer of "developers should be able to deal with this in their own code" breaks down when you're using off-the-shelf libraries and when you're not "supposed" to sideload daemons into a container.