TrafficController fails resolving CC via consul when the instance is being stopped
keymon opened this issue
Tell us why you are submitting?
- I found a bug - here are some steps to recreate it.
What?
During stop, bosh does `monit stop all`. The services may stop in any order, so `consul-agent` might stop before `trafficcontroller`. If `consul-agent` is installed locally for service discovery via DNS of the CC TLS endpoint, `trafficcontroller` might fail before stopping.
Detailed context
We are experiencing some errors in our platform during deployments:
Fetching detailed app information:
Failed to fetch app stats: Error requesting app stats: cfclient: error (200002): CF-StatsUnavailable (1 failures)
We correlated these errors to the moment VMs are being stopped during deployments, and we found that traffic controller fails with the message:
2018/01/23 15:13:28 Could not get app information: [Get https://cloud-controller-ng.service.cf.internal:9023/internal/v4/log_access/6ba02750-f073-444f-ba73-ee3cf4a02ec6: dial tcp: lookup cloud-controller-ng.service.cf.internal on 10.0.0.2:53: no such host
That is because the local `consul-agent` has been stopped before `trafficcontroller`.
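To make the failure mode concrete, here is a minimal Go sketch of how a caller could tell this DNS "no such host" failure apart from other request errors. The helper names `isHostNotFound` and `sampleLookupError` are hypothetical and are not part of trafficcontroller; the sample error is built by hand to mirror the shape of the log line above (the app GUID is replaced by a placeholder), so nothing here touches the network.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// isHostNotFound reports whether err wraps a DNS "no such host" failure.
// Hypothetical helper for illustration only.
func isHostNotFound(err error) bool {
	var dnsErr *net.DNSError
	return errors.As(err, &dnsErr) && dnsErr.IsNotFound
}

// sampleLookupError hand-builds an error with the same nesting as the one
// in the log: an HTTP GET wrapping a dial error wrapping a DNS lookup error.
func sampleLookupError() error {
	return &url.Error{
		Op:  "Get",
		URL: "https://cloud-controller-ng.service.cf.internal:9023/internal/v4/log_access/<app-guid>",
		Err: &net.OpError{
			Op:  "dial",
			Net: "tcp",
			Err: &net.DNSError{
				Err:        "no such host",
				Name:       "cloud-controller-ng.service.cf.internal",
				Server:     "10.0.0.2:53",
				IsNotFound: true,
			},
		},
	}
}

func main() {
	// A lookup failure like the one seen while consul-agent is down.
	fmt.Println(isHostNotFound(sampleLookupError()))
	// An unrelated error is not classified as a DNS failure.
	fmt.Println(isHostNotFound(errors.New("connection refused")))
}
```

Detecting the error only helps with reporting; it does not remove the ordering problem between the two monit jobs.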
Expected behaviour
The `trafficcontroller` shall drain all connections and stop accepting new ones before the `consul-agent` is stopped.
Proposed solutions
We would like to discuss two alternative solutions for this problem:
- Create a monit dependency from the `trafficcontroller` monit job to the `consul-agent` job. It might require a new property on the TC job to specify dependencies.
- Add a `drain` script to TC: it shall message the TC to enter drain mode (i.e. to update a healthcheck endpoint to get out of any load balancer), and wait for some safe period.
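As a rough illustration of the second option, the following Go sketch shows one way a process could expose a drain switch that flips its healthcheck from healthy to unavailable. Everything in it is an assumption made for the example: the `drainState` type, the `/health` and `/drain` paths, and the status codes are hypothetical and do not describe trafficcontroller's actual endpoints. The "safe period" wait from the proposal is left out.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// drainState records whether the process was asked to drain.
// Hypothetical type for illustration only.
type drainState struct {
	draining int32 // 0 = serving, 1 = draining
}

// healthHandler answers 200 while serving and 503 once draining, so a
// load balancer healthcheck would take the instance out of rotation.
func (d *drainState) healthHandler(w http.ResponseWriter, r *http.Request) {
	if atomic.LoadInt32(&d.draining) == 1 {
		http.Error(w, "draining", http.StatusServiceUnavailable)
		return
	}
	fmt.Fprintln(w, "ok")
}

// drainHandler is the endpoint a drain script could call on localhost.
func (d *drainState) drainHandler(w http.ResponseWriter, r *http.Request) {
	atomic.StoreInt32(&d.draining, 1)
	w.WriteHeader(http.StatusAccepted)
}

// statusOf issues a GET and returns the status code (0 on transport error).
func statusOf(url string) int {
	resp, err := http.Get(url)
	if err != nil {
		return 0
	}
	defer resp.Body.Close()
	return resp.StatusCode
}

// runDemo starts an in-process test server and returns the health status
// before the drain, the drain response, and the health status afterwards.
func runDemo() (before, drain, after int) {
	d := &drainState{}
	mux := http.NewServeMux()
	mux.HandleFunc("/health", d.healthHandler)
	mux.HandleFunc("/drain", d.drainHandler)
	srv := httptest.NewServer(mux)
	defer srv.Close()

	before = statusOf(srv.URL + "/health")
	drain = statusOf(srv.URL + "/drain")
	after = statusOf(srv.URL + "/health")
	return before, drain, after
}

func main() {
	before, drain, after := runDemo()
	fmt.Println(before, drain, after)
}
```

In this sketch the healthcheck answers 200 before the drain request and 503 afterwards. A real drain script would then wait until in-flight connections have finished before letting the stop proceed, while `consul-agent` is still running.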
Thoughts?
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/154591053
The labels on this github issue will be updated when the story is started.
/cc @bandesz
@keymon We've merged a fix, can you check to see if this resolves the issue?
Yes, it does, thank you! :)