localhost port 8080 connection confused
chrisabrams opened this issue
Having trouble getting the container to start. This error continues to happen:
1m 1m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Started Started container with docker id 2a158eecfcfd
16m 42s 11 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Warning Unhealthy Liveness probe failed: % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 8080: Connection refused
16m 23s 31 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Warning Unhealthy Readiness probe failed: % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 8080: Connection refused
Any idea why this is not working? I have 3 replicas running no problem, and 3 that cannot seem to connect. Seems weird to not be able to ping itself?
I tried increasing the initialDelaySeconds and that worked for one pod, but for some pods I just had to completely remove the livenessProbe and readinessProbe to get them started. I understand why those were there, but for some reason they were actually preventing my setup from working.
It would seem removing the livenessProbe and readinessProbe doesn't actually fix things, it just stops Kubernetes from killing the containers.
Would this error mean anything?
WARNING: ignoring --server-name because this server already has a name.
@chrisabrams can you paste the output from kubectl logs for one of the pods that's having this issue? Also kubectl describe <pod>, removing any sensitive info.
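For anyone collecting the same information, here is a minimal sketch that gathers the describe output and the logs for a single pod in one pass, so the two always refer to the same replica. The function name collect_pod_diagnostics and the KUBECTL override are illustrative assumptions, not part of this project. The --previous flag asks for the logs of the container instance that ran before the last restart, which is usually the one the probe killed.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): collect describe + logs for ONE pod.
# KUBECTL can be overridden, e.g. KUBECTL=echo, to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

collect_pod_diagnostics() {
  pod="$1"
  namespace="${2:-default}"

  echo "=== describe: $pod ==="
  "$KUBECTL" describe pod "$pod" --namespace "$namespace"

  echo "=== logs (current container): $pod ==="
  "$KUBECTL" logs "$pod" --namespace "$namespace"

  # The previous container may not exist on a pod that never restarted,
  # so a failure here is tolerated.
  echo "=== logs (previous container): $pod ==="
  "$KUBECTL" logs "$pod" --namespace "$namespace" --previous || true
}
```

Usage, assuming the pod name from the describe output in this thread: `collect_pod_diagnostics rethinkdb-replica-3-3819502188-9qoxr > replica-3.txt`. Review the file for sensitive values before posting it.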
Logs:
+ exec rethinkdb --server-name rethinkdb_replica_6_4146152185_e0grs --canonical-address 10.24.6.12 --bind all --join 10.24.5.16:29015 --cache-size 1024
WARNING: ignoring --server-name because this server already has a name.
Running rethinkdb 2.3.5~0jessie (GCC 4.9.2)...
Running on Linux 4.4.14+ x86_64
Loading data from directory /data/rethinkdb_data
Listening for intracluster connections on port 29015
Connected to server "rethinkdb_replica_3_2856842387_cs9e8" 2f8af5ac-b9ed-421c-9678-b1afd24a588d
Connected to server "rethinkdb_replica_2_476181460_3q1wi" 89328198-9a3a-48f8-a7a5-930067be8bf2
Listening for client driver connections on port 28015
Listening for administrative HTTP connections on port 8080
Listening on cluster addresses: 127.0.0.1, 10.24.6.12, ::1, fe80::b8f0:3cff:fe04:80b%3
Listening on driver addresses: 127.0.0.1, 10.24.6.12, ::1, fe80::b8f0:3cff:fe04:80b%3
Listening on http addresses: 127.0.0.1, 10.24.6.12, ::1, fe80::b8f0:3cff:fe04:80b%3
Server ready, "rethinkdb_replica_6_3969184722_r5xid" d1345de4-6b83-4117-a2fa-5001a496bf01
Connected to server "rethinkdb_replica_1_464188347_39a23" 62c3e9d0-69bc-40b3-a4b6-59bc1168eefb
Connected to server "rethinkdb_replica_5_1572598823_5sk2t" 6f6082ef-fb9c-48f1-abd5-194717516508
Connected to server "rethinkdb_replica_4_3172594744_1wte5" 95ee171b-1705-4843-8439-a78feca12a51
Disconnected from server "rethinkdb_replica_1_464188347_39a23" 62c3e9d0-69bc-40b3-a4b6-59bc1168eefb
Disconnected from server "rethinkdb_replica_2_476181460_3q1wi" 89328198-9a3a-48f8-a7a5-930067be8bf2
Connected to server "rethinkdb_replica_1_464188347_39a23" 62c3e9d0-69bc-40b3-a4b6-59bc1168eefb
Connected to server "rethinkdb_replica_2_476181460_3q1wi" 89328198-9a3a-48f8-a7a5-930067be8bf2
Describe:
Name: rethinkdb-replica-3-3819502188-9qoxr
Namespace:
Node: default-pool-22328c35-ijcs/10.240.0.12
Start Time: Thu, 20 Oct 2016 20:17:10 -0400
Labels: db=rethinkdb
instance=three
namespace=
pod-template-hash=3819502188
role=replica
Status: Running
IP: 10.24.5.11
Controllers: ReplicaSet/rethinkdb-replica-3-3819502188
Containers:
rethinkdb:
Container ID: docker://acc1483ff7d9296976da3a6a187d38717f0f4204e45fcb20c1edf7598b7c9871
Image: rosskukulinski/rethinkdb-kubernetes:2.3.5
Image ID: docker://sha256:46f7371483c3ee7df84b043b73982c94e32503bac35df0220a5891f7181db94a
Ports: 8080/TCP, 28015/TCP, 29015/TCP
Args:
--cache-size
1024
Limits:
cpu: 250m
memory: 4Gi
Requests:
cpu: 250m
memory: 4Gi
State: Running
Started: Thu, 20 Oct 2016 20:33:57 -0400
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 20 Oct 2016 20:33:08 -0400
Finished: Thu, 20 Oct 2016 20:33:57 -0400
Ready: False
Restart Count: 9
Liveness: exec [/ready-probe.sh] delay=15s timeout=5s period=10s #success=1 #failure=3
Readiness: exec [/ready-probe.sh] delay=15s timeout=5s period=10s #success=1 #failure=3
Volume Mounts:
/data from storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token (ro)
Environment Variables:
POD_NAMESPACE: (v1:metadata.namespace)
POD_NAME: rethinkdb-replica-3-3819502188-9qoxr (v1:metadata.name)
POD_IP: (v1:status.podIP)
RETHINK_CLUSTER: rethinkdb
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
storage:
Type: GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
PDName: rethinkdb-storage-3
FSType: ext4
Partition: 0
ReadOnly: false
default-token-w7o3i:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-w7o3i
QoS Class: Guaranteed
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
17m 17m 1 {default-scheduler } Normal Scheduled Successfully assigned rethinkdb-replica-3-3819502188-9qoxr to default-pool-22328c35-ijcs
16m 16m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Created Created container with docker id a8ea8743476b; Security:[seccomp=unconfined]
16m 16m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Started Started container with docker id a8ea8743476b
16m 16m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Killing Killing container with docker id a8ea8743476b: pod "rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)" container "rethinkdb" is unhealthy, it will be killed and re-created.
15m 15m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Created Created container with docker id bf98a14a5de1; Security:[seccomp=unconfined]
15m 15m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Started Started container with docker id bf98a14a5de1
15m 15m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Started Started container with docker id 4735d12480f7
15m 15m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Killing Killing container with docker id bf98a14a5de1: pod "rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)" container "rethinkdb" is unhealthy, it will be killed and re-created.
15m 15m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Created Created container with docker id 4735d12480f7; Security:[seccomp=unconfined]
14m 14m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Killing Killing container with docker id 4735d12480f7: pod "rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)" container "rethinkdb" is unhealthy, it will be killed and re-created.
14m 14m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Started Started container with docker id 5bf31ed13a37
14m 14m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Created Created container with docker id 5bf31ed13a37; Security:[seccomp=unconfined]
13m 13m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Killing Killing container with docker id 5bf31ed13a37: pod "rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)" container "rethinkdb" is unhealthy, it will be killed and re-created.
13m 13m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Started Started container with docker id 8b06a8f7667d
13m 13m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Created Created container with docker id 8b06a8f7667d; Security:[seccomp=unconfined]
12m 12m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Created Created container with docker id af0739c22d93; Security:[seccomp=unconfined]
12m 12m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Started Started container with docker id af0739c22d93
12m 12m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Killing Killing container with docker id 8b06a8f7667d: pod "rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)" container "rethinkdb" is unhealthy, it will be killed and re-created.
11m 11m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Killing Killing container with docker id af0739c22d93: pod "rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)" container "rethinkdb" is unhealthy, it will be killed and re-created.
11m 10m 7 {kubelet default-pool-22328c35-ijcs} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "rethinkdb" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=rethinkdb pod=rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)"
10m 10m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Started Started container with docker id 878d957cf94d
10m 10m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Created Created container with docker id 878d957cf94d; Security:[seccomp=unconfined]
9m 9m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Killing Killing container with docker id 878d957cf94d: pod "rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)" container "rethinkdb" is unhealthy, it will be killed and re-created.
9m 7m 13 {kubelet default-pool-22328c35-ijcs} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "rethinkdb" with CrashLoopBackOff: "Back-off 2m40s restarting failed container=rethinkdb pod=rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)"
7m 7m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Created Created container with docker id 96a8ceafcbee; Security:[seccomp=unconfined]
7m 7m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Started Started container with docker id 96a8ceafcbee
6m 6m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Killing Killing container with docker id 96a8ceafcbee: pod "rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)" container "rethinkdb" is unhealthy, it will be killed and re-created.
11m 1m 44 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Warning BackOff Back-off restarting failed docker container
6m 1m 24 {kubelet default-pool-22328c35-ijcs} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "rethinkdb" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=rethinkdb pod=rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)"
1m 1m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Created Created container with docker id 2a158eecfcfd; Security:[seccomp=unconfined]
1m 1m 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Started Started container with docker id 2a158eecfcfd
16m 42s 11 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Warning Unhealthy Liveness probe failed: % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 8080: Connection refused
16m 23s 31 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Warning Unhealthy Readiness probe failed: % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed to connect to localhost port 8080: Connection refused
16m 16s 10 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Pulling pulling image "rosskukulinski/rethinkdb-kubernetes:2.3.5"
16m 16s 10 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Pulled Successfully pulled image "rosskukulinski/rethinkdb-kubernetes:2.3.5"
16s 16s 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Killing Killing container with docker id 2a158eecfcfd: pod "rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)" container "rethinkdb" is unhealthy, it will be killed and re-created.
16s 16s 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Created (events with common reason combined)
16s 16s 1 {kubelet default-pool-22328c35-ijcs} spec.containers{rethinkdb} Normal Started (events with common reason combined)
@chrisabrams are these nodes under heavy load? I'm wondering if you're actively trying to backfill, perhaps the liveness/readiness checks have too aggressive of a timeout?
If you disable the health check / ready check, does this db instance show up in the rethink cluster?
Also you provided logs / describe for two different instances -- can you provide matching ones for the same replica?
Yes they are under heavy load as they are backfilling :O
The logs are the exact same minus the replica number. All of them had the exact same issue.
It would seem pinging the localhost:8080 is not a great way to check for health when under heavy load like backfill. I removed the health checks ~24 hours ago and I've had 2 of 6 replicas crash once in the past 24 hours (probably from backfill load). Otherwise things seem to be smooth, and those two replicas reconnected no problem.
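One possible way to make a curl-based probe less brittle under load is to retry a few times before reporting failure. The sketch below is an assumption about how that could look, not the contents of /ready-probe.sh and not necessarily what #11 proposes. The names retry, check_admin_http, PROBE_URL, ATTEMPTS and PER_TRY_TIMEOUT are made up for illustration.

```shell
#!/bin/sh
# Sketch of a more tolerant probe; all names and defaults here are illustrative.
PROBE_URL="${PROBE_URL:-http://localhost:8080}"
ATTEMPTS="${ATTEMPTS:-3}"
PER_TRY_TIMEOUT="${PER_TRY_TIMEOUT:-2}"

# Run a command up to $1 times; return 0 as soon as one attempt succeeds.
retry() {
  attempts="$1"
  shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Pause between attempts, but not after the last one.
    if [ "$i" -le "$attempts" ]; then
      sleep 1
    fi
  done
  return 1
}

# One quiet attempt against the admin HTTP port, bounded by --max-time.
check_admin_http() {
  curl --silent --fail --output /dev/null --max-time "$PER_TRY_TIMEOUT" "$PROBE_URL"
}

# A probe script built on this would end with:
#   retry "$ATTEMPTS" check_admin_http
```

With these defaults the worst case is roughly 8 seconds (three 2-second attempts plus two 1-second pauses), so the probe's timeoutSeconds would need to be raised above the 5s shown in the describe output. This only papers over slow responses; it does not help if the admin port stays unreachable for the whole backfill.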
Ok - duly noted. It wouldn't surprise me if the webui is less prioritized compared to the rest of the rethink system. Might be worth opening an issue against rethinkdb/rethinkdb re: UI not working under load. I'm going to close this since we should resolve this with #11
I think the idea raised in #11 is a good path for this project.