rosskukulinski / kubernetes-rethinkdb-cluster

RethinkDB cluster on top of Kubernetes made easy.

localhost port 8080 connection confused

chrisabrams opened this issue

Having trouble getting the container to start. This error continues to happen:

  1m    1m  1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal  Started     Started container with docker id 2a158eecfcfd
  16m   42s 11  {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Warning Unhealthy   Liveness probe failed:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed to connect to localhost port 8080: Connection refused

  16m   23s 31  {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Warning Unhealthy   Readiness probe failed:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed to connect to localhost port 8080: Connection refused

Any idea why this is not working? I have 3 replicas running no problem, and 3 that cannot seem to connect. Seems weird to not be able to ping itself?

I tried increasing initialDelaySeconds, and that worked for one pod, but for some pods I had to remove the livenessProbe and readinessProbe completely to get them started. I understand why the probes were there, but for some reason they were actually preventing my setup from working.

It would seem removing the livenessProbe and readinessProbe doesn't actually fix things, it just stops Kubernetes from killing the containers.

Would this error mean anything?

WARNING: ignoring --server-name because this server already has a name.

@chrisabrams can you paste the output from kubectl logs for one of the pods that's having this issue?

Also the output of kubectl describe <pod>, with any sensitive info removed.

Logs:

+ exec rethinkdb --server-name rethinkdb_replica_6_4146152185_e0grs --canonical-address 10.24.6.12 --bind all --join 10.24.5.16:29015 --cache-size 1024
WARNING: ignoring --server-name because this server already has a name.
Running rethinkdb 2.3.5~0jessie (GCC 4.9.2)...
Running on Linux 4.4.14+ x86_64
Loading data from directory /data/rethinkdb_data
Listening for intracluster connections on port 29015
Connected to server "rethinkdb_replica_3_2856842387_cs9e8" 2f8af5ac-b9ed-421c-9678-b1afd24a588d
Connected to server "rethinkdb_replica_2_476181460_3q1wi" 89328198-9a3a-48f8-a7a5-930067be8bf2
Listening for client driver connections on port 28015
Listening for administrative HTTP connections on port 8080
Listening on cluster addresses: 127.0.0.1, 10.24.6.12, ::1, fe80::b8f0:3cff:fe04:80b%3
Listening on driver addresses: 127.0.0.1, 10.24.6.12, ::1, fe80::b8f0:3cff:fe04:80b%3
Listening on http addresses: 127.0.0.1, 10.24.6.12, ::1, fe80::b8f0:3cff:fe04:80b%3
Server ready, "rethinkdb_replica_6_3969184722_r5xid" d1345de4-6b83-4117-a2fa-5001a496bf01
Connected to server "rethinkdb_replica_1_464188347_39a23" 62c3e9d0-69bc-40b3-a4b6-59bc1168eefb
Connected to server "rethinkdb_replica_5_1572598823_5sk2t" 6f6082ef-fb9c-48f1-abd5-194717516508
Connected to server "rethinkdb_replica_4_3172594744_1wte5" 95ee171b-1705-4843-8439-a78feca12a51
Disconnected from server "rethinkdb_replica_1_464188347_39a23" 62c3e9d0-69bc-40b3-a4b6-59bc1168eefb
Disconnected from server "rethinkdb_replica_2_476181460_3q1wi" 89328198-9a3a-48f8-a7a5-930067be8bf2
Connected to server "rethinkdb_replica_1_464188347_39a23" 62c3e9d0-69bc-40b3-a4b6-59bc1168eefb
Connected to server "rethinkdb_replica_2_476181460_3q1wi" 89328198-9a3a-48f8-a7a5-930067be8bf2

Describe:

Name:       rethinkdb-replica-3-3819502188-9qoxr
Namespace:  
Node:       default-pool-22328c35-ijcs/10.240.0.12
Start Time: Thu, 20 Oct 2016 20:17:10 -0400
Labels:     db=rethinkdb
        instance=three
        namespace=
        pod-template-hash=3819502188
        role=replica
Status:     Running
IP:     10.24.5.11
Controllers:    ReplicaSet/rethinkdb-replica-3-3819502188
Containers:
  rethinkdb:
    Container ID:   docker://acc1483ff7d9296976da3a6a187d38717f0f4204e45fcb20c1edf7598b7c9871
    Image:      rosskukulinski/rethinkdb-kubernetes:2.3.5
    Image ID:       docker://sha256:46f7371483c3ee7df84b043b73982c94e32503bac35df0220a5891f7181db94a
    Ports:      8080/TCP, 28015/TCP, 29015/TCP
    Args:
      --cache-size
      1024
    Limits:
      cpu:  250m
      memory:   4Gi
    Requests:
      cpu:      250m
      memory:       4Gi
    State:      Running
      Started:      Thu, 20 Oct 2016 20:33:57 -0400
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 20 Oct 2016 20:33:08 -0400
      Finished:     Thu, 20 Oct 2016 20:33:57 -0400
    Ready:      False
    Restart Count:  9
    Liveness:       exec [/ready-probe.sh] delay=15s timeout=5s period=10s #success=1 #failure=3
    Readiness:      exec [/ready-probe.sh] delay=15s timeout=5s period=10s #success=1 #failure=3
    Volume Mounts:
      /data from storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token (ro)
    Environment Variables:
      POD_NAMESPACE:     (v1:metadata.namespace)
      POD_NAME:     rethinkdb-replica-3-3819502188-9qoxr (v1:metadata.name)
      POD_IP:        (v1:status.podIP)
      RETHINK_CLUSTER:  rethinkdb
Conditions:
  Type      Status
  Initialized   True 
  Ready     False 
  PodScheduled  True 
Volumes:
  storage:
    Type:   GCEPersistentDisk (a Persistent Disk resource in Google Compute Engine)
    PDName: rethinkdb-storage-3
    FSType: ext4
    Partition:  0
    ReadOnly:   false
  default-token-w7o3i:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-w7o3i
QoS Class:  Guaranteed
Tolerations:    <none>
Events:
  FirstSeen LastSeen    Count   From                            SubobjectPath           Type        Reason      Message
  --------- --------    -----   ----                            -------------           --------    ------      -------
  17m       17m     1   {default-scheduler }                                    Normal      Scheduled   Successfully assigned rethinkdb-replica-3-3819502188-9qoxr to default-pool-22328c35-ijcs
  16m       16m     1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal      Created     Created container with docker id a8ea8743476b; Security:[seccomp=unconfined]
  16m       16m     1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal      Started     Started container with docker id a8ea8743476b
  16m       16m     1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal      Killing     Killing container with docker id a8ea8743476b: pod "rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)" container "rethinkdb" is unhealthy, it will be killed and re-created.
  15m       15m     1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal      Created     Created container with docker id bf98a14a5de1; Security:[seccomp=unconfined]
  15m       15m     1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal      Started     Started container with docker id bf98a14a5de1
  15m       15m     1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal      Started     Started container with docker id 4735d12480f7
  15m       15m     1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal      Killing     Killing container with docker id bf98a14a5de1: pod "rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)" container "rethinkdb" is unhealthy, it will be killed and re-created.
  15m       15m     1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal      Created     Created container with docker id 4735d12480f7; Security:[seccomp=unconfined]
  14m       14m     1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal      Killing     Killing container with docker id 4735d12480f7: pod "rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)" container "rethinkdb" is unhealthy, it will be killed and re-created.
  14m       14m     1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal      Started     Started container with docker id 5bf31ed13a37
  14m       14m     1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal      Created     Created container with docker id 5bf31ed13a37; Security:[seccomp=unconfined]
  13m       13m     1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal      Killing     Killing container with docker id 5bf31ed13a37: pod "rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)" container "rethinkdb" is unhealthy, it will be killed and re-created.
  13m       13m     1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal      Started     Started container with docker id 8b06a8f7667d
  13m       13m     1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal      Created     Created container with docker id 8b06a8f7667d; Security:[seccomp=unconfined]
  12m       12m     1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal      Created     Created container with docker id af0739c22d93; Security:[seccomp=unconfined]
  12m       12m     1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal      Started     Started container with docker id af0739c22d93
  12m       12m     1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal      Killing     Killing container with docker id 8b06a8f7667d: pod "rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)" container "rethinkdb" is unhealthy, it will be killed and re-created.
  11m       11m     1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal      Killing     Killing container with docker id af0739c22d93: pod "rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)" container "rethinkdb" is unhealthy, it will be killed and re-created.
  11m       10m     7   {kubelet default-pool-22328c35-ijcs}                    Warning     FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "rethinkdb" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=rethinkdb pod=rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)"

  10m   10m 1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal  Started     Started container with docker id 878d957cf94d
  10m   10m 1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal  Created     Created container with docker id 878d957cf94d; Security:[seccomp=unconfined]
  9m    9m  1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal  Killing     Killing container with docker id 878d957cf94d: pod "rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)" container "rethinkdb" is unhealthy, it will be killed and re-created.
  9m    7m  13  {kubelet default-pool-22328c35-ijcs}                    Warning FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "rethinkdb" with CrashLoopBackOff: "Back-off 2m40s restarting failed container=rethinkdb pod=rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)"

  7m    7m  1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal  Created     Created container with docker id 96a8ceafcbee; Security:[seccomp=unconfined]
  7m    7m  1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal  Started     Started container with docker id 96a8ceafcbee
  6m    6m  1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal  Killing     Killing container with docker id 96a8ceafcbee: pod "rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)" container "rethinkdb" is unhealthy, it will be killed and re-created.
  11m   1m  44  {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Warning BackOff     Back-off restarting failed docker container
  6m    1m  24  {kubelet default-pool-22328c35-ijcs}                    Warning FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "rethinkdb" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=rethinkdb pod=rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)"

  1m    1m  1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal  Created     Created container with docker id 2a158eecfcfd; Security:[seccomp=unconfined]
  1m    1m  1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal  Started     Started container with docker id 2a158eecfcfd
  16m   42s 11  {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Warning Unhealthy   Liveness probe failed:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed to connect to localhost port 8080: Connection refused

  16m   23s 31  {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Warning Unhealthy   Readiness probe failed:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed to connect to localhost port 8080: Connection refused

  16m   16s 10  {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal  Pulling pulling image "rosskukulinski/rethinkdb-kubernetes:2.3.5"
  16m   16s 10  {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal  Pulled  Successfully pulled image "rosskukulinski/rethinkdb-kubernetes:2.3.5"
  16s   16s 1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal  Killing Killing container with docker id 2a158eecfcfd: pod "rethinkdb-replica-3-3819502188-9qoxr_(b5888bf9-9723-11e6-ad38-42010af00161)" container "rethinkdb" is unhealthy, it will be killed and re-created.
  16s   16s 1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal  Created (events with common reason combined)
  16s   16s 1   {kubelet default-pool-22328c35-ijcs}    spec.containers{rethinkdb}  Normal  Started (events with common reason combined)

@chrisabrams are these nodes under heavy load? If you're actively backfilling, I wonder whether the liveness/readiness checks have too aggressive a timeout.
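To put a rough number on "too aggressive", here is a minimal sketch that estimates how soon the kubelet can kill a container whose liveness probe never succeeds. It uses the settings from the describe output above (delay=15s, period=10s, #failure=3). The function name is made up for illustration, and the formula is a simplification: it assumes the first probe runs right after the initial delay and that each probe fails immediately (as with "Connection refused"). Real kubelet scheduling adds some delay on top.

```python
def earliest_kill_seconds(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Rough lower bound on seconds from container start to the liveness kill.

    Assumes the first probe fires right after the initial delay and every
    probe fails immediately; the real kubelet adds scheduling delay on top.
    """
    # Failures land at initial_delay, initial_delay + period, ...; the kill
    # follows the failure that reaches the threshold.
    return initial_delay + (failure_threshold - 1) * period


# Values from the `kubectl describe` output: delay=15s period=10s #failure=3
print(earliest_kill_seconds(15, 10, 3))  # 35
```

The previous container in the describe output ran from 20:33:08 to 20:33:57, about 49 seconds, which is in the same ballpark. Under these assumptions a server that needs longer than that to start answering on port 8080 will be restarted before it ever becomes ready. A larger delay such as 60s (a hypothetical value, not one tested in this thread) would move the estimate to 80 seconds.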

If you disable the health check / ready check, does this db instance show up in the rethink cluster?

Also, you provided logs / describe for two different instances -- can you provide matching ones for the same replica?

Yes they are under heavy load as they are backfilling :O

The logs are the exact same minus the replica number. All of them had the exact same issue.

It would seem pinging localhost:8080 is not a great way to check for health when under heavy load like backfill. I removed the health checks ~24 hours ago, and 2 of 6 replicas have each crashed once since then (probably from backfill load). Otherwise things seem to be smooth, and those two replicas reconnected no problem.
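One alternative to curling the admin UI is a plain TCP check against the client driver port (28015 in the logs above) with a few retries. The sketch below is only an illustration: the function names, retry counts and timeouts are assumptions, nothing in this thread confirms that the driver port stays responsive during backfill, and this is not necessarily what #11 proposes. A successful connect also only shows that the port accepts connections, not that the server can answer queries.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host and timeouts.
        return False


def probe_exit_code(host: str = "localhost", port: int = 28015,
                    attempts: int = 3, timeout: float = 2.0,
                    pause: float = 1.0) -> int:
    """Return 0 if any attempt connects, 1 otherwise (exec-probe convention)."""
    for attempt in range(attempts):
        if port_is_open(host, port, timeout):
            return 0
        if attempt < attempts - 1:
            time.sleep(pause)  # short pause before the next attempt
    return 1
```

A probe script could end with `sys.exit(probe_exit_code())`. The total running time has to stay below the probe's timeout (5s in the describe output), otherwise the kubelet counts the attempt as a failure anyway. This also assumes a Python interpreter is available in the container image.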

Ok - duly noted. It wouldn't surprise me if the web UI is given lower priority than the rest of the rethink system. Might be worth opening an issue against rethinkdb/rethinkdb re: UI not working under load. I'm going to close this since we should resolve this with #11.

I think the idea raised in #11 is a good path for this project.