xuwang / kube-aws-terraform

KAT - Kubernetes cluster on AWS with Terraform


Challenge with Make Addons

agentbond007 opened this issue · comments

This section of the Makefile for Addons:

      @scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
	core@${KUBE_API_DNSNAME}:/var/lib/kubernetes/admin.pem \
	core@${KUBE_API_DNSNAME}:/var/lib/kubernetes/admin-key.pem \
	core@${KUBE_API_DNSNAME}:/var/lib/kubernetes/kube-apiserver-ca.pem \
	 ${KUBERNETES_WORKDIR}/

Produces error:
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
make[2]: *** [do-config] Error 1
make[1]: *** [kube-config] Error 2
make: *** [add-ons] Error 2
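For reference, a bare connectivity check with the same ssh options (taking scp out of the picture) would be something like this - KUBE_API_DNSNAME is the same variable the Makefile uses:

# verbose ssh to the API endpoint with the options the Makefile passes to scp;
# stopping at "ssh_exchange_identification" means the remote side closes the
# connection before key exchange (ELB not ready, security group, or sshd timing)
ssh -v -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
    core@${KUBE_API_DNSNAME} true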

The directory does exist on both the local machine and kube-cluster-master:
/var/lib/kubernetes/admin.pem

The identity is added with no problem from vault
Identity added: /Users/admin/.ssh/kube-cluster-master.pem

Any ideas?

Thank you.

Thank you Xueshan. The issue appears to be with getting "kubectl" configured correctly.
I can download the cluster.pem, but the scp command isn't placing the
admin.pem
admin-key.pem
kube-apiserver-ca.pem

files in their proper place.

I have also added "kube-api.buildmodels.net" to my hosts file.

I tried scp -o core@kube-api.buildmodels.net:/var/lib/kubernetes/admin.pem
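A complete single-file copy with the same options the Makefile uses would be something like:

# copy one cert by hand into the current directory (options taken from the Makefile above)
scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
    core@kube-api.buildmodels.net:/var/lib/kubernetes/admin.pem .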

I don't see where you get admin.pem and admin-key.pem in your script.

Appreciate any assistance.

@agentbond007 Thank you for trying this repo!

I just built a cluster to make sure recent code changes didn't introduce a regression... and I didn't run into this problem. However, in the past I did occasionally hit a similar ssh problem, mostly because of timing; if I ran make add-on again, or cd resources/add-on; make kube-config, it would copy the certs over and generate ~/.kube/config.

All the pem files are provided by vault PKI and generated as part of the master's bootstrap. The code is under resources/master/artifacts/upload/setup.sh; setup.sh calls get-certs.sh in the same upload directory to generate the etcd, kube-apiserver, and admin certs.
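Conceptually, the cert generation is just a Vault PKI issue call. As a rough illustration only (the pki mount path and the kubernetes role name below are placeholders, not necessarily what get-certs.sh uses):

# issue a cert from a Vault PKI backend (placeholder mount and role names)
vault write -format=json pki/issue/kubernetes common_name=admin ttl=8760h > /tmp/admin.json
# the JSON response carries .data.certificate, .data.private_key, and .data.issuing_ca,
# which map onto files like admin.pem, admin-key.pem, and kube-apiserver-ca.pem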

The master has dependencies on etcd and vault. If retrying make kube-config doesn't work for you, here are some troubleshooting tips - you have full access to the cluster you built :)

  • First, log in to the master:
$ cd resources/master
$ make ssh
  • Check etcd status - need to be root (sudo su)
# source source /etc/profile.d/etcdctl.sh
etcdctl cluster-health
member 64c6ca31c84f1803 is healthy: got healthy result from https://10.240.2.12:2379
cluster is healthy

If it is not healthy, restart etcd:

# systemctl restart etcd2
  • Check vault status
core@ip-10-240-3-5-my-kube-master ~ $ vault status
Sealed: false
Key Shares: 5
Key Threshold: 3
Unseal Progress: 0
Unseal Nonce:
Version: 0.6.5
Cluster Name: vault-cluster-88c43ad7
Cluster ID: bec05d9b-6cac-46cf-e55c-321b952eadfb

High-Availability Enabled: false

  • Check certs
core@ip-10-240-3-5-my-kube-master /var/lib/kubernetes $ ls -lrt
total 56
-rw-------. 1 core root 2074 May 31 05:48 admin.pem
-rw-------. 1 core root 1675 May 31 05:48 admin-key.pem
-rw-r--r--. 1 root root 2098 May 31 05:48 kube-apiserver.pem
-rw-------. 1 root root 1679 May 31 05:48 kube-apiserver-key.pem
-rw-r--r--. 1 root root 1826 May 31 05:48 kube-apiserver-ca.pem
-rw-r--r--. 1 root root   83 May 31 05:48 token.csv
-rw-r--r--. 1 root root 3247 May 31 05:48 service-account-key.pem
  • Check master log:
# journalctl -f -u kube-apiserver.service
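Once the certs do copy over and ~/.kube/config is generated, a quick sanity check from your workstation could be:

kubectl get componentstatuses   # etcd, scheduler, and controller-manager should report Healthy
kubectl get nodes               # worker nodes should show up as Ready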

Oh, don't forget to update your /etc/hosts file with the kube-apiserver's IP if you did not delegate a domain name - ELB IPs can change over time.
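If you have not delegated the domain, one way to refresh that entry (the ELB DNS name and IP below are placeholders):

# resolve the API ELB's current address, then map it to the API hostname in /etc/hosts
dig +short my-kube-apiserver-1234567890.us-east-1.elb.amazonaws.com
sudo sh -c 'echo "203.0.113.10 kube-api.buildmodels.net" >> /etc/hosts'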

Let me know if any of the trouble-shooting shows obvious errors.

Hi Xu. Thank you for your comments. I'm continuing to investigate.
Still not working.

Please see below. My master doesn't seem to have /etc/profile.d/etcdctl.sh.

Also, there is no /var/lib/kubernetes.

The kube-config part isn't working, so I followed some of your steps.

I went to: cd resources/master

master git:(master) ✗ make ssh
Permitted 22 from 45.49.236.46/32 to master...
Last login: Tue Jun 6 00:22:01 UTC 2017 from 45.49.236.46 on pts/0
Container Linux by CoreOS stable (1353.8.0)
core@ip-10-240-3-9 ~ $ sudo su
ip-10-240-3-9 core # source source /etc/profile.d/etcdctl.sh
bash: source: No such file or directory
ip-10-240-3-9 core # systemctl restart etcd2
ip-10-240-3-9 core # source source /etc/profile.d/etcdctl.sh
bash: source: No such file or directory
ip-10-240-3-9 core # vault status
bash: vault: command not found
ip-10-240-3-9 core # /var/lib/kubernetes $ ls -lrt
bash: /var/lib/kubernetes: No such file or directory
ip-10-240-3-9 core # /var/lib/kubernetes
bash: /var/lib/kubernetes: No such file or directory
ip-10-240-3-9 core # cd var
bash: cd: var: No such file or directory

Any ideas? Thank you very, very much for any clues.

Also, vault doesn't seem to be installed on the master.

@agentbond007 how about the etcd cluster?

ssh-add ~/.ssh/<cluster-name>-etcd.pem
cd resources/etcd
make ssh

Can you run vault status and etcdctl cluster-health as the core user?

All the bootstrap files and logs are located under /root/bootstrap on each machine. If the etcd cluster status is okay, I'd just reboot the master if you haven't tried it - does /root/bootstrap/config/cloud-config.yaml have a size of zero? Either reboot, or just halt the machine and let the ASG replace the master...
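A quick way to check that file, as root on the master:

# a size of 0 means the bootstrap never managed to download the config
ls -l /root/bootstrap/config/cloud-config.yaml
head -5 /root/bootstrap/config/cloud-config.yaml   # a healthy cloud-config starts with "#cloud-config"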

Some of the vault troubleshooting docs are here: https://github.com/xuwang/kube-aws-terraform/blob/master/docs/02-vault-pki.md.

Thank you Xu. Very strange issues. I rebooted both the master and etcd. The etcd cluster has the same issue:
core@ip-10-240-2-7 ~ $ sudo su
ip-10-240-2-7 core # source source /etc/profile.d/etcdctl.sh
bash: source: No such file or directory

Also, Vault is not installed on either the master or etcd:
ip-10-240-2-7 / # vault status
bash: vault: command not found

Where in the original build do the machines install the Vault binary and set the PATH?

Also, in /root/bootstrap/config/, cloud-config.yaml is EMPTY.

Halted the MASTER. The ASG recreated it. Same problem. :(

Could it be export TF_VAR_vault_release=0.7.0 in envs.sh?

Thank you for any help.

@agentbond007 The default value of var.vault_release is 0.7.0, though it doesn't hurt to explicitly define it with TF_VAR_vault_release. The problem doesn't appear to be related to the vault version. A few more things to check, whenever you get a chance:

  1. Does the vault server status show that there are mount points, and is it running normally?
  2. Take a look at the S3 buckets. They are named with the cluster name as a prefix. In the --cloudinit folder, does each role (worker, etcd, etc.) have a cloud-config.yaml file, and does the content look like a normal YAML file? (See the sketch after this list.)
  3. If it does, then there are some download issues. You can try to curl the user-data and run it on a machine, for example:
curl 169.254.169.254/latest/user-data > /tmp/user-data
sh -x /tmp/user-data
  4. Which region are you in? We use signature v4 to download metadata. I wonder if your region supports it...
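For item 2, a hypothetical way to eyeball those objects from the CLI (the bucket and key names below are placeholders; yours are prefixed with your cluster name):

aws s3 ls s3://mycluster-cloudinit/ --recursive
aws s3 cp s3://mycluster-cloudinit/master/cloud-config.yaml - | head -5   # should be non-empty YAML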

BTW, I also had a typo in one of the troubleshooting steps - source source /etc/profile.d/etcdctl.sh should be just source /etc/profile.d/etcdctl.sh.

Thank you Xu. I re-cloned the project and set up a new envs.sh file. I ran make cluster | tee /tmp/build.log. Same problem. I looked in the S3 buckets: the cluster-config bucket has folders, but the Vault S3-backend bucket doesn't contain any objects.
I'm in region us-east-1. I have attached the build.log too.

build.txt

I downloaded it and masked the file name.

@agentbond007 Can you also send the journald log from vault?
Log in to vault and run journalctl > /tmp/vault.log.

It would also be helpful to share envs.sh (with specifics sanitized) so I can try to reproduce. I did test us-east-1 before; it supports the S3 v4 signature.

@agentbond007 I was able to reproduce this in us-east-1. Your build logs all look right, but the S3 download during system bootstrap fails. You can see it in /tmp/curlLog.log... I am looking into it. Sorry!

Yes Xu! :) Thank you. I'm excited about the KAT project and maybe someday I can help contribute. I'm good at "proofreading". Xièxie (thank you)!

@agentbond007 The S3 bucket download problem is fixed in xuwang/bash-s3#1.

If you don't want to rebuild the cluster, you can reboot the EC2 instances one by one in this order (you can do it from the console): vault, etcd, master, nodes. Before you reboot the other machines, make sure to log on to vault and run sudo su -; vault status, which should show an unsealed status, and vault mounts should show the PKI backend mounts.
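In terminal form, that pre-reboot check on the vault instance looks like:

$ sudo su -
# vault status     # "Sealed: false" means the server is unsealed
# vault mounts     # should list the pki backend mount(s) used for the cluster certs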

Hey, you are so very nice to help with the debugging and documentation! Thank you for sticking with KAT!

Success on us-east-1. I had to reboot the MASTER once at the beginning, but the cluster was healthy after that. Successfully ran "make ui"; the Dashboard was created.
Thanks Xu and team. Great work.

@agentbond007 Glad you have your cluster up! Thanks!