microsoft / pai

Resource scheduling and cluster management for AI

Home Page: https://openpai.readthedocs.io


Errors occurred after the "Are your cluster is in Azure cloud or not?" prompt

chjm opened this issue · comments

commented

Hello, I am a developer from China trying to deploy PAI v1.8.0 in a private cloud.

Environment: Ubuntu 16.04 + OpenPAI v1.8.0

As you can see, I tried to follow https://openpai.readthedocs.io/zh_CN/zh_cn_pai-1.5.y/manual/cluster-admin/configuration-for-china.html, but it did not work, so I replaced some links to make sure I could access the mirrors I needed.

like this:
```yaml
user: ***
password: ***
docker_image_tag: v1.8.0

# openpai_kubespray_extra_var:
#   download_container: false
#   skip_downloads: true

# gcr_image_repo: "registry.cn-hangzhou.aliyuncs.com"
# kube_image_repo: "registry.cn-hangzhou.aliyuncs.com/google-containers"
# kubeadm_download_url: "https://shaiictestblob01.blob.core.chinacloudapi.cn/share-all/kubeadm"
# hyperkube_download_url: "https://shaiictestblob01.blob.core.chinacloudapi.cn/share-all/hyperkube"
gcr_image_repo: "registry.cn-hangzhou.aliyuncs.com"
kube_image_repo: "registry.cn-hangzhou.aliyuncs.com/google_containers"

openpai_kubespray_extra_var:
  pod_infra_image_repo: "registry.cn-hangzhou.aliyuncs.com/google_containers/pause-{{ image_arch }}"
  dnsautoscaler_image_repo: "docker.io/mirrorgooglecontainers/cluster-proportional-autoscaler-{{ image_arch }}"
  tiller_image_repo: "registry.cn-hangzhou.aliyuncs.com/google_containers/kubernetes-helm/tiller"
  registry_proxy_image_repo: "registry.cn-hangzhou.aliyuncs.com/google_containers/kube-registry-proxy"
  metrics_server_image_repo: "registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server-amd64"
  addon_resizer_image_repo: "registry.cn-hangzhou.aliyuncs.com/google_containers/addon-resizer"
  dashboard_image_repo: "registry.cn-hangzhou.aliyuncs.com/google_containers/kubernetes-dashboard-{{ image_arch }}"
```

Then, I used quick-start-kubespray.sh to deploy PAI, and this error occurred:
```
... Generating kubespray configuration
Are your cluster is in Azure cloud or not? (Y/N) (case sensitive) N
Traceback (most recent call last):
  File "/root/pai/contrib/kubespray/script/k8s_generator.py", line 72, in <module>
    main()
  File "/root/pai/contrib/kubespray/script/k8s_generator.py", line 67, in main
    map_table
  File "/root/pai/contrib/kubespray/script/utils.py", line 53, in generate_template_file
    generated_template = generate_from_template_dict(template, map_table)
  File "/root/pai/contrib/kubespray/script/utils.py", line 41, in generate_from_template_dict
    map_table
  File "/usr/local/lib/python3.5/dist-packages/jinja2/environment.py", line 1008, in render
    return self.environment.handle_exception(exc_info, True)
  File "/usr/local/lib/python3.5/dist-packages/jinja2/environment.py", line 780, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.5/dist-packages/jinja2/_compat.py", line 37, in reraise
    raise value.with_traceback(tb)
  File "<template>", line 410, in top-level template code
TypeError: 'NoneType' object is not iterable
```
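For context, the `TypeError: 'NoneType' object is not iterable` at the bottom usually means a `{% for %}` loop in the quick-start template received `None`, for example because a config key was left empty or its child entries were not indented under it in the YAML. A minimal sketch of the same failure in plain Python (the key name is taken from the config above; the empty value is an assumption about the cause, not a confirmed diagnosis):

```python
# If the children of "openpai_kubespray_extra_var" are not nested under it,
# YAML parses the key with an empty (None) value, and iterating over it
# fails the same way the Jinja2 template does.
config = {"openpai_kubespray_extra_var": None}

try:
    for key in config["openpai_kubespray_extra_var"]:
        print(key)
except TypeError as err:
    print(err)  # → 'NoneType' object is not iterable
```

So a first thing worth double-checking is that every key the template expects is present and actually holds a value after your edits.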

Is there a way to fix this or make sure that I can deploy it correctly?

By the way, I also tried the method mentioned in #5592; it gets further through the process but ultimately still fails.

Thank you.

@suiguoxin @siaimes Any comments?

@hzy46 May know this part better

You can try this:

https://github.com/siaimes/k8s-share

This is the solution I am using now; it is simple and stable.

commented

You can try this:

https://github.com/siaimes/k8s-share

This is the solution I am using now; it is simple and stable.

Yeah, I tried this, and here is what I'm experiencing now:
```
TASK [kubernetes/master : Create hardcoded kubeadm token for joining nodes with 24h expiration (if defined)] ***************************************************************************************************************************
Monday 16 May 2022 11:17:44 +0800 (0:00:00.040) 0:02:57.894 ************

TASK [kubernetes/master : Create kubeadm token for joining nodes with 24h expiration (default)] ****************************************************************************************************************************************
Monday 16 May 2022 11:17:44 +0800 (0:00:00.047) 0:02:57.941 ************
FAILED - RETRYING: Create kubeadm token for joining nodes with 24h expiration (default) (5 retries left).
FAILED - RETRYING: Create kubeadm token for joining nodes with 24h expiration (default) (4 retries left).
FAILED - RETRYING: Create kubeadm token for joining nodes with 24h expiration (default) (3 retries left).
FAILED - RETRYING: Create kubeadm token for joining nodes with 24h expiration (default) (2 retries left).
FAILED - RETRYING: Create kubeadm token for joining nodes with 24h expiration (default) (1 retries left).
fatal: [pai-master -> 192.168.0.20]: FAILED! => {"attempts": 5, "changed": true, "cmd": ["/usr/local/bin/kubeadm", "--kubeconfig", "/etc/kubernetes/admin.conf", "token", "create"], "delta": "0:01:15.022307", "end": "2022-05-16 11:25:40.956708", "msg": "non-zero return code", "rc": 1, "start": "2022-05-16 11:24:25.934401", "stderr": "timed out waiting for the condition", "stderr_lines": ["timed out waiting for the condition"], "stdout": "", "stdout_lines": []}

NO MORE HOSTS LEFT *********************************************************************************************************************************************************************************************************************

PLAY RECAP *****************************************************************************************************************************************************************************************************************************
localhost : ok=1 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
pai-master : ok=509 changed=17 unreachable=0 failed=1 skipped=509 rescued=0 ignored=0
pai-worker : ok=355 changed=12 unreachable=0 failed=0 skipped=293 rescued=0 ignored=0
```
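As an aside, the `fatal:` line embeds the Ansible task result as JSON, which is easier to read once pulled apart. A small sketch (the literal below is a trimmed copy of the fields from the log above):

```python
import json

# Trimmed copy of the task result embedded in the "fatal:" line above.
result = json.loads(
    '{"attempts": 5, "rc": 1, '
    '"cmd": ["/usr/local/bin/kubeadm", "--kubeconfig", '
    '"/etc/kubernetes/admin.conf", "token", "create"], '
    '"stderr": "timed out waiting for the condition"}'
)

# kubeadm exhausted all 5 retries with a non-zero return code. The stderr
# "timed out waiting for the condition" from `kubeadm token create`
# generally indicates kubeadm could not reach a healthy kube-apiserver.
print(result["rc"])      # → 1
print(result["stderr"])  # → timed out waiting for the condition
```

Under that assumption, a reasonable next step would be to check on the master node whether the kube-apiserver container is actually up and whether `kubectl` can reach it with the same admin.conf.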

commented

[screenshot omitted]

I have not deleted the node, and I have tried the command on the worker node, but the token creation failure is still there.

Your log is reported by the master node, so run this command on the master node.

commented

Your log is reported by the master node, so run this command on the master node.

Actually, I have run this command on all nodes (dev, master, and worker).

#5786 (comment)

@chjm This may be useful for you.