siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.

Home Page: https://www.talos.dev

Creating a cluster via the CLI (yc) on Yandex.

remotejob opened this issue

Feature Request

Creating a cluster via the CLI (yc) on Yandex Cloud.

Description

I tried to create a cluster on Yandex Cloud using https://www.talos.dev/v1.7/talos-guides/install/cloud-platforms/hetzner/ as a base,
but it was unsuccessful with both nocloud-amd64.raw and hcloud-amd64.raw.
In all cases I get this error:
transport: Error while dialing: dial tcp 51.250.67.177:50000: connect: connection refused

We don't know much about Yandex Cloud and what it takes to run Talos there.

The Talos metal image should run everywhere, but if YC requires some special setup and handling, that would need platform support in Talos Linux.

Either way, you should start by looking at the server logs to see whether and why it fails to boot. These are usually called "serial console logs".

I then tried using https://kevinholditch.co.uk/2023/10/21/creating-a-kubernetes-cluster-using-talos-linux-on-xen-orchestra as a base:

unset TALOSCONFIG
export CONTROL_PLANE_IP=158.160.117.159
talosctl gen config talos-k8s-yandex https://$CONTROL_PLANE_IP --with-docs=false --with-examples=false --output-dir _out
export TALOSCONFIG="_out/talosconfig"
talosctl config endpoint $CONTROL_PLANE_IP
talosctl config node $CONTROL_PLANE_IP

Up to this point everything looks OK,
but talosctl bootstrap fails with:

failed to verify certificate: x509: certificate is valid for 10.128.0.27, 127.0.0.1, ::1, not 158.160.117.159"
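As a rough illustration of why the bootstrap fails: the node's certificate lists only the addresses Talos knows about (the private 10.128.0.27 address plus loopback), while talosctl dials the public IP. The sketch below is a simplified model of the IP branch of x509 name checking, not Go's actual verifier, and the helper name ip_in_sans is hypothetical.

```python
import ipaddress


def ip_in_sans(dialed_ip: str, san_ips: list[str]) -> bool:
    """Return True if dialed_ip matches one of the certificate's IP SAN entries.

    Simplified model: the address the client connected to must appear among
    the certificate's IP SANs, otherwise verification fails.
    """
    target = ipaddress.ip_address(dialed_ip)
    return any(ipaddress.ip_address(san) == target for san in san_ips)


# SANs reported in the error message (private address and loopback only).
sans_before = ["10.128.0.27", "127.0.0.1", "::1"]
public_ip = "158.160.117.159"  # the CONTROL_PLANE_IP passed to talosctl

print(ip_in_sans(public_ip, sans_before))                # False -> x509 error
print(ip_in_sans(public_ip, sans_before + [public_ip]))  # True once the public IP is a SAN
```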

talosctl disks --insecure --nodes $CONTROL_PLANE_IP

50000/tcp open ibm-db2

/dev/vda - fhmjk542a09ltf3d228q HDD - - virtio:d00000002v00001AF4 - 11 GB /pci0000:80/0000:80:00.0/0000:81:00.0/virtio2/ /sys/class/block
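The port check above can also be reproduced without a scanner. This is a minimal sketch (the helper name port_open is hypothetical) that tries a plain TCP connection: "connection refused", as in the first error in this thread, usually means the host answered but nothing accepted the connection on that port, while a timeout means no answer at all.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError and socket.timeout are both OSError subclasses.
        return False
```

For example, port_open("158.160.117.159", 50000) checks whether the Talos API port that talosctl dials is accepting connections.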

Please use proper Markdown formatting to make your comments more readable.

If there is a load balancer or IP that Talos has no idea about, add that public IP to .machine.certSANs in the machine config.
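A minimal sketch of such a patch, assuming a strategic-merge patch that only touches .machine.certSANs; the helper certsans_patch is hypothetical and just renders the YAML text after validating each address.

```python
import ipaddress


def certsans_patch(public_ips: list[str]) -> str:
    """Render a machine-config patch adding extra IPs to .machine.certSANs."""
    for ip in public_ips:
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    lines = ["machine:", "  certSANs:"]
    # Quote each entry so IPv6 addresses such as "::1" stay plain strings in YAML.
    lines += [f'    - "{ip}"' for ip in public_ips]
    return "\n".join(lines) + "\n"


print(certsans_patch(["158.160.117.159"]), end="")
```

Saved to a file such as certsans.yaml, it could then be passed when generating configs with talosctl gen config ... --config-patch @certsans.yaml (check talosctl gen config --help for your Talos version).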

I am stuck on:
158.160.168.148:443: i/o timeout. Why port 443?
My load balancer listens on 158.160.168.148:6443.

[ 946.684292] [talos] kubernetes endpoint watch error {"component": "controller-runtime", "controller": "k8s.EndpointController", "error": "failed to list *v1.Endpoints: Get \"https://158.160.168.148/api/v1/namespaces/default/endpoints?fieldSelector=metadata.name%3Dkubernetes&limit=500&resourceVersion=0\": dial tcp 158.160.168.148:443: i/o timeout"}
[ 950.583089] [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: an error on the server (\"Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)\") has prevented the request from succeeding"}
[ 953.669929] [talos] task startAllServices (1/1): service "etcd" to be "up"
[ 956.076398] [talos] etcd is waiting to join the cluster, if this node is the first node in the cluster, please run talosctl bootstrap against one of the following IPs:
[ 956.078744] [talos] [10.128.0.14]
[ 966.101260] [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: an error on the server (\"Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)\") has prevented the request from succeeding"}
[ 968.668718] [talos] task startAllServices (1/1): service "etcd" to be "up"
[ 978.438453] [talos] controller failed {"component": "controller-runtime", "controller": "k8s.NodeApplyController", "error": "1 error(s) occurred:\n\ttimeout"}
[ 981.777055] [talos] controller failed {"component": "controller-runtime", "controller": "k8s.KubeletStaticPodController", "error": "error refreshing pod status: error fetching pod status: an error on the server (\"Authorization error (user=apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy)\") has prevented the request from succeeding"}
[ 983.669451] [talos] task startAllServices (1/1): service "etcd" to b

OK, adding that public IP to .machine.certSANs resolved the issue!

Great job!!

"158.160.168.148:443: i/o timeout" ?? port 443??

You specify it yourself as the cluster endpoint argument to talosctl gen config, so Talos uses whatever you specify; an https:// endpoint without an explicit port defaults to 443.
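A small sketch of the default-port rule, assuming the endpoint was given as https://158.160.168.148 with no port (the log above shows requests to https://158.160.168.148/api/... failing on :443). The helper effective_port is hypothetical; urllib.parse is only used to show how URL parsing treats a missing port.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(endpoint: str) -> int:
    """Return the port a client will dial for a cluster endpoint URL."""
    parts = urlsplit(endpoint)
    # No explicit port in the URL: fall back to the scheme's default.
    return parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]


print(effective_port("https://158.160.168.148"))       # 443: no port given
print(effective_port("https://158.160.168.148:6443"))  # 6443: matches the load balancer
```

Presumably, passing the endpoint with the port, as in talosctl gen config talos-k8s-yandex https://158.160.168.148:6443 ..., makes Talos dial the load balancer's 6443 listener.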

Yes, it was my typo. Now everything looks like it's working.
A very interesting approach, and a very interesting project in general.
Thanks.
P.S. I'm planning to use it in production.