shyblower / k-andy

Zero friction Kubernetes stack for startups, prototypes, and playgrounds on Hetzner Cloud.

k-andy

Zero friction Kubernetes stack on Hetzner Cloud

This Terraform module installs a high-availability K3s cluster with embedded DB in a private network on Hetzner Cloud. The following resources are provisioned by default (20€/mo):

  • 3x Control-plane: CX11, 2GB RAM, 1VCPU, 20GB NVMe, 20TB Traffic.
  • 2x Worker: CX21, 4GB RAM, 2VCPU, 40GB NVMe, 20TB Traffic.
  • Network: Private network with one subnet.
  • Server and agent nodes are distributed across 3 Datacenters (nbg1, fsn1, hel1) for high availability.


Hetzner Cloud integration:

Auto-K3s-Upgrades

We provide an example of how to upgrade your K3s servers and agents with the system-upgrade-controller. Check out /upgrade

What is K3s?

K3s is a lightweight, certified Kubernetes distribution. It is packaged as a single binary and comes with solid defaults for storage and networking. In this module, local-path-provisioner is replaced with the Hetzner CSI driver, and the Klipper load balancer is replaced with the Hetzner Cloud Controller Manager. The default ingress controller (Traefik) is disabled.

Usage

See a more detailed example with walk-through in the example folder.
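As a minimal sketch, a root configuration that sets only the two required inputs (name and hcloud_token) might look like the block below. The source path is a placeholder, not the module's real location, and the cluster name demo is made up. The module label my_cluster matches the one used in the commands under Common Operations.

```hcl
# Hypothetical root variable that keeps the API token out of the code.
# The "sensitive" argument requires Terraform 0.14 or newer.
variable "hcloud_token" {
  type      = string
  sensitive = true
}

module "my_cluster" {
  # Placeholder: replace with the actual source of this module.
  source = "./k-andy"

  # Required inputs; every other input falls back to its default.
  name         = "demo"
  hcloud_token = var.hcloud_token
}
```

With the defaults left untouched, this should provision the three control-plane servers and two agents described at the top.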

Inputs

Name | Description | Type | Default | Required
--- | --- | --- | --- | ---
agent_groups | Configuration of agent groups | map(object({ type = string, count = number, ip_offset = number, taints = list(string) })) | { "default": { "count": 2, "ip_offset": 33, "taints": [], "type": "cx21" } } | no
cluster_cidr | Network CIDR to use for pod IPs | string | "10.42.0.0/16" | no
control_plane_server_count | Number of control plane nodes | number | 3 | no
control_plane_server_type | Server type of control plane servers | string | "cx11" | no
create_kubeconfig | Create a local kubeconfig file to connect to the cluster | bool | true | no
hcloud_csi_driver_version | n/a | string | "v1.5.3" | no
hcloud_token | Token to authenticate against Hetzner Cloud | any | n/a | yes
k3s_version | K3s version | string | "v1.21.3+k3s1" | no
kubeconfig_filename | Specify the filename of the created kubeconfig file (defaults to kubeconfig-${var.name}.yaml) | any | null | no
name | Cluster name (used in various places, don't use special chars) | any | n/a | yes
network_cidr | Network in which the cluster will be placed. Ignored if network_id is defined | string | "10.0.0.0/16" | no
network_id | If specified, no new network will be created. Make sure cluster_cidr and service_cidr don't collide with anything in the existing network. | any | null | no
server_additional_packages | Additional packages which will be installed on node creation | list(string) | [] | no
server_locations | Server locations in which servers will be distributed | list(string) | ["nbg1", "fsn1", "hel1"] | no
service_cidr | Network CIDR to use for services IPs | string | "10.43.0.0/16" | no
ssh_private_key_location | Use this private SSH key instead of generating a new one (Attention: Encrypted keys are not supported) | string | null | no
subnet_cidr | Subnet in which all nodes are placed | string | "10.0.1.0/24" | no
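The agent_groups input is the least obvious one, so here is a hedged sketch of adding a second group. Setting agent_groups replaces the default map entirely, so the default group is repeated to keep it. Because the object type declares no optional attributes, all four attributes must be given for every group. The group name storage, the offset 40, and the taint string are assumptions for illustration: the taint is written in the usual Kubernetes key=value:effect form, and the offset is assumed only to need to stay clear of the addresses used by other groups. Check the example folder for the exact semantics.

```hcl
variable "hcloud_token" {
  type      = string
  sensitive = true
}

module "my_cluster" {
  source = "./k-andy" # placeholder path, not the real module source

  name         = "demo"
  hcloud_token = var.hcloud_token

  agent_groups = {
    # Same values as the module default, repeated so the group is kept.
    "default" = {
      type      = "cx21"
      count     = 2
      ip_offset = 33
      taints    = []
    }
    # Hypothetical extra group; name, offset, and taint are illustrative.
    "storage" = {
      type      = "cx21"
      count     = 1
      ip_offset = 40
      taints    = ["dedicated=storage:NoSchedule"]
    }
  }
}
```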

Outputs

Name | Description
--- | ---
agents_public_ips | The public IP addresses of the agent servers
cidr_block | n/a
control_planes_public_ips | The public IP addresses of the control plane servers
k3s_token | Secret k3s authentication token
kubeconfig | Structured kubeconfig data to supply to other providers
kubeconfig_file | Kubeconfig file content with external IP address
network_id | n/a
server_locations | Array of hetzner server locations we deploy to
ssh_private_key | Key to SSH into nodes
subnet_id | n/a
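Module outputs are not printed by the root configuration unless they are re-exported. The sketch below shows one way to do that, assuming the module is instantiated with the label my_cluster. The root output names are arbitrary. Marking the token as sensitive is a precaution chosen here, not something the module is known to require.

```hcl
# Re-export the control plane addresses, e.g. for SSH access or DNS records.
output "control_plane_ips" {
  value = module.my_cluster.control_planes_public_ips
}

# Keep the join token out of plan/apply output.
output "k3s_token" {
  value     = module.my_cluster.k3s_token
  sensitive = true
}
```

After an apply, terraform output control_plane_ips prints the list, and terraform output -raw k3s_token reveals the token on demand.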

Common Operations

Agent server replacement

To cycle an agent, you can recreate a single node with the following procedure. Replace the group name and index with those of the server you want to recreate.

Make sure you drain the node first.

terraform taint 'module.my_cluster.module.agent_group["GROUP_NAME"].random_pet.agent_suffix[1]'
terraform apply

This will recreate the agent in that group on next apply.

Control Plane server replacement

Currently, you should only replace servers that did not initialize the cluster.

terraform taint 'module.my_cluster.hcloud_server.control_plane["#1"]'
terraform apply

Auto-Upgrade

Prerequisite

Install the system-upgrade-controller in your cluster.

KUBECONFIG=kubeconfig.yaml kubectl apply -f ./upgrade/controller.yaml

Upgrade procedure

  1. Label the nodes you want to upgrade (the command below labels all nodes).
KUBECONFIG=kubeconfig.yaml kubectl label --all node k3s-upgrade=true
  2. Run the plan for the servers.
KUBECONFIG=kubeconfig.yaml kubectl apply -f ./upgrade/server-plan.yaml

Warning: Wait for completion before you start upgrading your agents.

  3. Run the plan for the agents.
KUBECONFIG=kubeconfig.yaml kubectl apply -f ./upgrade/agent-plan.yaml

Backups

K3s automatically backs up your embedded etcd datastore every 12 hours to /var/lib/rancher/k3s/server/db/snapshots/. You can reset the cluster by pointing to a specific snapshot.

  1. Stop the master server.
sudo systemctl stop k3s
  2. Restore the master server from a snapshot.
./k3s server \
  --cluster-reset \
  --cluster-reset-restore-path=<PATH-TO-SNAPSHOT>

Warning: This forgets all peers, and the server becomes the sole member of a new cluster. You have to manually rejoin all servers.

  3. Connect to each of the other servers. Back up and delete /var/lib/rancher/k3s/server/db on each of them.
sudo systemctl stop k3s
sudo rm -rf /var/lib/rancher/k3s/server/db
sudo systemctl start k3s

The servers will rejoin the cluster one after another. After some time, all servers should be in sync again. Run kubectl get node to verify it.

Info: There is no official tool to automate this procedure. In the future, Rancher might provide an operator to handle this (issue).

Debugging

Cloud-init logs can be found on the remote machines in:

  • /var/log/cloud-init-output.log
  • /var/log/cloud-init.log

K3s service logs can be read with:

  • journalctl -u k3s.service -e (latest logs of the server)
  • journalctl -u k3s-agent.service -e (latest logs of the agent)

Credits

License:MIT License


Languages

Language:HCL 100.0%