jmw1973 / okd-with-ansible


OKD with Ansible

This repository holds the Ansible playbooks and supporting files for installing OKD on bare-metal hardware.

Installing a user-provisioned cluster on bare metal

It assumes you have the following hardware:

  1. A Raspberry Pi for infrastructure (nginx load balancing and serving the ignition files).
  2. An Intel machine that first serves as the bootstrap node and later becomes worker1.
  3. Three Intel machines as masters.

If you have more hardware, adjust the hosts file accordingly.

NOTE: If you want to run a different version of OKD, extract the installer under ./openshift-install; the playbook then skips downloading it. The exact extraction command for each beta version is listed at OKD Nightly Releases.
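The Nightly Releases page gives the exact command for each version; as a rough, hypothetical sketch of what it usually looks like (the release tag below is a placeholder, not a real version):

# Placeholder release tag: take the real tag and command from OKD Nightly Releases.
oc adm release extract --tools --to=/tmp/okd-tools \
    quay.io/openshift/okd:4.x.0-0.okd-YYYY-MM-DD-HHMMSS
mkdir -p ./openshift-install
tar -xzf /tmp/okd-tools/openshift-install-linux-*.tar.gz -C ./openshift-install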

There are optional components to install, controlled by Ansible variables. These are passed on the command line:

ansible-playbook -i hosts -v deploy-okd.yml --extra-vars "compliance_operator=true argocd=true"
variable              description
compliance_operator   Whether to install the OKD compliance operator
argocd                Whether to install the argocd operator
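If you prefer, the same variables can be passed to Ansible as JSON (equivalent to the command above, and it keeps the values as proper booleans):

ansible-playbook -i hosts -v deploy-okd.yml \
    --extra-vars '{"compliance_operator": true, "argocd": true}'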

Preparation

  1. Download your Red Hat pull secret and place it in a file named pull-secret.

  2. If you want GitHub integration, create a file called "github-config.json" with content similar to this:

    {
      "clientID": "< github client ID >",
      "clientSecret": "< github client secret >",
      "organizations": [
        "< your github organization >"
      ]
    }
  3. Set up DNS with the following entries (of course, adjust the addresses to your infrastructure); a quick verification sketch follows this list:

    hostname                    address
    infra1.okd4.example.com     192.168.60.180
    api-int.okd4.example.com    CNAME infra1.okd4.example.com
    api.okd4.example.com        CNAME infra1.okd4.example.com
    apps.okd4.example.com       CNAME infra1.okd4.example.com
    *.apps.okd4.example.com     CNAME infra1.okd4.example.com
    master1.okd4.example.com    192.168.60.181
    master2.okd4.example.com    192.168.60.182
    master3.okd4.example.com    192.168.60.183
    worker1.okd4.example.com    192.168.60.184

    If you're using Pi-hole (as I do), increase its rate limit, create /etc/dnsmasq.d/99-openshift.conf with the following content, and restart DNS (pihole restartdns):

    address=/.apps.okd4.example.com/192.168.60.180
    

    If your DNS cannot handle wildcards, add these entries as CNAMEs pointing to apps.okd4.example.com:

    • alertmanager-main-openshift-monitoring.apps.okd4.example.com
    • canary-openshift-ingress-canary.apps.okd4.example.com
    • console-openshift-console.apps.okd4.example.com
    • downloads-openshift-console.apps.okd4.example.com
    • grafana-openshift-monitoring.apps.okd4.example.com
    • oauth-openshift.apps.okd4.example.com
    • prometheus-k8s-openshift-monitoring.apps.okd4.example.com
    • thanos-querier-openshift-monitoring.apps.okd4.example.com
  4. Create a new image for the Raspberry Pi with SSH enabled and boot it up.

  5. Create a bootable USB from the correct version of Fedora CoreOS. (At the time of writing, the current working release is 34.20210626.3.1.) The ISO download location can be printed with:

    ./openshift-install/openshift-install coreos print-stream-json | jq -r '.architectures.x86_64.artifacts.metal.formats.iso.disk.location'
  6. Set up static DHCP entries for the machines, matching the IP addresses above.
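Before running the playbook, it is worth confirming that the DNS entries resolve as intended. A minimal check with dig, assuming the addresses from the table above:

dig +short infra1.okd4.example.com        # expect 192.168.60.180
dig +short api.okd4.example.com           # expect 192.168.60.180 (via the CNAME)
dig +short api-int.okd4.example.com       # expect 192.168.60.180 (via the CNAME)
dig +short anything.apps.okd4.example.com # should resolve if your DNS handles the wildcard
dig +short master1.okd4.example.com       # expect 192.168.60.181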

Run the playbook

It's time to run the playbook. There are a number of steps that will be completed:

  1. The cluster configuration will be created and turned into ignition files.
  2. The infrastructure node (the Raspberry Pi) will be set up.

This is how to run the playbook:

ansible-playbook -i hosts -v deploy-okd.yml --extra-vars "compliance_operator=true argocd=true"
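Once the infrastructure node has been set up, you can spot-check that nginx on the Raspberry Pi serves the generated files. install.sh is fetched from this URL later in the process; the ignition file names below are an assumption based on the standard openshift-install output names:

curl -sI http://infra1.okd4.example.com:8080/install.sh    | head -n1
curl -sI http://infra1.okd4.example.com:8080/bootstrap.ign | head -n1   # name assumed
curl -sI http://infra1.okd4.example.com:8080/master.ign    | head -n1   # name assumed
# Each request should come back with an HTTP 200 status line.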

Once the playbook tells you to, boot the masters from the Fedora CoreOS USB and start the installation process. The NUCs are a bit slow on the network side, so a number of kernel arguments are needed.

$ curl --output install.sh http://infra1.okd4.example.com:8080/install.sh
$ chmod 755 ./install.sh
$ ./install.sh master[1-3]
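install.sh is generated by the playbook, so its exact contents depend on your configuration. As a rough, hypothetical sketch, for a master it boils down to coreos-installer with the matching ignition URL plus the extra kernel arguments (the device name, ignition file name, and specific kernel arguments below are assumptions):

# Hypothetical approximation of what install.sh runs for a master node.
sudo coreos-installer install /dev/sda \
    --ignition-url http://infra1.okd4.example.com:8080/master.ign \
    --insecure-ignition \
    --append-karg rd.net.timeout.carrier=30   # example kernel argument for a slow NIC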

Once the installation has finished, remove the USB and reboot.

$ sudo reboot now

Verify that the masters are trying to pull the secondary ignition config from https://api-int.okd4.example.com:22623.
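The endpoint will not answer until the bootstrap node (installed in the next step) starts serving configs, and the masters simply keep retrying. Once it is up, you can confirm it from any machine on the network (the certificate is self-signed, hence -k; /config/master is the standard Machine Config Server path):

curl -k -sI https://api-int.okd4.example.com:22623/config/master | head -n1
# Expect an HTTP 200 status line once the bootstrap node is serving ignition configs.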

Once all masters are waiting for the secondary ignition, continue the playbook, which will tell you to boot the first worker machine from the Fedora CoreOS USB and start the installation for the bootstrap process:

$ curl --output install.sh http://infra1.okd4.example.com:8080/install.sh
$ chmod 755 ./install.sh
$ ./install.sh bootstrap

Once the installation has finished, remove the USB and reboot.

$ sudo reboot now

After some time, you will be able to log in to the bootstrap machine via SSH and follow the installation:

$ ssh core@worker1.okd4.example.com
The authenticity of host 'worker1.okd4.example.com (192.168.60.184)' can't be established.
ECDSA key fingerprint is SHA256:Z3edOf5ImnxO/x9tchkto5LoEQIaFm8DT/7zyGj5r6g.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'worker1.okd4.example.com,192.168.60.183' (ECDSA) to the list of known hosts.
This is the bootstrap node; it will be destroyed when the master is fully up.

The primary services are release-image.service followed by bootkube.service. To watch their status, run e.g.

  journalctl -b -f -u release-image.service -u bootkube.service
Fedora CoreOS 34
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/c/server/coreos/

Last login: Sat Sep  4 05:55:40 2021 from 192.168.40.50
[core@nucbootstrap ~]$ journalctl -b -f -u release-image.service -u bootkube.service
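You can also follow the bootstrap progress from the machine running the playbook, using the installer's own wait command against the generated assets directory (./openshift-files, as used further below):

./openshift-install/openshift-install wait-for bootstrap-complete \
    --dir ./openshift-files --log-level=info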

When the installation has started, continue the playbook. Once the playbook detects that the installation is finished, it continues with the post-installation configuration.

You can now open the cluster console at https://console-openshift-console.apps.okd4.example.com.
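The password for the initial kubeadmin console user is written by the installer into the assets directory; alternatively, talk to the cluster directly with the generated kubeconfig:

cat ./openshift-files/auth/kubeadmin-password      # password for the kubeadmin console user
KUBECONFIG=./openshift-files/auth/kubeconfig \
    ./openshift-client/oc get clusteroperators     # all operators should report Available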

Have fun with your cluster!

Adding the bootstrap node as a worker

Once the cluster has been correctly installed, shut down the bootstrap node, remove its partitions, and reinstall it using this command:

$ curl --output install.sh http://infra1.okd4.example.com:8080/install.sh
$ chmod 755 ./install.sh
$ ./install.sh worker1.okd4.example.com
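To remove the old partitions, one option is to boot the node from the Fedora CoreOS USB and wipe the disk signatures before running install.sh; the device name below is an assumption:

sudo wipefs --all /dev/sda   # wipe old partition signatures (device name assumed)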

Since this node was not part of the initial cluster deployment, you need to approve its certificate signing requests (CSRs) manually. This is how you list and approve them:

$ KUBECONFIG=./openshift-files/auth/kubeconfig \
    ./openshift-client/oc get csr | grep -i pending
NAME        AGE     SIGNERNAME                                    REQUESTOR                                                                   CONDITION
csr-4n948   36m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-7n8zl   51m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-c8nhz   20m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-f4vvb   5m41s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-fz8sk   66m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending

$ KUBECONFIG=./openshift-files/auth/kubeconfig \
    ./openshift-client/oc adm certificate approve csr-fz8sk
certificatesigningrequest.certificates.k8s.io/csr-fz8sk approved

$ KUBECONFIG=./openshift-files/auth/kubeconfig \
    ./openshift-client/oc get nodes
NAME                       STATUS     ROLES           AGE    VERSION
master0.okd4.example.com   Ready      master,worker   116m   v1.20.0+01994f4-1091
master1.okd4.example.com   Ready      master,worker   116m   v1.20.0+01994f4-1091
master2.okd4.example.com   Ready      master,worker   113m   v1.20.0+01994f4-1091
worker1.okd4.example.com   NotReady   worker          73s    v1.20.0+01994f4-1091
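New CSRs appear in stages (first the client certificate, then the serving certificate), so you may have to repeat the approval. A shortcut that approves everything currently pending:

export KUBECONFIG=./openshift-files/auth/kubeconfig
./openshift-client/oc get csr | grep -i pending | awk '{print $1}' \
    | xargs ./openshift-client/oc adm certificate approve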
