KNI IPI Virt

These helper scripts provide a virtualized infrastructure for use with OpenShift baremetal IPI deployment, and then use OpenShift Baremetal Deploy Ansible Installer to deploy a cluster on that virtualized infrastructure. They do the following:

  1. Prepare the provisioning host for OCP deployment (required packages, firewall rules, etc.)
  2. Start DHCP and DNS containers for the OCP baremetal network
  3. Set up NAT forwarding and masquerading to allow the baremetal network to reach an external routable network
  4. Create VMs to serve as the cluster's masters and workers
  5. Create virtual BMC endpoints for the VMs
  6. Clone the OpenShift Baremetal Deploy Ansible Installer and prepare it for use with the virtualized infrastructure
  7. Execute the aforementioned Ansible playbook

Prerequisites

  1. The provisioning host must have an externally-facing NIC on a separate VLAN if you wish the cluster to have Internet connectivity
  2. The provisioning host must have externally-facing NICs on a separate VLAN for the provisioning and baremetal networks if you wish the VMs or the DHCP/DNS services to be reachable by nodes outside the host
  3. The provisioning host must be running RHEL 8.1 or CentOS 8.1
  4. If using RHEL 8.1, an active subscription is required
  5. A non-root user must be available to execute the scripts and the Ansible playbook. You could add one like so:
    sudo useradd kni
    echo "kni ALL=(root) NOPASSWD:ALL" | sudo tee -a /etc/sudoers.d/kni
    sudo chmod 0440 /etc/sudoers.d/kni
    sudo su - kni -c "ssh-keygen -t rsa -f /home/kni/.ssh/id_rsa -N ''"
    
  6. Install make and git:
    sudo dnf install -y make git
  7. Copy your OpenShift pull secret to your non-root user's home directory (e.g. /home/kni) and name it pull-secret.txt (this location is configurable, however -- see below)
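
For example, assuming the pull secret was downloaded to your current working directory (a hypothetical location), copying it into place might look like:

    # Hypothetical source path -- adjust to wherever you saved the pull secret
    cp ./pull-secret.txt /home/kni/pull-secret.txt
    sudo chown kni:kni /home/kni/pull-secret.txt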

Bundled Usage

  1. As your non-root user (such as kni), clone the repo to your provisioning host machine and go to the directory:
    git clone https://github.com/redhat-nfvpe/kni-ipi-virt.git
    cd kni-ipi-virt
    
  2. Set your environment variables in common.sh. These values and their purpose are described in the file (see the example sketch after this list).
  3. Run make all
  4. To remove the VMs and the DNS/DHCP containers, run make clean
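
As an illustration only, a few of the variables you might set in common.sh could look like the example below. The authoritative list, defaults and descriptions live in common.sh itself; the values shown here are assumptions, not recommendations, and whether the file uses plain assignments or exports is defined by the file:

    # Example values only -- consult common.sh for the full variable list
    CLUSTER_NAME="kni-virt"              # assumed cluster name
    CLUSTER_DOMAIN="example.com"         # assumed base domain
    PROJECT_DIR="/home/kni/kni-ipi-virt" # where this repo was cloned
    NUM_MASTERS=3                        # number of master VMs to create
    NUM_WORKERS=2                        # number of worker VMs to create
    BM_GW_IP="192.168.111.1"             # assumed baremetal network gateway
    DNS_IP="192.168.111.2"               # assumed address for the CoreDNS container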

Isolated Usage

  1. Clone the repo to your provisioning host machine and go to the directory:
    git clone https://github.com/redhat-nfvpe/kni-ipi-virt.git
    cd kni-ipi-virt
    
  2. Set your environment variables in common.sh. These values and their purpose are described in the file.
  3. Execute prep_host.sh, which requires the following variables to be set in common.sh:
  • BM_BRIDGE
  • BM_GW_IP
  • DNS_IP
  • PROV_BRIDGE
  4. If you wish external nodes to be able to reach the services/VMs listed below, you will also need:
  • BM_INTF
  • PROV_INTF
  5. Once the steps above have been completed, the individual DNS, DHCP and VM bash scripts can each be run on their own to make use of their specific functionality, for example as sketched below.
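
As a rough sketch of the isolated flow (the script paths are taken from the sections below; running prep_host.sh from the repository root and the order shown are assumptions):

    ./prep_host.sh       # prepare the provisioning host
    ./dns/start.sh       # start the CoreDNS container
    ./dhcp/start.sh      # start the Dnsmasq container
    ./vms/prov-vms.sh    # create the VMs and their vBMCs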

DNS

Create a CoreDNS container to provide DNS on your baremetal network. The following variables must be set in common.sh:

  • API_VIP
  • BM_GW_IP
  • BM_INTF (if you want external nodes to be able to reach this service)
  • CLUSTER_DOMAIN
  • CLUSTER_NAME
  • DNS_IP
  • DNS_VIP
  • EXT_DNS_IP
  • INGRESS_VIP
  • PROJECT_DIR

Create and start the CoreDNS container:

./dns/start.sh

Stop and remove the CoreDNS container:

./dns/stop.sh
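
To sanity-check the container (an extra step that is not part of the scripts), you can confirm it is running with podman and query a record the deployment will rely on against DNS_IP. The api.<cluster>.<domain> name and the use of dig (from bind-utils) are assumptions; source or export the common.sh values first:

podman ps
dig +short "api.${CLUSTER_NAME}.${CLUSTER_DOMAIN}" @"${DNS_IP}"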

DHCP

Create a Dnsmasq container to provide DHCP on your baremetal network. The following variables must be set in common.sh:

  • BM_GW_IP
  • BM_INTF (if you want external nodes to be able to reach this service)
  • CLUSTER_DOMAIN
  • CLUSTER_NAME
  • DHCP_BM_MACS
  • DNS_IP
  • PROJECT_DIR

If you are using the DHCP container with existing machines, you will need to set DHCP_BM_MACS. It should list your master and worker baremetal network MACs in the form <master0>,..,<masterN>,<worker0>,..,<workerN>. If you do not set this variable, MASTER_BM_MAC_PREFIX and WORKER_BM_MAC_PREFIX will be used instead (as they would be in "Bundled Usage"), which will most likely result in an incorrect Dnsmasq configuration (unless you happen to be using the Dnsmasq container with VMs generated by this tool's VM-generation scripts).
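
For example, for a three-master, two-worker cluster the setting might look like the following (the MAC addresses are placeholders, and whether common.sh uses a plain assignment or an export is defined by the file):

DHCP_BM_MACS="52:54:00:aa:00:01,52:54:00:aa:00:02,52:54:00:aa:00:03,52:54:00:bb:00:01,52:54:00:bb:00:02"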

Create and start the Dnsmasq container:

./dhcp/start.sh

Stop and remove the Dnsmasq container:

./dhcp/stop.sh
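
To check on the container (again, an extra step rather than something the scripts do for you), podman can show whether it is up and tail its logs. The container name below is a placeholder, so look it up with podman ps first, and note that DHCP activity only appears in the logs if dnsmasq logging is configured to emit it:

podman ps
podman logs -f <dnsmasq-container-name>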

VMs

Create the configured number of master and worker VMs for use with an OCP deployment. The following variables must be set in common.sh:

  • CLUSTER_NAME
  • LIBVIRT_STORAGE_POOL
  • MASTER_BM_MAC_PREFIX
  • MASTER_CPUS
  • MASTER_MEM
  • MASTER_PROV_MAC_PREFIX
  • MASTER_VBMC_PORT_PREFIX
  • NUM_MASTERS
  • NUM_WORKERS
  • PROJECT_DIR
  • WORKER_BM_MAC_PREFIX
  • WORKER_CPUS
  • WORKER_MEM
  • WORKER_PROV_MAC_PREFIX
  • WORKER_VBMC_PORT_PREFIX

Create the VMs and their vBMCs:

./vms/prov-vms.sh

Destroy the VMs and their vBMCs:

./vms/clean-vms.sh
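
To confirm what was created (or removed), libvirt's virsh can list the domains and the virtualbmc CLI can list the vBMC endpoints. These checks are assumptions layered on top of the scripts, and vbmc may need sudo depending on how virtualbmc was installed:

sudo virsh list --all
vbmc list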

Troubleshooting

  1. If you are unable to start the DNS container because of an error message like so...

    Error: error from slirp4netns while setting up port redirection: map[desc:bad request: add_hostfwd: slirp_add_hostfwd failed]

    ...try stopping/removing all containers and killing all remaining slirp4netns processes, and then try to start the container again. Sometimes podman fails to clean up the slirp4netns forwarding processes when it stops/removes the DNS container. For example:
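
    One way to do that (a sketch, not something the scripts automate for you) is:

    podman stop --all              # stop every container on the host
    podman rm --all                # remove them
    pkill -f slirp4netns || true   # kill any leftover slirp4netns forwarding processes
    ./dns/start.sh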

  2. Sometimes the Ironic Python Agent used by the underlying Metal3 components (which are themselves part of the IPI installation process) gets stuck while cleaning the VMs' disks. Using a VNC viewer such as TigerVNC, you can view the console of the VM and see if the agent's heartbeat is looping continuously (for more than 10 minutes or so). If so, a simple option is to just try the deployment again, but you of course run the risk of hitting a cleaning issue again. A better option is to use the OpenStack CLI to talk to Ironic and attempt cleaning the problematic nodes manually. The tool can be installed like so:

    sudo pip3 install python-openstackclient
    sudo pip3 install python-ironicclient
    sudo pip3 install python-ironic-inspector-client
    mkdir -p ~/.config/openstack/
    tee "$HOME/.config/openstack/clouds.yaml" > /dev/null << EOF
    clouds:
      metal3-bootstrap:
        auth_type: none
        baremetal_endpoint_override: http://172.22.0.2:6385  
        baremetal_introspection_endpoint_override: http://172.22.0.2:5050
      metal3:                                                            
        auth_type: none                                                  
        baremetal_endpoint_override: http://172.22.0.3:6385              
        baremetal_introspection_endpoint_override: http://172.22.0.3:5050
    EOF
    

If it's a master node that is stuck:

export OS_CLOUD=metal3-bootstrap

Else, if it's a worker node:

export OS_CLOUD=metal3

You can then see the nodes like so:

openstack baremetal node list

Find the node(s) stuck in the clean wait state. Then do the following to abort the current cleaning:

openstack baremetal node abort <node UUID>
openstack baremetal node maintenance set <node UUID>
openstack baremetal node power off <node UUID>
openstack baremetal node manage <node UUID>
openstack baremetal node maintenance unset <node UUID>

Now the node should be in a state where you can execute manual cleaning, as described in the upstream documentation on manual cleaning.
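
As a rough illustration only (the appropriate clean steps depend on your hardware and the upstream documentation, so treat the step list below as an assumption), a metadata-only manual clean followed by returning the node to service might look like:

openstack baremetal node clean --clean-steps '[{"interface": "deploy", "step": "erase_devices_metadata"}]' <node UUID>
openstack baremetal node provide <node UUID>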
