kubernetes-sigs / cluster-api-provider-openstack

Cluster API implementation for OpenStack

Home Page: https://cluster-api-openstack.sigs.k8s.io/

Incorrect FloatingIP workflow

serge-name opened this issue · comments

/kind bug

What steps did you take and what happened:
I tested a CAPO build at commit 1d5d2d5e45462dab056e37a6c948361e81875ea9. Key details follow:

  1. Created an OpenStackFloatingIPPool (irrelevant fields removed):
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: OpenStackFloatingIPPool
metadata:
  name: osfipp
spec:
  floatingIPNetwork:
    id: c7c8509d-7083-41c9-b799-e30e855e9bc0
  reclaimPolicy: Delete
  2. Created a MachineDeployment and an OpenStackMachineTemplate:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackMachineTemplate
metadata:
  name: some
spec:
  template:
    spec:
      ports:
        - network:
            id: f16855bf-8ba1-4f75-ad8c-763e80134571
      floatingIPPoolRef:
        apiGroup: infrastructure.cluster.x-k8s.io/v1beta1
        kind: OpenStackFloatingIPPool
        name: osfipp

✅ The floating IP was created successfully. At this point we get the correct data, fip.FloatingIP == "185.***.**.**" and fip.FloatingNetworkID == "c7c8509d-7083-41c9-b799-e30e855e9bc0":

fip, err := networkingService.GetFloatingIP(address.Spec.Address)
if err != nil {
	return err
}

❌ Here we get port == nil and the error "Failed while associating ip from pool: port for floating IP "185.***.**.**" on network c7c8509d-7083-41c9-b799-e30e855e9bc0 does not exist":

port, err := networkingService.GetPortForExternalNetwork(instanceStatus.ID(), fip.FloatingNetworkID)
if err != nil {
	return fmt.Errorf("get port for floating IP %q: %w", fip.FloatingIP, err)
}
if port == nil {
	conditions.MarkFalse(openStackMachine, infrav1.FloatingAddressFromPoolReadyCondition, infrav1.FloatingAddressFromPoolErrorReason, clusterv1.ConditionSeverityError, "Can't find port for floating IP %q on external network %s", fip.FloatingIP, fip.FloatingNetworkID)
	return fmt.Errorf("port for floating IP %q on network %s does not exist", fip.FloatingIP, fip.FloatingNetworkID)
}

More details follow.

Here:

instancePorts, err := s.client.ListPort(instancePortsOpts)

the OpenStack API returns the following (irrelevant fields omitted):

{
  "ports": [
    {
      "device_id": "d1b99e45-991c-4143-93a3-9a8d3eddb416",
      "device_owner": "compute:nova",
      "fixed_ips": [
        {
          "ip_address": "10.21.10.29",
          "subnet_id": "616388c0-519f-418e-80b4-3687a546a65e"
        }
      ],
      "id": "0d1fe3bd-55f6-41d0-b879-a4071a15b5c0",
      "network_id": "f16855bf-8ba1-4f75-ad8c-763e80134571"
// …
    }
  ]
}

Please note that there is no port associated with the FIP network c7c8509d-7083-41c9-b799-e30e855e9bc0. Neither the FIP network ID nor the FIP itself is going to appear in the port info, because in our OpenStack cloud floating IPs are not added to ports directly. Instead, a NAT mapping between 185.***.**.** and 10.21.10.29 would be set up.

If the new k8s node had received a FIP, it would show up here:
https://compute-api:8774/v2.1/TENANT_ID/servers/d1b99e45-991c-4143-93a3-9a8d3eddb416

The reply would look like this (irrelevant fields omitted):

{ "server": {
    "id": "d1b99e45-991c-4143-93a3-9a8d3eddb416",
    "hci_info": {
      "network": [
        {
          "ips": [
            "10.21.10.29"
          ],
          "network": {
            "id": "f16855bf-8ba1-4f75-ad8c-763e80134571",
            "subnets": [
              {
                "ips": [
                  {
                    "address": "10.21.10.29",
                    "type": "fixed",
                    "version": 4,
                    "floating_ips": [
                      {
                        "address": "185.***.**.**",
                        "type": "floating",
                        "version": 4
                      }
                    ]
                  }
                ]
              }
            ]
          }
        }
      ]
    }
  }
}

Here it tries to find a fixed IP in the FIP network, but in our OpenStack cloud all FIPs have device_owner == "network:floatingip", so it just gets an empty list:

networkPortsOpts := ports.ListOpts{
	NetworkID:   instancePort.NetworkID,
	DeviceOwner: "network:router_interface",
}
networkPorts, err := s.client.ListPort(networkPortsOpts)

What did you expect to happen:
Successfully deployed k8s node with FIP attached.

Anything else you would like to add:
Nothing so far, but please ask me for any details. The issue is reproducible and I can provide more information if needed.

Environment:

  • Cluster API Provider OpenStack version (Or git rev-parse HEAD if manually built): 1d5d2d5e45462dab056e37a6c948361e81875ea9

  • Cluster-API version: 1.6.3

  • OpenStack version: Virtuozzo (https://virtuozzo.com), based on Openstack Xena

  • Minikube/KIND version: N/A

  • Kubernetes version (use kubectl version): 1.29.3

  • OS (e.g. from /etc/os-release): Talos (https://talos.dev) 1.6.7

What does f16855bf-8ba1-4f75-ad8c-763e80134571 look like, does it have a router?

It's not really documented, but we don't create any new ports for the FIPs. We just look for an existing port that the FIP can be attached to, by checking whether there's a port on a subnet that has a router attached to the floating IP network.

I've mostly tested it with spec.ports omitted, using the default setup, but I can test it with something closer to your setup if I know more about how that network is set up.
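For readers following along, the lookup described above can be sketched roughly as follows. This is a minimal illustration with simplified stand-in types, not CAPO's actual implementation: Port, Router, and portForExternalNetwork are hypothetical names, and the shortened IDs in main are only there to mirror the objects quoted in this issue.

```go
package main

import "fmt"

// Port is a simplified, hypothetical stand-in for a Neutron port.
type Port struct {
	ID        string
	DeviceID  string   // server ID for instance ports, router ID for router ports
	SubnetIDs []string // subnets of the port's fixed IPs
}

// Router is a simplified, hypothetical stand-in for a Neutron router.
type Router struct {
	ID                string
	ExternalNetworkID string // external_gateway_info.network_id
}

// portForExternalNetwork returns the first instance port that shares a subnet
// with a router port whose router has its gateway on externalNetworkID.
// It returns nil when no such port exists.
func portForExternalNetwork(instancePorts, routerPorts []Port, routers map[string]Router, externalNetworkID string) *Port {
	// Collect the subnets that are routed to the external network.
	routed := map[string]bool{}
	for _, rp := range routerPorts {
		r, ok := routers[rp.DeviceID]
		if !ok || r.ExternalNetworkID != externalNetworkID {
			continue
		}
		for _, s := range rp.SubnetIDs {
			routed[s] = true
		}
	}
	// Pick the first instance port that sits on one of those subnets.
	for i := range instancePorts {
		for _, s := range instancePorts[i].SubnetIDs {
			if routed[s] {
				return &instancePorts[i]
			}
		}
	}
	return nil
}

func main() {
	instancePorts := []Port{{ID: "0d1fe3bd", DeviceID: "d1b99e45", SubnetIDs: []string{"616388c0"}}}
	routerPorts := []Port{{ID: "ded9eafe", DeviceID: "7142d8f1", SubnetIDs: []string{"616388c0"}}}
	routers := map[string]Router{"7142d8f1": {ID: "7142d8f1", ExternalNetworkID: "c7c8509d"}}

	if p := portForExternalNetwork(instancePorts, routerPorts, routers, "c7c8509d"); p != nil {
		fmt.Println("attach FIP to port", p.ID)
	} else {
		fmt.Println("no suitable port found")
	}
}
```

Note that the result depends entirely on which router ports are passed in: if the router port list is empty, the function returns nil, which corresponds to the "port ... does not exist" error reported above.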

Yes, I meant that the new port is normally created by OpenStack, but not in our cloud. I'm not very familiar with OpenStack internals and don't have access to configurations other than our particular cloud.

GET https://compute-api:9696/v2.0/networks/f16855bf-8ba1-4f75-ad8c-763e80134571
{
  "network": {
    "id": "f16855bf-8ba1-4f75-ad8c-763e80134571",
    "name": "internal",
    "tenant_id": "278fda03174b4fee9358559baffca010",
    "admin_state_up": true,
    "mtu": 8913,
    "default_vnic_type": null,
    "status": "ACTIVE",
    "subnets": [
      "616388c0-519f-418e-80b4-3687a546a65e"
    ],
    "shared": false,
    "availability_zone_hints": [],
    "availability_zones": [
      "nova"
    ],
    "ipv4_address_scope": null,
    "ipv6_address_scope": null,
    "router:external": false,
    "description": "",
    "port_security_enabled": true,
    "rbac_policies": [
      {
        "id": "c869c7ef-3c51-4fb6-88f5-c591989fe3ef",
        "action": "access_as_shared",
        "target_tenant": "d278dea8631e47ffba5a908265968fbb"
      }
    ],
    "qos_policy_id": null,
    "tags": [],
    "created_at": "2024-02-06T12:43:10Z",
    "updated_at": "2024-03-20T20:39:09Z",
    "revision_number": 5,
    "project_id": "278fda03174b4fee9358559baffca010",
    "provider:network_type": "vxlan"
  }
}
GET https://compute-api:9696/v2.0/routers/7142d8f1-2b11-4ae2-a343-eacd77a2ceee
{
  "router": {
    "id": "7142d8f1-2b11-4ae2-a343-eacd77a2ceee",
    "name": "DefaultRouter",
    "tenant_id": "278fda03174b4fee9358559baffca010",
    "admin_state_up": true,
    "status": "ACTIVE",
    "external_gateway_info": {
      "network_id": "c7c8509d-7083-41c9-b799-e30e855e9bc0",
      "external_fixed_ips": [
        {
          "subnet_id": "aa2bc8f7-fa02-4851-ba13-93e57d4c69e1",
          "ip_address": "69.**.**.**"
        }
      ],
      "enable_snat": true
    },
    "description": "",
    "availability_zones": [
      "nova"
    ],
    "availability_zone_hints": [],
    "routes": [
    ],
    "flavor_id": null,
    "tags": [],
    "created_at": "2024-02-06T11:49:58Z",
    "updated_at": "2024-03-29T14:41:39Z",
    "revision_number": 17,
    "project_id": "278fda03174b4fee9358559baffca010"
  }
}

That router's external_fixed_ips entry is pre-created automatically by OpenStack.

If a VM has a FIP attached, outgoing connections are SNAT'ed from that FIP.
If a VM has no FIP, connections are SNAT'ed from the router's external IP.

GET https://compute-api:9696/v2.0/ports?device_id=7142d8f1-2b11-4ae2-a343-eacd77a2ceee
{
  "ports": [
    {
      "id": "0411af2f-d447-4f3c-88a7-1e8a57e70015",
      "name": "",
      "network_id": "f16855bf-8ba1-4f75-ad8c-763e80134571",
      "tenant_id": "",
      "mac_address": "fa:16:3e:44:38:7e",
      "admin_state_up": true,
      "status": "ACTIVE",
      "device_id": "7142d8f1-2b11-4ae2-a343-eacd77a2ceee",
      "device_owner": "network:router_centralized_snat",
      "fixed_ips": [
        {
          "subnet_id": "616388c0-519f-418e-80b4-3687a546a65e",
          "ip_address": "10.21.11.1"
        }
      ],
      "allowed_address_pairs": [],
      "extra_dhcp_opts": [],
      "security_groups": [],
      "description": "",
      "binding:vnic_type": "normal",
      "port_security_enabled": false,
      "qos_policy_id": null,
      "qos_network_policy_id": null,
      "tags": [],
      "created_at": "2024-02-06T14:02:02Z",
      "updated_at": "2024-03-23T18:11:57Z",
      "revision_number": 40,
      "project_id": ""
    },
    {
      "id": "ded9eafe-3ee0-4f29-9f7f-953470f3a3ae",
      "name": "",
      "network_id": "f16855bf-8ba1-4f75-ad8c-763e80134571",
      "tenant_id": "278fda03174b4fee9358559baffca010",
      "mac_address": "fa:16:3e:48:d2:da",
      "admin_state_up": true,
      "status": "ACTIVE",
      "device_id": "7142d8f1-2b11-4ae2-a343-eacd77a2ceee",
      "device_owner": "network:router_interface_distributed",
      "fixed_ips": [
        {
          "subnet_id": "616388c0-519f-418e-80b4-3687a546a65e",
          "ip_address": "10.21.10.1"
        }
      ],
      "allowed_address_pairs": [],
      "extra_dhcp_opts": [],
      "security_groups": [],
      "description": "",
      "binding:vnic_type": "normal",
      "port_security_enabled": false,
      "qos_policy_id": null,
      "qos_network_policy_id": null,
      "tags": [],
      "created_at": "2024-02-06T14:02:02Z",
      "updated_at": "2024-04-02T10:33:28Z",
      "revision_number": 68,
      "project_id": "278fda03174b4fee9358559baffca010"
    }
  ]
}

I've already come up with a quick fix: serge-name@bb19917 works fine so far. Right now I am short on time to create a decent PR.

DeviceOwner: "network:router_interface",

Does it work for you if you replace network:router_interface with network:router_interface_distributed?

Yes, network:router_interface_distributed works absolutely fine, as in commit serge-name@a1bf5b8.

@bilbobrovall thanks a lot! Your commit elastx@ce38e8b works fine for me and fixes the issue.

There are several minor errors caused by premature and frequent checks for the FIP (8 API requests in 2 seconds). Not a problem for me, just something that can be improved later. Logs follow:

minor_errors.txt

👍 It's probably just Neutron taking some time. I think the retries should be fine for now, since there's an exponential backoff when a reconciler keeps returning the same error, but the initial retries feel a bit tight in this case.