kubernetes / cloud-provider

cloud-provider defines the shared interfaces which Kubernetes cloud providers implement. These interfaces allow various controllers to integrate with any cloud provider in a pluggable fashion. It also serves as an issue tracker for SIG Cloud Provider.

Usage of IPs returned by `InstancesV2().InstanceMetadata()`, and interaction with `--node-ip`

deitch opened this issue

I am trying to clarify several related issues.

First, if an external CCM returns multiple addresses of one type (e.g. two InternalIP or two ExternalIP addresses), which one becomes "the" IP used for the node resource? Some comments in #56 imply that order matters, i.e. the first address of a given type wins, but is that actually so? Is it documented anywhere?

Second, the description in #56, and the code comment here, say that --node-ip is essentially an override. So if I provide --node-ip, then whether or not that IP appears in InstanceMetadata()'s response, it will be the node's address.

How does it determine whether --node-ip is public or private? The description in the kubelet reference does not say which one it is, or how that is determined.

Finally, the code implies that it does not actually use --node-ip unless it also appears in InstanceMetadata().

	enforcedNodeAddresses := []v1.NodeAddress{}

	// Keep only the cloud-reported addresses that match the kubelet-provided nodeIP.
	nodeIPTypes := make(map[v1.NodeAddressType]bool)
	for _, nodeAddress := range cloudNodeAddresses {
		if netutils.ParseIPSloppy(nodeAddress.Address).Equal(nodeIP) {
			enforcedNodeAddresses = append(enforcedNodeAddresses, v1.NodeAddress{Type: nodeAddress.Type, Address: nodeAddress.Address})
			nodeIPTypes[nodeAddress.Type] = true
		}
	}

	// nodeIP must be among the addresses supplied by the cloud provider
	if len(enforcedNodeAddresses) == 0 {
		return nil, fmt.Errorf("failed to get node address from cloud provider that matches ip: %v", nodeIP)
	}

It would only use the --node-ip address if it also matches one provided by the CCM.
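
To make that concrete, here is a minimal, hypothetical demo of the matching effect, with sample addresses I made up and plain net.ParseIP standing in for netutils.ParseIPSloppy; a sketch of the behavior, not the actual controller code:

	package main

	import (
		"fmt"
		"net"

		v1 "k8s.io/api/core/v1"
	)

	func main() {
		// Addresses as a CCM might report them (invented values).
		cloudNodeAddresses := []v1.NodeAddress{
			{Type: v1.NodeInternalIP, Address: "10.0.0.5"},
			{Type: v1.NodeExternalIP, Address: "203.0.113.7"},
		}
		// A --node-ip that the cloud did not report.
		nodeIP := net.ParseIP("192.0.2.99")

		enforced := []v1.NodeAddress{}
		for _, a := range cloudNodeAddresses {
			if net.ParseIP(a.Address).Equal(nodeIP) {
				enforced = append(enforced, a)
			}
		}
		if len(enforced) == 0 {
			// Mirrors the error path in the snippet above: the override is rejected.
			fmt.Printf("no cloud-provided address matches ip: %v\n", nodeIP)
		}
	}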

I have not actually tried this yet, as I am not sure I am understanding it correctly.

Anyone have insight into these issues?

If kubelet is configured with --node-ip, that value ends up in an annotation set on the Node object that kubelet creates.
The CCM builds a list of eligible IPs for the Node object. If the IP in the annotation is also in the list of IPs from the CCM, that IP becomes the "primary" IP of the Node object.
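
If I have it right, the CCM side of that handoff looks roughly like this; a minimal sketch, assuming the alpha annotation key discussed further down this thread (the helper name is mine):

	import v1 "k8s.io/api/core/v1"

	// providedNodeIP reads the kubelet-provided IP back off the Node object.
	// The key is the one behind cloudproviderapi.AnnotationAlphaProvidedIPAddr
	// in k8s.io/cloud-provider/api.
	func providedNodeIP(node *v1.Node) (string, bool) {
		ip, ok := node.Annotations["alpha.kubernetes.io/provided-node-ip"]
		return ip, ok
	}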

Also note that Node IPs are either internal or external. This is not synonymous with private and public: internal means used for intra-cluster communication, and such addresses can (and often do) be public IPs. For example, when IPv6 is used, some CCMs always classify the addresses as Internal.
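
A made-up example of that distinction, using the real address-type constants from k8s.io/api/core/v1:

	// A globally routable address can still play the Internal role.
	addr := v1.NodeAddress{
		Type:    v1.NodeInternalIP, // role: intra-cluster communication
		Address: "2001:db8::1",     // documentation-range IPv6, standing in for a public address
	}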

Thanks for hopping in, @olemarkus.

Yes, I definitely get the private/public vs internal/external distinctions, but it is good to have them clearly here on this issue.

So, to summarize: setting --node-ip has no direct impact on the node IP itself. What it does do is add an annotation. The CCM, upon receiving the v1.Node object, can check for and use that annotation. The actual IPs used are those provided by the CCM, no matter what --node-ip said.

When you wrote "list of eligible IPs", how does Kubernetes decide, from that list, which one it will use for internal comms, i.e. the one listed when I do kubectl get node? I'm asking both with --node-ip set and without. It's obvious when there's just one internal IP, but what if there is a list?

Assuming an external CCM is used, then no, setting --node-ip doesn't really provide any guarantees. It is ultimately the CCM that decides which IPs to use. An external CCM may not even use this library, and may provide its own logic.

"List of eligible IPs" is whatever is returned by the NodeAddressesByProviderID() function that is a part of the CCM Instance interface. Usually it involves calls to the relevant cloud API.

"List of eligible IPs" is whatever is returned by the NodeAddressesByProviderID() function that is a part of the CCM Instance interface. Usually it involves calls to the relevant cloud API

Sure, totally got that. When the list gets returned, how does Kubernetes decide which IP to use? Is it the first one in the list?

Also, is it this annotation, which is alpha? Or some other one?

The first entry is the primary node IP, yes. And yes, that is the correct annotation.
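
So the selection is positional. A sketch of the rule (the helper name is mine, not from the library):

	// primaryIP returns the first address of the given type in the list,
	// which is what ends up treated as the node's primary IP of that type.
	func primaryIP(addrs []v1.NodeAddress, t v1.NodeAddressType) (string, bool) {
		for _, a := range addrs {
			if a.Type == t {
				return a.Address, true
			}
		}
		return "", false
	}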

When I pass --node-ip to the kubelet, does the kubelet actually validate that it is a valid address, or does it pass it on to the CCM as is? If I passed in --node-ip="Ole Markus is Helpful", would the kubelet catch that and error out, or is it up to the CCM to do so?

Hmm, while we are at it, is there any way for a user to provide an external IP as well? Or is --node-ip, and the way it gets passed to the CCM, expected to be internal (although the CLI flag and annotation do not say so, just "provided")?

It will certainly be validated. Kubelet also uses the argument to decide which IPs the kubelet server listens on. Using an external IP probably works, but keep in mind that the control plane and a few pods typically need to reach the kubelet API.
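
A quick sketch of the kind of up-front check involved, with plain net.ParseIP standing in for the utilities the kubelet actually uses:

	package main

	import (
		"fmt"
		"net"
	)

	func main() {
		// net.ParseIP returns nil for anything that is not a literal IP,
		// so a nonsense --node-ip value fails before reaching any CCM.
		if ip := net.ParseIP("Ole Markus is Helpful"); ip == nil {
			fmt.Println("invalid --node-ip: not an IP address")
		}
	}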

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.