Race condition when scaling up the cluster and 2 or more nodes start at the same time
Tchirana opened this issue · comments
When 2 or more nodes are starting at the same time, a race condition occurs from time to time, and in some cases IPs do not get assigned to nodes. You should first check whether an AddInstanceAddress request already exists and only then start a new assignment.
Here is a snippet from 2 nodes starting at the same time:
kubeip-f26p9 kubeip time="2023-12-06T14:20:15Z" level=debug msg="found 8 available addresses"
kubeip-mr6kt kubeip time="2023-12-06T14:20:18Z" level=debug msg="found 8 available addresses"
AND:
kubeip-g9mxj kubeip time="2023-12-06T14:20:59Z" level=error msg="failed to assign static public IP address xx.xx.xx.xx" func="github.com/doitintl/kubeip/internal/address.(*gcpAssigner).Assign" file="/app/internal/address/gcp.go:250" error="address is already assigned" version=sha-ce43fbb
kubeip-g9mxj kubeip time="2023-12-06T14:20:59Z" level=info msg="adding public IP address to instance" func="github.com/doitintl/kubeip/internal/address.(*gcpAssigner).AddInstanceAddress" file="/app/internal/address/gcp.go:188" accessConfig="&{ 0 compute#accessConfig External IP yy.yy.yy.yy false ONE_TO_ONE_NAT [] []}" version=sha-ce43fbb
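The behavior in the second log pair (assignment fails with "address is already assigned", then a different address is attached) suggests a retry-on-conflict loop over the candidate addresses. A rough sketch of that idea in Go; `assignFirstFree`, the callback type, and the error value are hypothetical illustrations, not kubeip's actual API:

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyAssigned stands in for the conflict error seen in the logs.
var errAlreadyAssigned = errors.New("address is already assigned")

// assignFn tries to attach one address and reports a conflict if
// another node claimed it first (hypothetical signature).
type assignFn func(addr string) error

// assignFirstFree walks the candidate list and moves on to the next
// address on conflict instead of failing outright.
func assignFirstFree(addrs []string, assign assignFn) (string, error) {
	for _, a := range addrs {
		err := assign(a)
		if err == nil {
			return a, nil
		}
		if errors.Is(err, errAlreadyAssigned) {
			continue // another node won the race; try the next address
		}
		return "", err // unrelated failure, surface it
	}
	return "", errors.New("no free addresses left")
}

func main() {
	// Simulate one address already taken by a concurrently starting node.
	taken := map[string]bool{"1.1.1.1": true}
	assign := func(a string) error {
		if taken[a] {
			return errAlreadyAssigned
		}
		taken[a] = true
		return nil
	}
	got, err := assignFirstFree([]string{"1.1.1.1", "2.2.2.2"}, assign)
	fmt.Println(got, err) // → 2.2.2.2 <nil>
}
```

This only narrows the window; without coordination two nodes can still pick the same address between the list and the attach call.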
@Tchirana thank you for reporting
Need to implement a distributed mutex; that is planned for the future. For now I added a random sleep (up to 10s) before listing available IP addresses, which should reduce conflicts somewhat.
fixed