Kubernetes AWS problems with multiple security groups due to tags
namliz opened this issue
Related: kubernetes/kubernetes#23339, kubernetes/kubernetes#26787
The Kubernetes controller manager finds the AWS resources it manages by filtering on AWS resource tags such as KubernetesCluster:ClusterName. Unfortunately, it applies this filtering inconsistently across different resource types.
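To make the tag filtering concrete, here is a minimal sketch of matching resources against a cluster tag. The Tag and Resource types and the filterByCluster helper are hypothetical stand-ins for the AWS SDK's tag structures, not the controller's real code, and the cluster name "cncfdemo" is assumed from the security group names below.

```go
package main

import "fmt"

// Tag mirrors the key/value shape of an AWS resource tag (hypothetical stand-in).
type Tag struct {
	Key, Value string
}

// Resource is a hypothetical, simplified AWS resource carrying tags.
type Resource struct {
	ID   string
	Tags []Tag
}

// hasClusterTag reports whether r carries KubernetesCluster=<cluster>.
func hasClusterTag(r Resource, cluster string) bool {
	for _, t := range r.Tags {
		if t.Key == "KubernetesCluster" && t.Value == cluster {
			return true
		}
	}
	return false
}

// filterByCluster keeps only the resources tagged for the given cluster.
func filterByCluster(rs []Resource, cluster string) []Resource {
	var out []Resource
	for _, r := range rs {
		if hasClusterTag(r, cluster) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rs := []Resource{
		{ID: "sg-a", Tags: []Tag{{Key: "KubernetesCluster", Value: "cncfdemo"}}},
		{ID: "sg-b", Tags: []Tag{{Key: "KubernetesCluster", Value: "other"}}},
		{ID: "sg-c"}, // untagged
	}
	for _, r := range filterByCluster(rs, "cncfdemo") {
		fmt.Println(r.ID) // only sg-a matches
	}
}
```

The point of the sketch: anything carrying the tag passes the filter, so two groups tagged for the same cluster are indistinguishable to code that relies on the tag alone.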
8527 2292 log_handler.go:33] AWS request: elasticloadbalancing DescribeLoadBalancers
3961 2292 aws_loadbalancer.go:191] Deleting removed load balancer listeners
4035 2292 log_handler.go:33] AWS request: elasticloadbalancing DeleteLoadBalancerListeners
1501 2292 aws_loadbalancer.go:203] Creating added load balancer listeners
1592 2292 log_handler.go:33] AWS request: elasticloadbalancing CreateLoadBalancerListeners
3129 2292 log_handler.go:33] AWS request: elasticloadbalancing DescribeLoadBalancerAttributes
3214 2292 log_handler.go:33] AWS request: elasticloadbalancing ModifyLoadBalancerAttributes
4591 2292 log_handler.go:33] AWS request: elasticloadbalancing DescribeLoadBalancers
9882 2292 log_handler.go:33] AWS request: ec2 DescribeSecurityGroups
1322 2292 log_handler.go:33] AWS request: ec2 DescribeSecurityGroups
8421 2292 aws.go:2731] Error opening ingress rules for the load balancer to the instances: Multiple tagged security groups found for instance i-04bd9c4c8aa; ensure only the k8s security group is tagged
8469 2292 servicecontroller.go:754] Failed to process service. Retrying in 5m0s: Failed to create load balancer for service default/pushgateway: Mutiple tagged security groups found for instance i-04bd9c4c8aa36270e; ensure only the k8s security group is tagged
8480 2292 servicecontroller.go:724] Finished syncing service "default/pushgateway" (419.263237ms)
The relevant code is in the AWS cloud provider:
https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L2783
// Returns the first security group for an instance, or nil
// We only create instances with one security group, so we don't expect multiple security groups.
// However, if there are multiple security groups, we will choose the one tagged with our cluster filter.
// Otherwise we will return an error.
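As a rough illustration, the rule that comment describes can be sketched as follows. This follows the comment's wording, not the actual aws.go implementation; the SecurityGroup type, the pickSecurityGroup name, and the IDs used are hypothetical.

```go
package main

import "fmt"

// SecurityGroup is a hypothetical, simplified view of an EC2 security group.
type SecurityGroup struct {
	ID     string
	Name   string
	Tagged bool // true if the group carries the cluster tag
}

// pickSecurityGroup applies the rule from the comment: no groups yields nil,
// a single group is returned as-is, and among several groups exactly one
// must be tagged for the cluster; otherwise it returns an error.
func pickSecurityGroup(instanceID string, groups []SecurityGroup) (*SecurityGroup, error) {
	if len(groups) == 0 {
		return nil, nil
	}
	if len(groups) == 1 {
		return &groups[0], nil
	}
	var tagged []*SecurityGroup
	for i := range groups {
		if groups[i].Tagged {
			tagged = append(tagged, &groups[i])
		}
	}
	switch len(tagged) {
	case 1:
		return tagged[0], nil
	case 0:
		return nil, fmt.Errorf("no tagged security group found for instance %s", instanceID)
	default:
		return nil, fmt.Errorf("multiple tagged security groups found for instance %s; ensure only the k8s security group is tagged", instanceID)
	}
}

func main() {
	groups := []SecurityGroup{
		{ID: "sg-1", Name: "k8s-minions-cncfdemo", Tagged: true},
		{ID: "sg-2", Name: "k8s-masters-cncfdemo", Tagged: true},
	}
	// Both groups tagged: the rule cannot choose, so it errors.
	if _, err := pickSecurityGroup("i-example", groups); err != nil {
		fmt.Println("error:", err)
	}
	// Simulate removing the cluster tag from k8s-masters-cncfdemo.
	groups[1].Tagged = false
	if sg, err := pickSecurityGroup("i-example", groups); err == nil && sg != nil {
		fmt.Println("picked:", sg.Name)
	}
}
```

Under this reading, tagging both groups for the same cluster is exactly the case the rule refuses to resolve, which matches the error in the log above.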
The security groups in my case are:
k8s-minions-cncfdemo, k8s-masters-cncfdemo
They are both tagged with the cluster filter, so the code has no single tagged group to choose and returns the error above. Not expecting multiple security groups seems like a wrong (not to mention undocumented!) assumption.
Bit of a head-scratcher.
Removing the cluster tag from k8s-masters-cncfdemo triggers the following events:
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.741311 2292 aws.go:2928] Adding rule for traffic from the load balancer (sg-4c8ee935) to instances (sg-fcbdd985)
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.741361 2292 log_handler.go:33] AWS request: ec2 DescribeSecurityGroups
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.793955 2292 aws.go:2002] Existing security group ingress: sg-fcbdd985 [{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: IpProtocol: "-1",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: UserIdGroupPairs: [{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: GroupId: "sg-d28becab",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: UserId: "750548967590"
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: },{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: GroupId: "sg-fabdd983",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: UserId: "750548967590"
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: },{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: GroupId: "sg-fcbdd985",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: UserId: "750548967590"
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: }]
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: } {
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: FromPort: 22,
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: IpProtocol: "tcp",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: IpRanges: [{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: CidrIp: "0.0.0.0/0"
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: }],
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: ToPort: 22
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: }]
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.794013 2292 aws.go:1874] Comparing {
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: IpProtocol: "-1",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: UserIdGroupPairs: [{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: GroupId: "sg-4c8ee935"
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: }]
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: } to {
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: IpProtocol: "-1",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: UserIdGroupPairs: [{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: GroupId: "sg-d28becab",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: UserId: "750548967590"
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: },{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: GroupId: "sg-fabdd983",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: UserId: "750548967590"
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: },{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: GroupId: "sg-fcbdd985",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: UserId: "750548967590"
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: }]
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: }
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.794051 2292 aws.go:1904] Comparing sg-4c8ee935 to sg-d28becab
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.794057 2292 aws.go:1904] Comparing sg-4c8ee935 to sg-fabdd983
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.794061 2292 aws.go:1904] Comparing sg-4c8ee935 to sg-fcbdd985
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.794067 2292 aws.go:2030] Adding security group ingress: sg-fcbdd985 [{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: IpProtocol: "-1",
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: UserIdGroupPairs: [{
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: GroupId: "sg-4c8ee935"
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: }]
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: }]
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.794147 2292 log_handler.go:33] AWS request: ec2 AuthorizeSecurityGroupIngress
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.897583 2292 aws.go:3146] Returning cached instances for map[ip-172-20-0-127.us-west-2.compute.internal:{} ip-172-20-0-231.us-west-2.compute.internal:{} ip-172-20-0-232.us-west-2.compute.internal:{}]
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.897657 2292 log_handler.go:33] AWS request: elasticloadbalancing DescribeLoadBalancers
Oct 25 09:26:55 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:55.929252 2292 log_handler.go:33] AWS request: ec2 DescribeSecurityGroups
Oct 25 09:26:56 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:56.100143 2292 log_handler.go:33] AWS request: ec2 DescribeSecurityGroups
Oct 25 09:26:56 ip-172-20-0-34 kube-controller-manager[2292]: I1025 09:26:56.374814 2292 reflector.go:284] pkg/controller/endpoint/endpoints_controller.go:157: forcing resync
This solves the problem (!): the ELBs picked up the instances because the tag filtering no longer got confused, and some of the services I deployed now have external routes.
This should really be documented somewhere!
The remaining question is whether a cluster would still stand up cleanly, since other tag-filtering code might behave slightly differently and actually require the k8s-masters-cncfdemo security group to be tagged.
Finally, I strongly think this is incorrect behaviour.