aws / aws-network-policy-agent

Network Policy Option Cannot Disable

atilsensalduz opened this issue · comments

Hey team,

I encountered an error when attempting to set the 'enableNetworkPolicy' parameter to false. Could you please guide me on the correct procedure to disable network policies?

Error logs on pod:

{"level":"info","ts":"2023-11-09T12:32:26.065Z","caller":"runtime/asm_arm64.s:1197","msg":"version","GitVersion":"","GitCommit":"","BuildDate":""}
2023-11-09 12:32:26.067401356 +0000 UTC Logger.check error: failed to get caller
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0xaaaae21acddc]

goroutine 84 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:116 +0x1a4
panic({0xaaaae2e658e0?, 0xaaaae4298440?})
	/root/sdk/go1.21.3/src/runtime/panic.go:914 +0x218
github.com/aws/aws-network-policy-agent/controllers.(*PolicyEndpointsReconciler).configureeBPFProbes(0x4000150540, {0xaaaae31efe58, 0x40009250b0}, {0x4000c07ec0, 0x25}, {0x4000936aa0?, 0x1, 0x0?}, {0x400092bd80, 0x1, ...}, ...)
	/workspace/controllers/policyendpoints_controller.go:258 +0x34c
github.com/aws/aws-network-policy-agent/controllers.(*PolicyEndpointsReconciler).reconcilePolicyEndpoint(0x4000150540, {0xaaaae31efe58, 0x40009250b0}, 0x400093e9c0)
	/workspace/controllers/policyendpoints_controller.go:232 +0x550
github.com/aws/aws-network-policy-agent/controllers.(*PolicyEndpointsReconciler).reconcile(0x4000150540, {0xaaaae31efe58, 0x40009250b0}, {{{0x40004f5600, 0xc}, {0x400015c858, 0x15}}})
	/workspace/controllers/policyendpoints_controller.go:149 +0x1a4
github.com/aws/aws-network-policy-agent/controllers.(*PolicyEndpointsReconciler).Reconcile(0x4000150540, {0xaaaae31efe58, 0x40009250b0}, {{{0x40004f5600, 0xc}, {0x400015c858, 0x15}}})
	/workspace/controllers/policyendpoints_controller.go:130 +0xe4
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xaaaae31f2028?, {0xaaaae31efe58?, 0x40009250b0?}, {{{0x40004f5600?, 0xb?}, {0x400015c858?, 0x0?}}})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:119 +0x8c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0x4000130500, {0xaaaae31efe90, 0x4000510c80}, {0xaaaae2f59640?, 0x4000d80f20?})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:316 +0x2e4
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0x4000130500, {0xaaaae31efe90, 0x4000510c80})
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:266 +0x198
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:227 +0x74
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 98
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:223 +0x43c

Thanks!

If you are disabling the Network Policy feature in VPC CNI, you first need to delete the Network Policies that VPC CNI and the Network Policy controller have already reconciled. This gives the components a chance to remove the BPF probes attached to the pods. The feature shouldn't be disabled while there are active PolicyEndpoint resources, as this leaves the pods with active policies in a bad state.

Here, the feature is being disabled while there are still active resources to reconcile. We wanted to fail hard in this scenario to avoid running into issues with stale probes if the feature is enabled again on the same cluster.

To disable -

  1. Delete all NetworkPolicy (NP) resources
  2. Set enable-network-policy-controller to false in ConfigMap amazon-vpc-cni (kube-system NS). This will disable the controller.
  3. Set the 'enableNetworkPolicy' parameter to false. This will disable the agents on the nodes.
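
The ordering above can be read as a precondition: the controller and agents are only switched off once no policies remain. A minimal Go sketch of that check follows. `ClusterState`, `DisablePlan`, and the field names are hypothetical illustrations, not anything from the agent's codebase, and the counts would have to come from the cluster by whatever means you prefer.

```go
package main

import (
	"errors"
	"fmt"
)

// ClusterState is a hypothetical snapshot of the policy objects
// still present in the cluster.
type ClusterState struct {
	NetworkPolicies int // remaining NetworkPolicy objects
	PolicyEndpoints int // remaining PolicyEndpoint objects
}

// ErrActivePolicies signals that step 1 has not been completed yet.
var ErrActivePolicies = errors.New("active policies remain; delete them before disabling")

// DisablePlan returns the remaining steps (2 and 3) in order, or an
// error if policies still exist and disabling would leave stale probes.
func DisablePlan(s ClusterState) ([]string, error) {
	if s.NetworkPolicies > 0 || s.PolicyEndpoints > 0 {
		return nil, fmt.Errorf("%w: %d NetworkPolicy, %d PolicyEndpoint",
			ErrActivePolicies, s.NetworkPolicies, s.PolicyEndpoints)
	}
	return []string{
		"set enable-network-policy-controller to false in ConfigMap amazon-vpc-cni (kube-system)",
		"set enableNetworkPolicy to false",
	}, nil
}

func main() {
	// Policies still present: the plan is refused.
	if _, err := DisablePlan(ClusterState{NetworkPolicies: 2, PolicyEndpoints: 2}); err != nil {
		fmt.Println("blocked:", err)
	}

	// Nothing left to reconcile: the remaining steps are listed in order.
	steps, _ := DisablePlan(ClusterState{})
	for i, step := range steps {
		fmt.Printf("%d. %s\n", i+1, step)
	}
}
```

The sketch only encodes the order of operations. It does not talk to the cluster or change any configuration.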

Thank you so much for the explanation 🙏 @jayanthvn

Hi, we got this exception as well. Could the argument --enable-network-policy=false also disable reconciliation (not only disable the creation of the client)?
https://github.com/aws/aws-network-policy-agent/blob/v1.0.6/controllers/policyendpoints_controller.go#L149
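
To make the question concrete, here is a minimal Go sketch of the kind of guard being asked about. It is not the agent's actual code: `probeAttacher`, `fakeAttacher`, `reconciler`, and `errFeatureDisabled` are hypothetical names, and the sketch simply assumes that the client is left nil when the feature is off.

```go
package main

import (
	"errors"
	"fmt"
)

// probeAttacher is a hypothetical stand-in for the eBPF client.
type probeAttacher interface {
	AttachProbes(pod string) error
}

// fakeAttacher records the pods it was asked to attach probes to.
type fakeAttacher struct{ attached []string }

func (f *fakeAttacher) AttachProbes(pod string) error {
	f.attached = append(f.attached, pod)
	return nil
}

// reconciler is a hypothetical reconciler whose client is only set
// when the feature is enabled.
type reconciler struct {
	enableNetworkPolicy bool
	client              probeAttacher
}

var errFeatureDisabled = errors.New("network policy disabled; skipping reconcile")

// reconcile returns early instead of dereferencing a nil client.
func (r *reconciler) reconcile(pod string) error {
	if !r.enableNetworkPolicy || r.client == nil {
		return errFeatureDisabled
	}
	return r.client.AttachProbes(pod)
}

func main() {
	// Feature off: client is left nil, and reconcile is skipped.
	disabled := &reconciler{enableNetworkPolicy: false}
	fmt.Println(disabled.reconcile("default/pod-a"))

	// Feature on: the client is called as usual.
	fa := &fakeAttacher{}
	enabled := &reconciler{enableNetworkPolicy: true, client: fa}
	fmt.Println(enabled.reconcile("default/pod-a"), fa.attached)
}
```

Note the trade-off: a guard like this replaces the hard failure with a skipped reconcile, so any probes already attached to pods would stay in place, which is the stale-probe situation the maintainers said they wanted to avoid.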

commented
  • Delete all NP resources

  • Set enable-network-policy-controller to false in ConfigMap amazon-vpc-cni (kube-system NS). This will disable the controller.

  • Set the 'enableNetworkPolicy' parameter to false. This will disable the agents on the nodes.

Hey @jayanthvn - I realized I asked this somewhat before but - do you also need to patch out the node-agent container that exists in the daemonset after you've followed these steps?

I don't currently have a fresh cluster to install the VPC CNI on and test enabling the network policy flag, but I was under the impression that the container doesn't exist on the aws-node DaemonSet until you enable the { "enableNetworkPolicy": "true" } flag on the addon?

i.e. if we think the node-agent is causing issues on the host machine, having it running in the pod could still be causing issues even after we've followed the steps above?