[provider-local] VPN tunnel check succeeds even if VPN is broken
**How to categorize this issue?**
/area networking testing
/kind bug
**What happened**:
In the provider-local HA setup (tested with single-zone but should also apply to multi-zone), kube-apiserver talks directly to the kubelet API instead of using the VPN connection.
With this, operations like `kubectl logs` and `kubectl port-forward` (for which the kubelet API is called by kube-apiserver) work even if the VPN connection is broken.
As the VPN tunnel check performed by gardenlet uses a port-forward operation (code), the shoot can be reconciled successfully and be marked as healthy even if the VPN connection is broken.
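For illustration (assuming a running local shoot; `k` targets the shoot cluster): the shoot nodes are machine pods in the seed, so the node addresses come from the seed's pod network and kube-apiserver can reach the kubelet API on them without going through the VPN:

```bash
# The INTERNAL-IP of the shoot nodes is an IP from the seed's pod network
# (compare with the `ip r get <node IP>` output further below).
k get node -o wide
```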
This problem might cause bugs and regressions in the VPN setup to go unnoticed.
E.g., in #9597 there was a problem in the HA VPN configuration (fixed in a later commit).
Nevertheless, most test cases of `pull-gardener-e2e-kind-ha-{single,multi}-zone` succeeded, i.e., shoot creations were successful although the VPN connection was never working.
The problem was only discovered by chance in the credentials rotation test case (ref).
**What you expected to happen**:
If the VPN connection cannot be established successfully:
- shoot reconciliations should fail
- shoot status should be set to unhealthy
- e2e tests should fail accordingly
**How to reproduce it (as minimally and precisely as possible)**:
1. `make kind-ha-single-zone-up gardener-ha-single-zone-up`
2. Apply the following patch to `example/provider-local/shoot.yaml`:
   ```diff
   --- a/example/provider-local/shoot.yaml
   +++ b/example/provider-local/shoot.yaml
   @@ -8,6 +8,10 @@ metadata:
        shoot.gardener.cloud/cloud-config-execution-max-delay-seconds: "0"
        authentication.gardener.cloud/issuer: "managed"
    spec:
   +  controlPlane:
   +    highAvailability:
   +      failureTolerance:
   +        type: node
      cloudProfileName: local
      secretBindingName: local # dummy, doesn't contain any credentials
      region: local
   ```
3. `kubectl apply -f example/provider-local/shoot.yaml`
4. Wait for the shoot to be reconciled successfully and healthy.
5. Verify manually that the VPN connection works:
   ```
   k -n kube-system logs deploy/metrics-server --request-timeout 2s
   k -n kube-system port-forward svc/metrics-server 8443:443 --request-timeout 2s
   k top no
   ```
6. Break the VPN connection: `k -n shoot--local--local scale sts vpn-seed-server --replicas 0`
7. Ensure there are no more open TCP connections from kube-apiserver to kubelet: `k -n shoot--local--local delete po -l role=apiserver`
8. Repeat the VPN verification from step 5. `logs` and `port-forward` work, while the connection to the metrics-server (`k top no`) doesn't.
9. Observe that the shoot status is healthy.
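To inspect the shoot status for the last step, one can for example look at the shoot conditions in the garden cluster (a sketch; project `local` and shoot `local` assumed, matching the `shoot--local--local` namespace):

```bash
# Print the shoot's condition types and statuses
kubectl -n garden-local get shoot local \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
```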
**Anything else we need to know?**:
This only applies to HA clusters, where routes to the shoot networks are configured explicitly in the kube-apiserver pods.
For non-HA clusters, there is an `EgressSelectorConfiguration` that connects to the `envoy-proxy` container in the `vpn-seed-server` using `HTTPConnect` instead of using explicitly configured IP routes.
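For reference, the upstream `EgressSelectorConfiguration` API looks roughly like this (a minimal sketch; the selection name, URL, and TLS paths Gardener actually uses may differ):

```yaml
apiVersion: apiserver.k8s.io/v1beta1
kind: EgressSelectorConfiguration
egressSelections:
- name: cluster                           # traffic towards the shoot
  connection:
    proxyProtocol: HTTPConnect            # kube-apiserver opens an HTTP CONNECT tunnel
    transport:
      tcp:
        url: https://vpn-seed-server:9443 # illustrative address of the envoy-proxy listener
        tlsConfig:
          caBundle: /path/to/ca.crt       # illustrative paths
          clientCert: /path/to/client.crt
          clientKey: /path/to/client.key
```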
E.g., the explicitly configured routes in an HA kube-apiserver pod look like this:
```
$ k -n shoot--local--local exec -it deploy/kube-apiserver -c vpn-path-controller -- sh
~ # ip r
default via 169.254.1.1 dev eth0
10.3.0.0/16 via 192.168.123.195 dev bond0 # shoot pod network
10.4.0.0/16 via 192.168.123.195 dev bond0 # shoot service network
192.168.123.0/26 dev tap0 proto kernel scope link src 192.168.123.9 # VPN network
192.168.123.64/26 dev tap1 proto kernel scope link src 192.168.123.72 # VPN network
192.168.123.192/26 dev bond0 proto kernel scope link src 192.168.123.237 # VPN network
169.254.1.1 dev eth0 scope link
~ # ip r get 10.1.54.75 # node IP
10.1.54.75 via 169.254.1.1 dev eth0 src 10.1.178.85 uid 0
    cache
```
Note that there is no route for the shoot node network. This is because `Shoot.spec.networking.nodes` is empty, as it overlaps with `Seed.spec.networks.pods` (provider-local starts pods in the seed as shoot nodes).
Hence, kube-apiserver can talk directly to the kubelet API via the seed pod network.
There are even multiple mechanisms for allowing this direct communication path from kube-apiserver to kubelet:
- `allow-machine-pods` `NetworkPolicy`: `gardener/pkg/provider-local/controller/infrastructure/actuator.go`, lines 61 to 65 in 70fe495
- `machines` `Service`: https://github.com/gardener/machine-controller-manager-provider-local/blob/aa28b3aede72b45440183187c23db89ea76840d5/pkg/local/create_machine.go#L67-L86
- webhook for adding the `networking.resources.gardener.cloud/to-machines-tcp-10250=allowed` label to `kube-apiserver`
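These can be inspected in a local HA shoot's control plane namespace, e.g. (illustrative commands based on the resource names above):

```bash
k -n shoot--local--local get networkpolicy allow-machine-pods -o yaml
k -n shoot--local--local get service machines -o yaml
# the to-machines-tcp-10250 label added by the webhook shows up on the kube-apiserver pods
k -n shoot--local--local get po -l role=apiserver --show-labels
```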
To verify that kube-apiserver of local HA shoots talks directly to the kubelet API, use the following steps:
- Create an HA shoot. Wait for the shoot to be reconciled successfully and healthy.
- `k -n shoot--local--local delete netpol allow-machine-pods`
- `k -n shoot--local--local delete svc machines`
- Ensure there are no more open TCP connections from kube-apiserver to kubelet: `k -n shoot--local--local delete po -l role=apiserver`
- Repeat the VPN verification from step 5 above. `logs` and `port-forward` don't work (they don't use the VPN connection), while the connection to the metrics-server (`k top no`) works (it uses the working VPN connection).
- Observe that the shoot status is unhealthy because the `port-forward` operation doesn't work.
**Environment**:
- Gardener version: `v1.93.0-dev`
This issue also affects non-HA scenarios in the local setup. As there is no node range defined for shoots in the local setup, the network connectivity for the VPN check during reconciliation will be the following:
`kube-apiserver -> envoy proxy container of vpn-seed-server pod -> machine pod via seed cluster network -> kubelet`
In real scenarios, it should be like this:
`kube-apiserver -> envoy proxy container of vpn-seed-server pod -> local route to vpn device created by vpn-seed-server container in same pod -> vpn-shoot -> actual node -> kubelet`
Fixing this can prevent regressions, but there are also validations in place preventing shoot/seed network overlaps, which may make this somewhat challenging.
You're right. In the non-HA scenario, kube-apiserver will always connect to vpn-seed-server because of the `EgressSelectorConfiguration`. This one, however, routes only pod and service IPs via the VPN, but nodes are routed via the seed's pod network (again, because `Shoot.spec.networking.nodes` is empty):
```
$ k -n shoot--local--local exec -it deploy/vpn-seed-server -c vpn-seed-server -- sh
~ # ip r
default via 169.254.1.1 dev eth0
10.3.0.0/16 via 192.168.123.2 dev tun0
10.4.0.0/16 via 192.168.123.2 dev tun0
169.254.1.1 dev eth0 scope link
192.168.123.0/24 dev tun0 proto kernel scope link src 192.168.123.1
~ # ip r get 10.1.130.210 # node IP
10.1.130.210 via 169.254.1.1 dev eth0 src 10.1.131.18 uid 0
    cache
```
We can verify this route by breaking the VPN connection on the shoot side this time:
- `k -n shoot--local--local annotate mr shoot-core-vpn-shoot resources.gardener.cloud/ignore=true`
- `k -n kube-system scale deploy vpn-shoot --replicas 0`
- Ensure there are no more open TCP connections from kube-apiserver to kubelet: `k -n shoot--local--local delete po -l role=apiserver`
- `kubectl logs` and `kubectl port-forward` still work (direct route to kubelet API without VPN), but `k top no` is broken as services are routed through the VPN.
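For completeness, a sketch of how the breakage can be undone afterwards (assuming gardener-resource-manager reconciles the deployment again once the ignore annotation is removed):

```bash
k -n shoot--local--local annotate mr shoot-core-vpn-shoot resources.gardener.cloud/ignore-
k -n kube-system scale deploy vpn-shoot --replicas 1
```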
To summarize:

**Problem**

In the provider-local setup, the VPN tunnel check performed by gardenlet (port-forward check) does not detect a broken VPN tunnel, because either kube-apiserver (HA clusters) or vpn-seed-server (non-HA clusters) routes requests to the kubelet API directly via the seed's pod network.
When the VPN connection is broken, `kubectl port-forward` and `kubectl logs` continue to work, while `k top no` (`APIServices`, `Webhooks`, etc.) is broken.
We should strive to resolve this discrepancy between the local setup and cloud setups regarding the VPN connection, so that e2e tests validate the real setup and catch such bugs.
**Proposal**

**1. Set `Shoot.spec.networking.nodes`**

Setting `Shoot.spec.networking.nodes` ensures the VPN configures routes for the node network via the VPN tunnel. However, this network must not overlap with `Seed.spec.networks.pods`.
Right now, the `Seed.spec.networks.pods` field is only used for API validation to prevent obvious misconfigurations. I don't see a problem if the seed pod network is larger than what is configured in `Seed.spec.networks.pods`.
Hence, we could split the seed pod network into one default subnet (configured in `Seed.spec.networks.pods`) and add a dedicated calico `IPPool` for machine pods (configured in `Seed.spec.networks.shootDefaults.pods` -> `Shoot.spec.networking.pods`).
With this, the IP packets would still be routable to and between machine pods, but we would have disjoint networks to configure in the API objects and thereby correct routes via the VPN tunnel.
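A rough sketch of such a dedicated `IPPool` (the CIDR is only an example and must be disjoint from `Seed.spec.networks.pods`; how machine pods get assigned to the pool, e.g. via a `cni.projectcalico.org/ipv4pools` annotation, is left out here):

```yaml
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: machine-pods
spec:
  cidr: 10.5.0.0/16   # example subnet for machine pods, disjoint from the default pool
  ipipMode: Never     # encapsulation/NAT would have to match the seed's calico setup
  vxlanMode: Never
  natOutgoing: false
```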
**2. Forbid direct communication of seed components with machine pods**

We want to ensure that the VPN tunnel checks only succeed if the VPN is successfully established. For this, we need to drop all `NetworkPolicies` allowing communication from seed components to machine pods.
E.g., when the VPN connection is broken in HA clusters, there will be no route to the shoot node network and hence packets will be routed via the seed pod network; only if no NetworkPolicy allows this direct path will the tunnel check fail as expected.
**Additional Improvements**
We might also consider augmenting gardenlet's tunnel check. In addition to testing an operation that talks to the node network for the kubelet API (e.g., port-forward), it could also test an operation targeting the pod/service network (e.g., metrics server).
This doesn't resolve the discrepancy in setups but might be useful in general to detect more failure cases in shoot health checks.
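A simple form of such an additional check could be a raw request to the metrics API (served by the metrics-server through the shoot's pod/service network, same path as `k top no`); a sketch:

```bash
# Succeeds only if kube-apiserver can reach the metrics-server through the VPN
k get --raw /apis/metrics.k8s.io/v1beta1/nodes --request-timeout 2s
```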
WDYT?
> This problem might cause bugs and regressions in the VPN setup to go unnoticed.
> E.g., in #9597 there was a problem in the HA VPN configuration (fixed in a later commit).
> Nevertheless, most test cases of `pull-gardener-e2e-kind-ha-{single,multi}-zone` succeeded. I.e., shoot creations were successful although the VPN connection was never working.
> The problem was only discovered by chance in the credentials rotation test case (#9597 (comment)).
@timebertt we have this exact thing happening on latest using provider-openstack. I can add some logs if it helps.
@Lappihuan this issue is only about provider-local, so provider-openstack is out of scope for this issue.
Please open a new issue for your case if necessary.
/assign @rfranzke @timebertt