ngrok / kubernetes-ingress-controller

The official ngrok Ingress Controller for Kubernetes

Home Page:https://ngrok.com

ingress controller "fails open" if modules have issues

euank opened this issue

Let's look at a concrete example: we attempt to apply an IP policy that prevents all traffic from reaching the service, but typo the name.

kind: IPPolicy
apiVersion: ingress.k8s.ngrok.com/v1alpha1
metadata:
  name: test-policy
spec:
  description: "Deny all policy"
  rules:
  - action: "deny"
    cidr: 0.0.0.0/0
    description: "Deny all"
---
kind: NgrokModuleSet
apiVersion: ingress.k8s.ngrok.com/v1alpha1
metadata:
  name: ip-restrictions
modules:
  ipRestriction:
    policies:
    - "test-policy-typo"
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: game-2048
  annotations:
    k8s.ngrok.com/modules: ip-restrictions
... (truncated)
<rest below>
Full example
kind: IPPolicy
apiVersion: ingress.k8s.ngrok.com/v1alpha1
metadata:
  name: test-policy
spec:
  description: "Deny all policy"
  rules:
  - action: "deny"
    cidr: 0.0.0.0/0
    description: "Deny all"
---
kind: NgrokModuleSet
apiVersion: ingress.k8s.ngrok.com/v1alpha1
metadata:
  name: ip-restrictions
modules:
  ipRestriction:
    policies:
    - "test-policy-typo"
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: game-2048
  annotations:
    k8s.ngrok.com/modules: ip-restrictions
spec:
  ingressClassName: ngrok
  rules:
    - host: euank-test-ingress.ngrok.app
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: game-2048
                port:
                  number: 80
---
apiVersion: v1
kind: Service
metadata:
  name: game-2048
spec:
  ports:
    - name: http
      port: 80
      targetPort: 80
  selector:
    app: game-2048

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: game-2048
spec:
  replicas: 1
  selector:
    matchLabels:
      app: game-2048
  template:
    metadata:
      labels:
        app: game-2048
    spec:
      containers:
        - name: backend
          image: alexwhen/docker-2048
          ports:
            - name: http
              containerPort: 80

If we apply the above manifest (changing the domain to one on your own account, of course), the service ends up exposed to the internet with no IP policy at all.

What you think should happen instead

I think that any error configuring modules should fail closed. This is especially true for auth- and security-related modules (webhook verification, OAuth, etc.), but I think it's true for all modules.

Relevant logs, with some annotation above some of them:

- The below line already sets up the backend, so at this point we're insecure 
  2023-04-13T07:01:06Z    INFO    controllers.https-edge  Creating new route      {"edgeID": "edghts_2OMT5MIOCvhBLRq2fvxxFRiQYzZ", "match": "/", "matchType": "path_prefix", "backendID": "bkdtg_2OMSnKD7ORWfqgtr6fzL1AttBML"}
- It realizes it can't apply the ip policy, but it doesn't know what to do so it just logs
  2023-04-13T07:01:07Z    ERROR   Reconciler error        {"controller": "httpsedge", "controllerGroup": "ingress.k8s.ngrok.com", "controllerKind": "HTTPSEdge", "HTTPSEdge": {"name":"euank-test-ingress-ngrokapp","namespace":"default"},"namespace": "default", "name": "euank-test-ingress-ngrok-app", "reconcileID": "ded08dbc-356d-40c8-8ace-f1ae19fc8e93", "error": "IPPolicy.ingress.k8s.ngrok.com \"test-policy-typo\" not found"}
  2023-04-13T07:01:07Z    INFO    controllers.https-edge  Updating route  {"edgeID": "edghts_2OMT5MIOCvhBLRq2fvxxFRiQYzZ", "match": "/", "matchType": "path_prefix", "backendID": "bkdtg_2OMSnKD7ORWfqgtr6fzL1AttBML"}

A related issue is that the ingress controller, even for well-formed module configuration, creates the tunnel and tunnel group backend first, and then applies other modules.

That means that even when there are no errors in configuration, there's a brief period where a request could race with the ingress controller to access a resource without the modules applied.

I suspect these issues will end up being similar enough they get fixed in the same way, and the one described here is definitely the easier to reproduce one, so I'll leave this one as a comment, and only file a new issue for it if it ends up being something we need to fix separately.

@russorat, chatted with the team today, @nikolay-ngrok will take this on next week.

Also, here's a Slack thread from last week for context: https://ngrok.slack.com/archives/C03KZ43CKQB/p1682991060020809

commented

My understanding from this issue and a simple solution for it today without API changes is:

For creates

We have this order of operations when creating the edge:

This ordering can lead to invalid states and things being left open. Suppose we did the following instead:

  • Edge
  • Static backend that says 'Initializing edge'
  • Route
  • Modules
    • If valid
      • Switch backend with valid backend
    • If invalid
      • Update message to 'Error initializing edge, invalid configuration'
        • We could either show the error message or show the command to get the error details; I prefer the latter since it leaks fewer details

This would still create the edge, but it would not fail open, since the real backend is only swapped in once the modules are valid, thus solving the fail-open issue.
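The create order above can be sketched roughly as follows. This is a minimal in-memory illustration, not the controller's actual code: the `Edge` type, `resolvePolicies`, `createEdge`, and the backend strings are all hypothetical names, and a real implementation would call the ngrok API instead of mutating a struct.

```go
package main

import "fmt"

// Edge is a hypothetical in-memory stand-in for an ngrok HTTPS edge route.
type Edge struct {
	Backend string   // what the route currently serves
	Modules []string // names of the IP policies applied to the route
}

// Static placeholder backends (wording taken from the proposal above).
const (
	initializingBackend = "static: Initializing edge"
	errorBackend        = "static: Error initializing edge, invalid configuration"
)

// resolvePolicies fails if any referenced policy name is unknown.
func resolvePolicies(known map[string]bool, refs []string) error {
	for _, r := range refs {
		if !known[r] {
			return fmt.Errorf("IPPolicy %q not found", r)
		}
	}
	return nil
}

// createEdge attaches the real backend only after the modules validate,
// so a bad module reference leaves the edge serving a static error page.
func createEdge(known map[string]bool, refs []string, realBackend string) (*Edge, error) {
	// Edge and route start out pointing at the static placeholder.
	edge := &Edge{Backend: initializingBackend}

	if err := resolvePolicies(known, refs); err != nil {
		edge.Backend = errorBackend // fail closed
		return edge, err
	}

	edge.Modules = append(edge.Modules, refs...)
	edge.Backend = realBackend // modules are in place, safe to expose
	return edge, nil
}

func main() {
	known := map[string]bool{"test-policy": true}
	edge, err := createEdge(known, []string{"test-policy-typo"}, "game-2048:80")
	fmt.Println(edge.Backend)
	fmt.Println(err)
}
```

With the typo'd policy name from this issue, the sketch leaves the edge on the static error backend instead of exposing `game-2048`.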

For updates

Leveraging the same concept gives us a temporary stop-gap solution until we address #219 (atomic updates):

  • Swap the backend for a static backend
  • Update the routes and modules
  • Update backend with valid or error depending on success

This does mean there could be downtime between updates, but this is a better option than what we have today.

In Addition

We should also do #233, as it solves the example proposed in this issue, reducing the likelihood of a user getting into a fail-open state.