grafana / synthetic-monitoring-agent

Synthetic Monitoring Agent

Home Page:https://grafana.com/docs/grafana-cloud/how-do-i/synthetic-monitoring/

Support non-root container with traceroute checks (securityContext support)

eliihen opened this issue

Hi!

Currently it does not seem possible to run synthetic-monitoring-agent as a non-root user in Kubernetes when using traceroute checks. A "normal" HTTP check works fine as non-root, but as soon as a traceroute check runs, the agent fails with the following error:

Error
2023/11/10 20:46:22 Failed to listen to address 0.0.0.0. Msg: listen ip4:icmp 0.0.0.0: socket: operation not permitted.
panic: Failed to listen to address 0.0.0.0. Msg: listen ip4:icmp 0.0.0.0: socket: operation not permitted.

goroutine 129 [running]:
log.Panicf({0xdb11d6?, 0x38?}, {0xc00050e568?, 0x4183e8?, 0xc00050e4b0?})
log/log.go:439 +0x65
github.com/tonobo/mtr/pkg/icmp.SendICMP({0xd8ede8, 0x7}, {0xed5df0, 0xc0002a01b0}, {0x0, 0x0}, 0xc0002055e0?, 0xd67f, 0x7f8504e2e5b8?, 0x74d9)
github.com/tonobo/mtr@v0.1.1-0.20210422192847-1c17592ae70b/pkg/icmp/icmp.go:43 +0x730
github.com/tonobo/mtr/pkg/icmp.SendDiscoverICMP(...)
github.com/tonobo/mtr@v0.1.1-0.20210422192847-1c17592ae70b/pkg/icmp/icmp.go:29
github.com/tonobo/mtr/pkg/mtr.(*MTR).discover(0xc0002055e0, {0xeda2b0, 0xc000205340}, 0x5)
github.com/tonobo/mtr@v0.1.1-0.20210422192847-1c17592ae70b/pkg/mtr/mtr.go:191 +0x405
github.com/tonobo/mtr/pkg/mtr.(*MTR).RunWithContext(...)
github.com/tonobo/mtr@v0.1.1-0.20210422192847-1c17592ae70b/pkg/mtr/mtr.go:153
github.com/grafana/synthetic-monitoring-agent/internal/prober/traceroute.Prober.Probe({{0x5, 0x6fc23ac00, 0x1dcd6500, 0x1, 0x1, 0x40, 0xf, 0x1, {0xece1f8, 0x0}, ...}, ...}, ...)
github.com/grafana/synthetic-monitoring-agent/internal/prober/traceroute/traceroute.go:87 +0x233
github.com/grafana/synthetic-monitoring-agent/internal/scraper.runProber({0xeda240, 0xc000034190}, {0xed68b8, 0xc0001349c0}, {0xc0004362c0, 0x19}, 0x6fc23ac00, 0x10?, 0xc000133ad0, {0xed4660, ...})
github.com/grafana/synthetic-monitoring-agent/internal/scraper/scraper.go:476 +0x356
github.com/grafana/synthetic-monitoring-agent/internal/scraper.getProbeMetrics({0xeda240, 0xc000034190}, {0xed68b8, 0xc0001349c0}, {0xc0004362c0, 0x19}, 0x0?, 0x0?, 0xc0003e7400?, 0xc0001a5290, ...)
github.com/grafana/synthetic-monitoring-agent/internal/scraper/scraper.go:437 +0x16f
github.com/grafana/synthetic-monitoring-agent/internal/scraper.Scraper.collectData({{0xed2a40, 0xc0001c80f0}, 0xc0001bcae0, {0xdaafe8, 0xa}, {0xc0004362c0, 0x19}, {{0xed4c20, 0xc0000426b0}, 0xff, ...}, ...}, ...)
github.com/grafana/synthetic-monitoring-agent/internal/scraper/scraper.go:380 +0x73d
github.com/grafana/synthetic-monitoring-agent/internal/scraper.(*Scraper).Run.func1({0xeda240, 0xc000034190}, {0xc14bc3878943b300?, 0x176215fb8a?, 0x14d8400?})
github.com/grafana/synthetic-monitoring-agent/internal/scraper/scraper.go:214 +0x105
github.com/grafana/synthetic-monitoring-agent/internal/scraper.tickWithOffset({0xeda240, 0xc000034190}, 0xc000330900, 0xc00050ff88, 0xc00050ff70, 0x0?, 0x1d4c0)
github.com/grafana/synthetic-monitoring-agent/internal/scraper/scraper.go:313 +0x152
github.com/grafana/synthetic-monitoring-agent/internal/scraper.(*Scraper).Run(0xc0000f3180, {0xeda240, 0xc000034190})
github.com/grafana/synthetic-monitoring-agent/internal/scraper/scraper.go:268 +0x145
created by github.com/grafana/synthetic-monitoring-agent/internal/checks.(*Updater).addAndStartScraperWithLock in goroutine 99
github.com/grafana/synthetic-monitoring-agent/internal/checks/checks.go:867 +0x8e5

This error occurs even though the container's securityContext adds the NET_RAW, NET_ADMIN, and SYS_TIME capabilities (see the manifest under "Code").
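
One way to check whether the added capabilities are actually effective for the non-root process is to decode the CapEff mask from /proc/self/status inside the container. The sketch below is a hypothetical diagnostic, not part of the agent: the helper names parseCapEff and hasCap are made up for illustration, and the bit numbers 12, 13, and 25 are the values of CAP_NET_ADMIN, CAP_NET_RAW, and CAP_SYS_TIME in linux/capability.h. It assumes a Linux /proc filesystem and that a statically built binary can be run in the pod.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Capability bit numbers as defined in linux/capability.h.
const (
	capNetAdmin = 12
	capNetRaw   = 13
	capSysTime  = 25
)

// parseCapEff extracts the effective capability mask from the text of
// /proc/<pid>/status. It returns an error if there is no CapEff line.
// (Hypothetical helper, named for this sketch only.)
func parseCapEff(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "CapEff:") {
			continue
		}
		hexMask := strings.TrimSpace(strings.TrimPrefix(line, "CapEff:"))
		return strconv.ParseUint(hexMask, 16, 64)
	}
	return 0, fmt.Errorf("no CapEff line found")
}

// hasCap reports whether the given capability bit is set in the mask.
func hasCap(mask uint64, bit uint) bool {
	return mask&(1<<bit) != 0
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /proc/self/status:", err)
		os.Exit(1)
	}
	mask, err := parseCapEff(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("CapEff=%#x NET_RAW=%t NET_ADMIN=%t SYS_TIME=%t\n",
		mask, hasCap(mask, capNetRaw), hasCap(mask, capNetAdmin), hasCap(mask, capSysTime))
}
```

If NET_RAW prints false under runAsUser: 1000, the capability was granted to the container but is not in the process's effective set, which would be consistent with the "operation not permitted" error.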

I'm no expert, but it seems that some containers (like BusyBox) need root for traceroute to be able to build packets from scratch (raw sockets).

Thoughts? Is it even possible to fix this issue given that traceroute seems to require root?

Steps to reproduce

  1. Apply the example below, along with a secret named auth-token-secret that has an api-token field
  2. Add a normal HTTP check to the probe and confirm that it works
  3. Add a traceroute check and wait for it to execute. Watch the log: the entire container crashes with the error above

Code
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-synthetic-monitoring-agent-1
spec:
  minReadySeconds: 10
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: grafana-synthetic-monitoring-agent-1
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
  template:
    metadata:
      labels:
        name: grafana-synthetic-monitoring-agent-1
    spec:
      containers:
      - args:
        - /usr/local/bin/synthetic-monitoring-agent --api-server-address=${API_SERVER}
          --api-token=${API_TOKEN} --listen-address=0.0.0.0:4050 --verbose=true
        command:
        - sh
        - -c
        env:
        - name: API_TOKEN
          valueFrom:
            secretKeyRef:
              key: api-token
              name: auth-token-secret
        - name: API_SERVER
          value: synthetic-monitoring-grpc-eu-west.grafana.net:443
        image: grafana/synthetic-monitoring-agent@sha256:a83d5a558b048c19915015511eaa46637124f2c2a55177ac5a37faee41d40fe1
        livenessProbe:
          httpGet:
            path: /
            port: 4050
        name: agent
        ports:
        - containerPort: 4050
          name: http-metrics
        readinessProbe:
          httpGet:
            path: /ready
            port: 4050
        resources:
          limits:
            cpu: 500m
            memory: 750Mi
          requests:
            cpu: 10m
            memory: 50Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_RAW
            - NET_ADMIN
            - SYS_TIME
          privileged: false
          readOnlyRootFilesystem: true
          runAsGroup: 1000
          runAsNonRoot: true
          runAsUser: 1000
      securityContext:
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
        supplementalGroups:
        - 1000