kubernetes-sigs / aws-ebs-csi-driver

CSI driver for Amazon EBS https://aws.amazon.com/ebs/

Metrics for EBS CSI Controller is not available

ujala-singh opened this issue

What happened?
I am using the EBS CSI driver as an AWS EKS add-on. By providing some custom values, I was able to expose port 3301 and the metrics endpoint, but scraping it returns no metrics.

What you expected to happen?
I expected the endpoint to return all of the metrics exposed by the application.

Anything else we need to know?:
Added one argument to the ebs-plugin container, --http-endpoint=0.0.0.0:3301, and exposed the port:

- name: metrics
  containerPort: 3301
  protocol: TCP

I then created a Service and a ServiceMonitor for it, as described [here]:

---
apiVersion: v1
kind: Service
metadata:
  name: ebs-csi-controller
  namespace: kube-system
  labels:
    app: ebs-csi-controller
spec:
  selector:
    app: ebs-csi-controller
  ports:
    - name: metrics
      port: 3301
      targetPort: 3301
  type: ClusterIP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ebs-csi-controller
  namespace: kube-system
  labels:
    app: ebs-csi-controller
spec:
  selector:
    matchLabels:
      app: ebs-csi-controller
  namespaceSelector:
    matchNames:
      - kube-system
  endpoints:
    - targetPort: 3301
      path: /metrics
      interval: 15s

Vmagent is able to discover the target but there are no metrics available.

Environment

  • Kubernetes version: v1.26
  • Driver version: v1.26.0-eksbuild.1

Logs

$ k logs -f ebs-csi-controller-647db6b5db-k8lxb -n kube-system                                             
Defaulted container "ebs-plugin" out of: ebs-plugin, csi-provisioner, csi-attacher, csi-snapshotter, csi-resizer, liveness-probe
I0329 08:39:43.740880       1 driver.go:80] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.26.0"
I0329 08:39:43.741381       1 metrics.go:95] "Metric server listening" address="0.0.0.0:3301" path="/metrics"
I0329 08:39:43.746232       1 controller.go:92] "batching" status=true

Debugging

$ k port-forward svc/ebs-csi-controller 3301:3301 -n kube-system                           
Forwarding from 127.0.0.1:3301 -> 3301
Forwarding from [::1]:3301 -> 3301
Handling connection for 3301
Handling connection for 3301

$ curl http://localhost:3301/metrics

The curl request does not return any metrics.
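One way to tell an empty response body from a failed request is to count the sample lines in whatever the endpoint returns. The sketch below is a hypothetical helper (count_samples is an illustrative name, not part of the driver or of curl) that reads Prometheus text-format output on stdin and prints the number of lines that are neither blank nor comments:

```shell
# Hypothetical helper: count metric sample lines in Prometheus text output.
# Reads from stdin; lines starting with '#' (HELP/TYPE) and blank lines are ignored.
count_samples() {
  awk 'NF && $0 !~ /^#/ { n++ } END { print n + 0 }'
}
```

With the port-forward above still running, `curl -s http://localhost:3301/metrics | count_samples` prints 0 when the body is empty and a positive number once the endpoint serves samples.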

Hi @ujala-singh

I was unable to reproduce this in a similar environment. Could you run through the steps below to check if the metrics endpoint is accessible within the container?

  1. Dynamically provision a volume: kubectl apply -f examples/kubernetes/dynamic-provisioning/manifests
  2. Retrieve the leader controller pod: export EBS_CSI_CONTROLLER=$(kubectl get lease -n kube-system external-attacher-leader-ebs-csi-aws-com -o jsonpath="{.spec.holderIdentity}")
  3. Start an ephemeral debug container for the controller pod: kubectl debug -it $EBS_CSI_CONTROLLER -n kube-system --image=busybox:1.28 --target=ebs-plugin (this is necessary due to the use of a minimal base image)
  4. Check if the metrics endpoint is accessible: wget -O - http://localhost:3301/metrics

Here is an example of the expected output:

Connecting to localhost:3301 (127.0.0.1:3301)
# HELP cloudprovider_aws_api_request_duration_seconds [ALPHA] ebs_csi_aws_com metric
# TYPE cloudprovider_aws_api_request_duration_seconds histogram
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.005"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.01"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.025"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.05"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.1"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.25"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="0.5"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="1"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="2.5"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="5"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="10"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="AttachVolume",le="+Inf"} 1
cloudprovider_aws_api_request_duration_seconds_sum{request="AttachVolume"} 0.325255518
cloudprovider_aws_api_request_duration_seconds_count{request="AttachVolume"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="0.005"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="0.01"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="0.025"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="0.05"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="0.1"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="0.25"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="0.5"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="1"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="2.5"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="5"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="10"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeInstances",le="+Inf"} 1
cloudprovider_aws_api_request_duration_seconds_sum{request="DescribeInstances"} 0.129124013
cloudprovider_aws_api_request_duration_seconds_count{request="DescribeInstances"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="0.005"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="0.01"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="0.025"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="0.05"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="0.1"} 0
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="0.25"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="0.5"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="1"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="2.5"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="5"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="10"} 1
cloudprovider_aws_api_request_duration_seconds_bucket{request="DescribeVolumes",le="+Inf"} 1
cloudprovider_aws_api_request_duration_seconds_sum{request="DescribeVolumes"} 0.106219663
cloudprovider_aws_api_request_duration_seconds_count{request="DescribeVolumes"} 1
-                    100% |************************************************************************************************************************************************************************************************************************|  3972   0:00:00 ETA
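To read the histogram output above more easily, the _sum and _count series can be divided to get a mean request duration per AWS API call. The following is a minimal sketch, assuming the metric name and label layout shown in the example output; mean_latency is a hypothetical helper name, not something the driver provides:

```shell
# Hypothetical helper: mean AWS API request duration per request type.
# Reads Prometheus text-format output on stdin and prints "<request> <mean seconds>".
mean_latency() {
  awk '
    # Lines look like: ..._sum{request="AttachVolume"} 0.325255518
    /^cloudprovider_aws_api_request_duration_seconds_sum[{]/ {
      split($1, parts, "\""); sum[parts[2]] = $2
    }
    /^cloudprovider_aws_api_request_duration_seconds_count[{]/ {
      split($1, parts, "\""); count[parts[2]] = $2
    }
    END {
      # Skip request types with no observations to avoid dividing by zero.
      for (req in count)
        if (count[req] > 0)
          printf "%s %.9f\n", req, sum[req] / count[req]
    }
  ' | sort
}
```

Inside the debug container this could be used as `wget -q -O - http://localhost:3301/metrics | mean_latency`, provided awk and sort are available in the image. Each request type in the example output was observed once, so the mean there equals the reported sum.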