grafana / k6-operator

An operator for running distributed k6 tests.

Helm deploy failed to pass in custom configuration

zzhao2010 opened this issue

Brief summary

I was testing exporting the k6 metrics generated by each k6 runner pod to Prometheus via remote write. The remote-write receiver was enabled on Prometheus correctly: I saw metrics reporting to Prometheus as expected when I port-forwarded the Prometheus pod and triggered a test from my local machine. However, when I triggered the same test via k6-operator, the k6 pod logged the error `Failed to send the time series data to the endpoint" error="HTTP POST request failed: Post \"http://prometheus-kube-prometheus-prometheus:9090/api/v1/write\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"`.

k6-operator version or image

ghcr.io/grafana/k6-operator:controller-v0.0.12

Helm chart version (if applicable)

k6-operator-3.4.0
kube-prometheus-stack-56.6.0

TestRun / PrivateLoadZone YAML

apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: demo
spec:
  parallelism: 1
  cleanup: post
  arguments: -o experimental-prometheus-rw --tag testid=demo_test
  script:
    configMap:
      name: demo
      file: test.js
  runner:
    env:
      - name: K6_PROMETHEUS_RW_SERVER_URL
        value: "http://prometheus-kube-prometheus-prometheus:9090/api/v1/write"
      - name: K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM
        value: "true"

Other environment details (if applicable)

minikube version: v1.32.0
Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.3

Steps to reproduce the problem

1. Enable Prometheus remote write in kube-prometheus-stack with the values.yaml below:

prometheus:
  enabled: true
  prometheusSpec:
    ## enable --web.enable-remote-write-receiver flag on prometheus-server
    enableRemoteWriteReceiver: true

    # EnableFeatures API enables access to Prometheus disabled features.
    # ref: https://prometheus.io/docs/prometheus/latest/disabled_features/
    enableFeatures:
      - native-histograms

2. Apply the TestRun to the Kubernetes cluster with the env variable `K6_PROMETHEUS_RW_SERVER_URL` set to the service name of the Prometheus pod, followed by `:9090/api/v1/write`.

Expected behaviour

The k6 metrics are reported to Prometheus.

Actual behaviour

demo-1-rdg95 time="2024-02-02T07:13:30Z" level=error msg="Failed to send the time series data to the endpoint" error="HTTP POST request failed: Post \"http://prometheus-kube-prometheus-prometheus:9090/api/v1/write\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" output="Prometheus remote write"
demo-1-rdg95 time="2024-02-02T07:13:30Z" level=warning msg="Successful flushed time series to remote write endpoint but it took 5.002130252s while flush period is 5s. Some samples may be dropped." nts=15 output="Prometheus remote write"

I tested the connection to Prometheus from another pod in the same cluster with `curl -v -X POST http://prometheus-kube-prometheus-prometheus:9090/api/v1/write`, and the connection worked. (The `400 Bad Request` with `snappy: corrupt input` below is expected for an empty POST, since the remote-write endpoint expects a snappy-compressed protobuf payload; it confirms the endpoint is reachable and the receiver is enabled.)

prometheus-grafana-9c98f646b-7h2mg:/usr/share/grafana$ curl -v -X POST http://prometheus-kube-prometheus-prometheus:9090/api/v1/write
* Host prometheus-kube-prometheus-prometheus:9090 was resolved.
* IPv6: (none)
* IPv4: 10.104.33.165
*   Trying 10.104.33.165:9090...
* Connected to prometheus-kube-prometheus-prometheus (10.104.33.165) port 9090
> POST /api/v1/write HTTP/1.1
> Host: prometheus-kube-prometheus-prometheus:9090
> User-Agent: curl/8.5.0
> Accept: */*
>
< HTTP/1.1 400 Bad Request
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Fri, 02 Feb 2024 07:26:56 GMT
< Content-Length: 22
<
snappy: corrupt input

Hi @zzhao2010,

prometheus:
  enabled: true
  prometheusSpec:
   ...

Which chart is configured with these values? The k6-operator chart cannot be configured this way. This looks like an issue with your setup rather than with k6-operator.

Given the error, one thing that is worth checking is the URL for Prometheus:

        value: "http://prometheus-kube-prometheus-prometheus:9090/api/v1/write"

This URL assumes that Prometheus is in the default namespace: the context-deadline timeout suggests that the k6 runners cannot reach Prometheus at this URL - it may be incorrect or incomplete. I'd double-check that part. But overall, it seems like an issue with the setup rather than a bug here.

@yorugac
It turned out the issue was with the URL. Because the Prometheus pod and the k6 pod are hosted in different namespaces, the endpoint needs to include the namespace, like http://prometheus-kube-prometheus-prometheus..svc.cluster.local:9090/api/v1/write.
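For anyone hitting the same timeout: a sketch of the runner env using a fully-qualified in-cluster DNS name, where `<namespace>` is a placeholder for whatever namespace the Prometheus service actually lives in (e.g. the one kube-prometheus-stack was installed into):

```yaml
runner:
  env:
    # <namespace> is a placeholder; substitute the namespace of the Prometheus service
    - name: K6_PROMETHEUS_RW_SERVER_URL
      value: "http://prometheus-kube-prometheus-prometheus.<namespace>.svc.cluster.local:9090/api/v1/write"
```

The short service name only resolves from pods in the same namespace; the `<service>.<namespace>.svc.cluster.local` form resolves cluster-wide.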

Another question regarding the `prometheus.enabled` value documented in the k6-operator chart: what does it do? The description doesn't explain it clearly. Does it have to be enabled for metrics to report to Prometheus correctly?

`prometheus.enabled` is for creating a ServiceMonitor: that option is meant for users of Prometheus Operator.
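In values terms, that is just a toggle in the k6-operator chart (a sketch; the resulting ServiceMonitor is only useful if the Prometheus Operator CRDs are installed in the cluster, and it is not required for the remote-write output used above):

```yaml
# values.yaml for the k6-operator Helm chart (sketch)
prometheus:
  enabled: true  # creates a ServiceMonitor; requires Prometheus Operator
```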

Glad you resolved it. I'm closing this issue since it is not a bug in k6-operator. For future reference, it is recommended to raise inquiries about k6-operator in the community forum.