Helm deploy failed to pass in custom configuration
zzhao2010 opened this issue
Brief summary
I was testing sending the k6 test metrics generated by each k6 executor pod to Prometheus via remote write. The remote write receiver was enabled on Prometheus correctly: metrics were reported to Prometheus when I port-forwarded the Prometheus pod and triggered a test from my local machine. However, when I triggered the same test via k6-operator, I saw this error on the k6 pod: msg="Failed to send the time series data to the endpoint" error="HTTP POST request failed: Post \"http://prometheus-kube-prometheus-prometheus:9090/api/v1/write\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
k6-operator version or image
ghcr.io/grafana/k6-operator:controller-v0.0.12
Helm chart version (if applicable)
k6-operator-3.4.0
kube-prometheus-stack-56.6.0
TestRun / PrivateLoadZone YAML
apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: demo
spec:
  parallelism: 1
  cleanup: post
  arguments: -o experimental-prometheus-rw --tag testid=demo_test
  script:
    configMap:
      name: demo
      file: test.js
  runner:
    env:
      - name: K6_PROMETHEUS_RW_SERVER_URL
        value: "http://prometheus-kube-prometheus-prometheus:9090/api/v1/write"
      - name: K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM
        value: "true"
Other environment details (if applicable)
minikube version: v1.32.0
Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.3
Steps to reproduce the problem
1. Enable the Prometheus remote write receiver with the values.yaml below:
prometheus:
  enabled: true
  prometheusSpec:
    ## enable --web.enable-remote-write-receiver flag on prometheus-server
    enableRemoteWriteReceiver: true
    # EnableFeatures API enables access to Prometheus disabled features.
    # ref: https://prometheus.io/docs/prometheus/latest/disabled_features/
    enableFeatures:
      - native-histograms
2. Apply the TestRun to the k8s cluster with the env variable K6_PROMETHEUS_RW_SERVER_URL set to the Prometheus service name plus :9090/api/v1/write.
Expected behaviour
The k6 metrics are reported to Prometheus.
Actual behaviour
demo-1-rdg95 time="2024-02-02T07:13:30Z" level=error msg="Failed to send the time series data to the endpoint" error="HTTP POST request failed: Post \"http://prometheus-kube-prometheus-prometheus:9090/api/v1/write\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" output="Prometheus remote write"
demo-1-rdg95 time="2024-02-02T07:13:30Z" level=warning msg="Successful flushed time series to remote write endpoint but it took 5.002130252s while flush period is 5s. Some samples may be dropped." nts=15 output="Prometheus remote write"
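As a side note, the warning above shows a flush taking just over the 5s flush period. k6's Prometheus remote-write output exposes a K6_PROMETHEUS_RW_PUSH_INTERVAL environment variable to tune that period; a hedged sketch of how it could be set in the runner env (the 10s value is purely illustrative, not a recommendation):

```yaml
runner:
  env:
    - name: K6_PROMETHEUS_RW_PUSH_INTERVAL
      value: "10s"
```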
I tested the connection to Prometheus from another pod in the same cluster with curl -v -X POST http://prometheus-kube-prometheus-prometheus:9090/api/v1/write, and the connection worked.
prometheus-grafana-9c98f646b-7h2mg:/usr/share/grafana$ curl -v -X POST http://prometheus-kube-prometheus-prometheus:9090/api/v1/write
* Host prometheus-kube-prometheus-prometheus:9090 was resolved.
* IPv6: (none)
* IPv4: 10.104.33.165
* Trying 10.104.33.165:9090...
* Connected to prometheus-kube-prometheus-prometheus (10.104.33.165) port 9090
> POST /api/v1/write HTTP/1.1
> Host: prometheus-kube-prometheus-prometheus:9090
> User-Agent: curl/8.5.0
> Accept: */*
>
< HTTP/1.1 400 Bad Request
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Fri, 02 Feb 2024 07:26:56 GMT
< Content-Length: 22
<
snappy: corrupt input
Hi @zzhao2010,
prometheus:
  enabled: true
  prometheusSpec:
    ...
Which chart is configured with these values? k6-operator's chart cannot be configured in this way. This looks like an issue with your setup rather than with k6-operator.
Given the error, one thing that is worth checking is the URL for Prometheus:
value: "http://prometheus-kube-prometheus-prometheus:9090/api/v1/write"
This assumes that Prometheus is in the default namespace: the context timeout suggests that the k6 runners cannot reach it at this URL - it may be incorrect or incomplete. I'd double-check that part. But overall, this seems like an issue with the setup rather than a bug here.
@yorugac
Turned out the issue was with the URL. Because the Prometheus pod and the k6 pod are hosted in different namespaces, the endpoint needs to include the namespace, like http://prometheus-kube-prometheus-prometheus.&lt;namespace&gt;.svc.cluster.local:9090/api/v1/write.
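For reference, a cluster-internal Kubernetes Service URL follows the pattern service.namespace.svc.cluster.local. A minimal sketch of building the full remote-write URL; the "monitoring" namespace below is an assumption for illustration, so substitute the namespace where kube-prometheus-stack is actually installed:

```shell
# Build the fully qualified in-cluster URL for a Service in another namespace.
# NAMESPACE=monitoring is an assumed value, not taken from this issue.
SERVICE=prometheus-kube-prometheus-prometheus
NAMESPACE=monitoring
echo "http://${SERVICE}.${NAMESPACE}.svc.cluster.local:9090/api/v1/write"
```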
Another question regarding the prometheus.enabled value documented in the k6-operator chart: what does it do? The description doesn't explain it clearly. Does it have to be enabled for metrics to report to Prometheus correctly?
prometheus.enabled is for creating a ServiceMonitor: that option is meant for users of Prometheus Operator.
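For context, a ServiceMonitor is a Prometheus Operator custom resource that tells Prometheus which Services to scrape. A generic sketch of the shape of such a resource (not the exact object the k6-operator chart renders; the name, labels, and port are placeholders):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: k6-operator-metrics    # placeholder name
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: k6-operator    # placeholder label
  endpoints:
    - port: metrics    # placeholder port name
      interval: 30s
```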
Glad you resolved it. I'm closing this issue as it is not a bug in k6-operator. For future reference, it is recommended to raise inquiries about k6-operator in the community forum.