prometheus-operator / kube-prometheus

Use Prometheus to monitor Kubernetes and applications running on Kubernetes

Home Page: https://prometheus-operator.dev/

Cannot run kube-prometheus on k8s 1.28.4

richzhu369 opened this issue

What happened?
When I install kube-prometheus in my k8s cluster, I hit a bug: Grafana is always unhealthy,
and I cannot access the Grafana page; I always get an HTTP 503 error. More details follow.

When I run kubectl get pod -n monitoring, the pod status is Running.

But when I describe the pod with kubectl describe pod grafana-69f6b485b9-xmhzr -n monitoring, I see errors (attached as a screenshot).

If I exec into the pod and run curl 127.0.0.1:3000, it works: I get the Grafana login page.

And running curl 172.20.75.53:3000 (my endpoint IP) also works.

But if I run curl 10.68.160.10:3000 (my Grafana Service IP), I get the same problem: a timeout.
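
Since the pod answers on 127.0.0.1 and on its endpoint IP but times out on the ClusterIP, a useful first check is whether the Service actually selects the Grafana pod. A minimal sketch of the checks (the service and namespace names are from this issue; the label selector is the standard one in the kube-prometheus manifests, so verify it on your cluster):

# Does the Service have endpoints, and do they match the pod IP (172.20.75.53)?
kubectl -n monitoring get endpoints grafana -o wide

# Do the Service selector and the pod labels agree?
kubectl -n monitoring get svc grafana -o jsonpath='{.spec.selector}'
kubectl -n monitoring get pods -l app.kubernetes.io/name=grafana --show-labels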

How to reproduce it (as minimally and precisely as possible):

Just install kube-prometheus on k8s 1.28.4.
I have reinstalled kube-prometheus v0.13.0 many times, and I also tried the main branch; both have the same problem.
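
For context, the standard install steps from the kube-prometheus README look like the following (a sketch; the issue does not say exactly which commands were used, and the paths assume a checkout of the repo at the matching tag):

# Create the namespace and CRDs, and wait for the CRDs to be established
kubectl apply --server-side -f manifests/setup
kubectl wait --for condition=Established --all CustomResourceDefinition --namespace=monitoring
# Apply the rest of the stack
kubectl apply -f manifests/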

Environment

  • Prometheus Operator version:
    kube-prometheus v0.13.0 
    and the main branch, commit 035b09f42441d4630b3a3de4e4a490d19b1ba5e4
    grafana/grafana:10.2.2
    prometheus-operator:v0.70.0
    kube-rbac-proxy:v0.15.0
  • Kubernetes version information:

    kubectl version

Client Version: v1.28.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.4
  • Kubernetes cluster kind:

    I installed my k8s cluster with kubeasz. With an older k8s version, also installed via kubeasz, kube-prometheus worked fine.
  • Manifests:
  • Prometheus Operator Logs:
level=info ts=2023-12-02T11:37:53.461319592Z caller=main.go:180 msg="Starting Prometheus Operator" version="(version=0.70.0, branch=refs/tags/v0.70.0, revision=c2c673f7123f3745a2a982b4a2bdc43a11f50fad)"
level=info ts=2023-12-02T11:37:53.461355426Z caller=main.go:181 build_context="(go=go1.21.4, platform=linux/amd64, user=Action-Run-ID-7048794395, date=20231130-15:41:45, tags=unknown)"
level=info ts=2023-12-02T11:37:53.461366316Z caller=main.go:192 msg="namespaces filtering configuration " config="{allow_list=\"\",deny_list=\"\",prometheus_allow_list=\"\",alertmanager_allow_list=\"\",alertmanagerconfig_allow_list=\"\",thanosruler_allow_list=\"\"}"
level=info ts=2023-12-02T11:37:53.558524999Z caller=main.go:221 msg="connection established" cluster-version=v1.28.4
level=info ts=2023-12-02T11:37:53.566582415Z caller=operator.go:321 component=prometheusoperator msg="Kubernetes API capabilities" endpointslices=true
level=info ts=2023-12-02T11:37:53.655058666Z caller=operator.go:308 component=prometheusagentoperator msg="Kubernetes API capabilities" endpointslices=true
level=info ts=2023-12-02T11:37:53.659676851Z caller=server.go:229 msg="starting insecure server" address=[::]:8080
level=info ts=2023-12-02T11:37:54.154016263Z caller=operator.go:309 component=alertmanageroperator msg="successfully synced all caches"
level=info ts=2023-12-02T11:37:54.154681707Z caller=operator.go:760 component=alertmanageroperator msg="StatefulSet not found" key=monitoring/alertmanager-main
level=info ts=2023-12-02T11:37:54.154745768Z caller=operator.go:641 component=alertmanageroperator key=monitoring/main msg="sync alertmanager"
level=info ts=2023-12-02T11:37:54.254816258Z caller=operator.go:417 component=prometheusagentoperator msg="successfully synced all caches"
level=info ts=2023-12-02T11:37:54.453908925Z caller=operator.go:270 component=thanosoperator msg="successfully synced all caches"
level=info ts=2023-12-02T11:37:54.459149196Z caller=operator.go:378 component=prometheusoperator msg="successfully synced all caches"
level=info ts=2023-12-02T11:37:54.45999917Z caller=operator.go:975 component=prometheusoperator key=monitoring/k8s msg="sync prometheus"
level=info ts=2023-12-02T11:37:54.760866675Z caller=operator.go:760 component=alertmanageroperator msg="StatefulSet not found" key=monitoring/alertmanager-main
level=warn ts=2023-12-02T11:37:54.860453741Z caller=klog.go:96 component=k8s_client_runtime func=Warning msg="spec.template.spec.containers[1].ports[0]: duplicate port definition with spec.template.spec.initContainers[0].ports[0]"
level=info ts=2023-12-02T11:37:54.862375656Z caller=operator.go:760 component=alertmanageroperator msg="StatefulSet not found" key=monitoring/alertmanager-main
level=info ts=2023-12-02T11:37:54.95348229Z caller=operator.go:641 component=alertmanageroperator key=monitoring/main msg="sync alertmanager"
level=info ts=2023-12-02T11:37:55.554781574Z caller=operator.go:641 component=alertmanageroperator key=monitoring/main msg="sync alertmanager"
level=info ts=2023-12-02T11:37:55.671173521Z caller=operator.go:641 component=alertmanageroperator key=monitoring/main msg="sync alertmanager"
level=info ts=2023-12-02T11:37:56.462935137Z caller=operator.go:641 component=alertmanageroperator key=monitoring/main msg="sync alertmanager"
level=warn ts=2023-12-02T11:37:56.560303786Z caller=klog.go:96 component=k8s_client_runtime func=Warning msg="spec.template.spec.containers[1].ports[0]: duplicate port definition with spec.template.spec.initContainers[0].ports[0]"
level=info ts=2023-12-02T11:37:56.561130316Z caller=operator.go:975 component=prometheusoperator key=monitoring/k8s msg="sync prometheus"
level=info ts=2023-12-02T11:37:56.654685574Z caller=operator.go:641 component=alertmanageroperator key=monitoring/main msg="sync alertmanager"
level=info ts=2023-12-02T11:37:58.071097896Z caller=operator.go:975 component=prometheusoperator key=monitoring/k8s msg="sync prometheus"
level=info ts=2023-12-02T11:37:59.468807278Z caller=operator.go:975 component=prometheusoperator key=monitoring/k8s msg="sync prometheus"
  • Prometheus Logs:
ts=2023-12-02T11:38:17.021Z caller=main.go:583 level=info msg="Starting Prometheus Server" mode=server version="(version=2.48.0, branch=HEAD, revision=6d80b30990bc297d95b5c844e118c4011fad8054)"
ts=2023-12-02T11:38:17.021Z caller=main.go:588 level=info build_context="(go=go1.21.4, platform=linux/amd64, user=root@26117804242c, date=20231116-04:35:21, tags=netgo,builtinassets,stringlabels)"
ts=2023-12-02T11:38:17.021Z caller=main.go:589 level=info host_details="(Linux 6.2.0-1016-aws #16~22.04.1-Ubuntu SMP Sun Nov  5 20:08:16 UTC 2023 x86_64 prometheus-k8s-0 (none))"
ts=2023-12-02T11:38:17.021Z caller=main.go:590 level=info fd_limits="(soft=1048576, hard=1048576)"
ts=2023-12-02T11:38:17.021Z caller=main.go:591 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2023-12-02T11:38:17.022Z caller=web.go:566 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
ts=2023-12-02T11:38:17.023Z caller=main.go:1024 level=info msg="Starting TSDB ..."
ts=2023-12-02T11:38:17.024Z caller=tls_config.go:274 level=info component=web msg="Listening on" address=[::]:9090
ts=2023-12-02T11:38:17.024Z caller=tls_config.go:313 level=info component=web msg="TLS is disabled." http2=false address=[::]:9090
ts=2023-12-02T11:38:17.027Z caller=head.go:601 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
ts=2023-12-02T11:38:17.027Z caller=head.go:682 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=2.047µs
ts=2023-12-02T11:38:17.027Z caller=head.go:690 level=info component=tsdb msg="Replaying WAL, this may take a while"
ts=2023-12-02T11:38:17.027Z caller=head.go:761 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
ts=2023-12-02T11:38:17.027Z caller=head.go:798 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=37.423µs wal_replay_duration=271.555µs wbl_replay_duration=187ns total_replay_duration=368.672µs
ts=2023-12-02T11:38:17.029Z caller=main.go:1045 level=info fs_type=EXT4_SUPER_MAGIC
ts=2023-12-02T11:38:17.029Z caller=main.go:1048 level=info msg="TSDB started"
ts=2023-12-02T11:38:17.029Z caller=main.go:1229 level=info msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
ts=2023-12-02T11:38:17.038Z caller=kubernetes.go:329 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/monitoring/kubelet/2 msg="Using pod service account via in-cluster config"
ts=2023-12-02T11:38:17.038Z caller=kubernetes.go:329 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/monitoring/node-exporter/0 msg="Using pod service account via in-cluster config"
ts=2023-12-02T11:38:17.038Z caller=kubernetes.go:329 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/monitoring/kube-apiserver/0 msg="Using pod service account via in-cluster config"
ts=2023-12-02T11:38:17.039Z caller=kubernetes.go:329 level=info component="discovery manager notify" discovery=kubernetes config=config-0 msg="Using pod service account via in-cluster config"
ts=2023-12-02T11:38:17.079Z caller=main.go:1266 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml totalDuration=49.588811ms db_storage=1.062µs remote_storage=1.897µs web_handler=459ns query_engine=688ns scrape=413.214µs scrape_sd=1.009143ms notify=15.549µs notify_sd=174.236µs rules=39.836402ms tracing=6.187µs
ts=2023-12-02T11:38:17.079Z caller=main.go:1009 level=info msg="Server is ready to receive web requests."
ts=2023-12-02T11:38:17.079Z caller=manager.go:1012 level=info component="rule manager" msg="Starting rule manager..."
ts=2023-12-02T11:38:21.179Z caller=main.go:1229 level=info msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
ts=2023-12-02T11:38:21.187Z caller=kubernetes.go:329 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/monitoring/kube-state-metrics/1 msg="Using pod service account via in-cluster config"
ts=2023-12-02T11:38:21.187Z caller=kubernetes.go:329 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/monitoring/kubelet/0 msg="Using pod service account via in-cluster config"
ts=2023-12-02T11:38:21.187Z caller=kubernetes.go:329 level=info component="discovery manager scrape" discovery=kubernetes config=serviceMonitor/monitoring/kube-apiserver/0 msg="Using pod service account via in-cluster config"
ts=2023-12-02T11:38:21.188Z caller=kubernetes.go:329 level=info component="discovery manager notify" discovery=kubernetes config=config-0 msg="Using pod service account via in-cluster config"
ts=2023-12-02T11:38:21.238Z caller=main.go:1266 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml totalDuration=58.424305ms db_storage=1.148µs remote_storage=1.37µs web_handler=374ns query_engine=938ns scrape=58.77µs scrape_sd=836.048µs notify=15.88µs notify_sd=261.038µs rules=49.875014ms tracing=5.974µs

Anything else we need to know?:

Here are my Service and Ingress, for reference (the output was attached as screenshots):

kubectl get svc -n monitoring
kubectl get ingress -n monitoring grafana-ingress -o yaml
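
(Since the screenshots are not preserved: a minimal Grafana Ingress of the kind described might look like the following; the host, ingress class, and structure are hypothetical, not taken from the issue.)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
spec:
  ingressClassName: nginx          # hypothetical; matches the ingress-nginx controller mentioned below
  rules:
  - host: grafana.example.com      # hypothetical host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 3000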

I already installed the manifest files and was able to access Grafana using port-forward as described in the docs.
Have you tried this already?
https://github.com/prometheus-operator/kube-prometheus/blob/main/docs/access-ui.md#grafana
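
For reference, the approach described there is a plain port-forward:

kubectl --namespace monitoring port-forward svc/grafana 3000
# then open http://localhost:3000 in a browser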

If you need to expose Grafana using a LoadBalancer service (such as MetalLB), you may need to remove the readinessProbe section from the Grafana deployment (that's the only way that worked for me):

readinessProbe:
  httpGet:
    path: /api/health
    port: http
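
One way to apply that change without hand-editing manifests is a JSON patch; a sketch, assuming Grafana is the first (index 0) container in the deployment:

# Verify the container order first:
#   kubectl -n monitoring get deployment grafana -o jsonpath='{.spec.template.spec.containers[*].name}'
kubectl -n monitoring patch deployment grafana --type=json \
  -p='[{"op":"remove","path":"/spec/template/spec/containers/0/readinessProbe"}]'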

Environment:
k3s (v1.28.3)
kube-prometheus (release-0.13)

I deleted the Grafana readinessProbe and the prometheus-k8s livenessProbe,
then ran kubectl --namespace monitoring port-forward svc/grafana 3000,
and still got an error page.

I also exposed Grafana with the ingress-nginx controller and got an HTTP 504 error page.

As I said, I can access the login page from inside the container (curl http://127.0.0.1:3000), but I cannot access it through the Grafana Service (curl http://grafana:3000).

I think this problem is likely with Grafana and the Grafana Service.

And, still inside that same container, I can reach other services, for example curl http://api-service.prod:3000, but not the Grafana service name: curl http://grafana:3000 fails.
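
Given that other service names resolve and route fine from the same container, two things worth ruling out are missing endpoints on the grafana Service and a NetworkPolicy dropping the traffic. Recent kube-prometheus releases ship NetworkPolicies (e.g. grafana-networkPolicy.yaml) that only allow selected pods to reach Grafana on port 3000, which would produce exactly this pattern. A sketch of the checks (the policy name "grafana" is what the kube-prometheus manifests use; confirm it on your cluster):

# Does the Service have endpoints?
kubectl -n monitoring get endpoints grafana

# Is a NetworkPolicy restricting ingress to the Grafana pod?
kubectl -n monitoring get networkpolicy
kubectl -n monitoring describe networkpolicy grafana

# As a temporary test only: remove the policy and retry the curl
kubectl -n monitoring delete networkpolicy grafana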