[Feature] Expose /healthz check for cost-analyzer-frontend if OIDC is enabled.
williamkubecost opened this issue
Problem Statement
Hey Everyone,
This issue is being reported on behalf of a customer who uses network endpoint groups (NEGs) to distribute traffic to the load balancer's backends. In certain scenarios, when OIDC is enabled, the load balancer's regular health checks against /healthz arrive unauthenticated and, as the logs below show, are routed through the /auth subrequest and fail. This disrupts users attempting to connect to the Kubecost UI.
Here's a snapshot of the relevant logs:
cost-analyzer-frontend [REDACTED] - - [12/Mar/2024:18:41:11 +0000] "GET /healthz HTTP/1.1" 499 0 "-" "GoogleHC/1.0" "-"
cost-analyzer-frontend [REDACTED] - - [12/Mar/2024:18:41:11 +0000] "GET /healthz HTTP/1.1" 499 0 "-" "GoogleHC/1.0" "-"
cost-analyzer-frontend 2024/03/12 18:41:11 [error] 50#50: *1001 connect() failed (111: Connection refused) while connecting to upstream, client: [REDACTED], server: _, request: "GET /healthz HTTP/1.1", subrequest: "/auth", upstream: "http://[REDACTED]:9003/isAuthenticated", host: "[REDACTED]"
cost-analyzer-frontend 2024/03/12 18:41:11 [error] 50#50: *1001 auth request unexpected status: 502 while sending to client, client: [REDACTED], server: _, request: "GET /healthz HTTP/1.1", host: "[REDACTED]"
cost-analyzer-frontend [REDACTED] - - [12/Mar/2024:18:41:11 +0000] "GET /healthz HTTP/1.1" 500 177 "-" "GoogleHC/1.0" "-"
cost-analyzer-frontend [REDACTED] - - [12/Mar/2024:18:41:12 +0000] "GET /healthz HTTP/1.1" 499 0 "-" "GoogleHC/1.0" "-"
cost-analyzer-frontend [REDACTED] - - [12/Mar/2024:18:41:12 +0000] "GET /healthz HTTP/1.1" 499 0 "-" "GoogleHC/1.0" "-"
cost-analyzer-frontend 2024/03/12 18:41:12 [error] 50#50: *1005 connect() failed (111: Connection refused) while connecting to upstream, client: [REDACTED], server: _, request: "GET /healthz HTTP/1.1", subrequest: "/auth", upstream: "http://[REDACTED]:9003/isAuthenticated", host: "[REDACTED]"
To attach NEGs to Kubecost pods, the pods' health must be confirmed through a health check, with the status monitored by the load balancer.
When OIDC is enabled and a health check endpoint is exposed, these checks begin to succeed, which allows users to authenticate:
cost-analyzer-frontend [REDACTED] - - [12/Mar/2024:18:48:50 +0000] "GET /healthz HTTP/1.1" 200 8 "-" "GoogleHC/1.0" "-"
cost-analyzer-frontend [REDACTED] - - [12/Mar/2024:18:48:50 +0000] "GET /healthz HTTP/1.1" 200 8 "-" "GoogleHC/1.0" "-"
cost-analyzer-frontend [REDACTED] - - [12/Mar/2024:18:48:51 +0000] "GET /healthz HTTP/1.1" 200 8 "-" "GoogleHC/1.0" "-"
cost-analyzer-frontend [REDACTED] - - [12/Mar/2024:18:48:51 +0000] "GET /healthz HTTP/1.1" 200 8 "-" "GoogleHC/1.0" "-"
cost-model 2024-03-12T18:48:51.536060561Z INF Attempting to authenticate OIDC...
cost-model 2024-03-12T18:48:51.536143642Z INF Validating OIDC token...
cost-model 2024-03-12T18:48:51.536183565Z INF Attempting to fetch token from cookie...
cost-model 2024-03-12T18:48:51.576197251Z INF 200 for request GET https://openidconnect.googleapis.com/v1/userinfo HTTP/1.1 in cookie or
Solution Description
Adding something like this to cost-analyzer-helm-chart/cost-analyzer/templates/cost-analyzer-frontend-config-map-template.yaml when OIDC is enabled seems to be a plausible solution:
location /healthz {
    add_header 'Content-Type' 'text/plain';
    return 200 "healthy\n";
}
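For comparison, here is a slightly more defensive variant of the same location block. It is only a sketch: it assumes the auth check seen in the logs (the "/auth" subrequest) is wired up with nginx's auth_request directive in the enclosing server block, which I have not verified against the chart's template. It makes the auth exemption explicit and sets the content type with default_type rather than add_header.

```nginx
# Sketch only: unauthenticated health endpoint for load balancer probes.
# Exact-match location so only /healthz itself is exempted.
location = /healthz {
    # Assumption: auth is enforced via auth_request in the enclosing server block.
    auth_request off;
    # Optional: health probes are frequent, so keep them out of the access log.
    access_log off;
    # Plain-text body; "healthy\n" is 8 bytes, matching the "200 8" log entries above.
    default_type text/plain;
    return 200 "healthy\n";
}
```

If extraServerConfig accepts arbitrary server-level directives, a block like this could presumably be supplied there without editing the template, but that would need to be confirmed against the chart.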
Alternatives
Manually working around this by editing the file, or perhaps this fits better under the extraServerConfig for nginx?
Additional Context
No response
Troubleshooting
- I have read and followed the issue guidelines and this is a feature request only for the Helm chart.
- I have searched other issues in this repository and mine is not recorded.
I think this is a relatively easy PR, but just wanted to get some additional feedback before doing so.
Thanks for the assist @rossfisherkc !!