[BUG] Observability Agent: model cannot determine what arguments to use for the tool call
sadieleob opened this issue · comments
Prerequisites
- I have searched the existing issues to avoid creating a duplicate
- By submitting this issue, you agree to follow our Code of Conduct
- I am using the latest version of the software
- I have tried to clear cache/cookies or used incognito mode (if ui-related)
- I can consistently reproduce this issue
Affected Service(s)
App Service
Impact/Severity
Blocker
Bug Description
The Observability Agent in kagent fails when making queries to analyze pod resource consumption. The agent encounters an OpenAI/LiteLLM error where tool calls are not being properly handled, resulting in failed queries and broken chat sessions.
how much memory the httpbin pod in the default namespace is consuming?
{"contextId":"ctx-4fd05d66-2589-45fa-9090-6e7f36df4613","final":false,"kind":"status-update","status":{"state":"submitted","message":{"contextId":"ctx-4fd05d66-2589-45fa-9090-6e7f36df4613","kind":"message","messageId":"msg-1ca9a726-2e53-4d1f-8bda-867955b17bd9","parts":[{"kind":"text","text":"how much memory the httpbin pod in the default namespace is consuming?"}],"role":"user","taskId":"05a92297-d14b-4d4f-8ca7-1e21490ff740"},"timestamp":"2025-08-22T05:10:19.859440+00:00"},"taskId":"05a92297-d14b-4d4f-8ca7-1e21490ff740"}
{"contextId":"ctx-4fd05d66-2589-45fa-9090-6e7f36df4613","final":false,"kind":"status-update","metadata":{"adk_app_name":"kagent__NS__observability_agent","adk_session_id":"ctx-4fd05d66-2589-45fa-9090-6e7f36df4613","adk_user_id":"admin@kagent.dev"},"status":{"state":"working","timestamp":"2025-08-22T05:10:19.868998+00:00"},"taskId":"05a92297-d14b-4d4f-8ca7-1e21490ff740"}
{"contextId":"ctx-4fd05d66-2589-45fa-9090-6e7f36df4613","final":true,"kind":"status-update","status":{"state":"failed","message":{"kind":"message","messageId":"8517bc35-354d-4ef9-84f4-002d8acba573","parts":[{"kind":"text","text":"litellm.BadRequestError: OpenAIException - An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: call_YksEGtVWMvu6udpAyPNlC035, call_Dl90RWYGFAOK9PHyhUiThoU1"}],"role":"agent"},"timestamp":"2025-08-22T05:10:20.801804+00:00"},"taskId":"05a92297-d14b-4d4f-8ca7-1e21490ff740"}
Steps To Reproduce
- Deploy kagent with observability-agent (version 0.6.3)
- Configure Grafana MCP server with custom Grafana URL and API key
- Ask the observability agent a question about pod resource consumption, such as:
"can you give me the pods consuming more cpu?"
"how much memory the httpbin pod in the default namespace is consuming?"
MCPServer resource:
apiVersion: kagent.dev/v1alpha1
kind: MCPServer
metadata:
  annotations:
    meta.helm.sh/release-name: kagent
    meta.helm.sh/release-namespace: kagent
  creationTimestamp: "2025-08-22T04:52:17Z"
  generation: 2
  labels:
    app.kubernetes.io/instance: kagent
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: observability-agent
    app.kubernetes.io/part-of: kagent
    app.kubernetes.io/version: 0.6.3
    helm.sh/chart: observability-agent-0.6.3
  name: grafana
  namespace: kagent
  resourceVersion: "12582"
  uid: 1f256cda-6b65-4f9b-b0af-ba905eeff1ec
spec:
  deployment:
    args:
    - --transport
    - stdio
    cmd: /app/mcp-grafana
    env:
      GRAFANA_URL: kube-prometheus-stack-grafana.monitoring.svc.cluster.local:3000/api
    image: mcp/grafana:latest
    port: 3000
    secretRefs:
    - name: grafana-api-key
  transportType: stdio
status:
  conditions:
  - lastTransitionTime: "2025-08-22T04:52:18Z"
    message: MCPServer configuration is valid
    observedGeneration: 2
    reason: Accepted
    status: "True"
    type: Accepted
  - lastTransitionTime: "2025-08-22T04:52:18Z"
    message: All references resolved successfully
    observedGeneration: 2
    reason: ResolvedRefs
    status: "True"
    type: ResolvedRefs
  - lastTransitionTime: "2025-08-22T04:52:18Z"
    message: All resources created successfully
    observedGeneration: 2
    reason: Programmed
    status: "True"
    type: Programmed
  - lastTransitionTime: "2025-08-22T04:58:40Z"
    message: Deployment is ready and all pods are running
    observedGeneration: 2
    reason: Ready
    status: "True"
    type: Ready
  observedGeneration: 2
grafana/kagent pod logs:
mcp-server 2025-08-22T05:11:26.026076Z info request gateway=bind/3000 listener=default route=mcp src.addr=10.244.1.70:44518 http.method=DELETE http.host=grafana.kagent http.path=/mcp http.version=HTTP/1.1 http.status=202 duration=0ms
mcp-server time=2025-08-22T05:12:19.276Z level=INFO msg="Starting Grafana MCP server using stdio transport" version=(devel)
mcp-server time=2025-08-22T05:12:19.276Z level=INFO msg="Using Grafana configuration" url=http://localhost:3000/ api_key_set=true
mcp-server 2025-08-22T05:12:19.276944Z info request gateway=bind/3000 listener=default route=mcp src.addr=10.244.1.56:37738 http.method=POST http.host=grafana.kagent http.path=/mcp http.version=HTTP/1.1 http.status=200 duration=9ms
mcp-server 2025-08-22T05:12:19.277366Z info request gateway=bind/3000 listener=default route=mcp src.addr=10.244.1.56:37738 http.method=POST http.host=grafana.kagent http.path=/mcp http.version=HTTP/1.1 http.status=202 duration=0ms
mcp-server 2025-08-22T05:12:19.280237Z info request gateway=bind/3000 listener=default route=mcp src.addr=10.244.1.56:37738 http.method=POST http.host=grafana.kagent http.path=/mcp http.version=HTTP/1.1 http.status=200 duration=2ms
mcp-server 2025-08-22T05:12:19.281060Z info request gateway=bind/3000 listener=default route=mcp src.addr=10.244.1.56:37738 http.method=DELETE http.host=grafana.kagent http.path=/mcp http.version=HTTP/1.1 http.status=202 duration=0ms
Expected Behavior
The observability agent should successfully query Grafana for pod resource metrics and return the requested information about CPU/memory consumption.
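For reference, the answer the agent should come back with corresponds to a Prometheus query along these lines; the service name, port, and metric name are assumptions based on a default kube-prometheus-stack install, not something taken from this report:

# Illustrative only: run a one-off curl pod against the in-cluster Prometheus
# (service name/port assumed from a default kube-prometheus-stack install).
kubectl run tmp-curl -n default --rm -it --image=curlimages/curl --restart=Never --command -- \
  curl -sG 'http://kube-prometheus-stack-prometheus.monitoring.svc:9090/api/v1/query' \
  --data-urlencode 'query=sum(container_memory_working_set_bytes{namespace="default",pod=~"httpbin.*"})'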
Actual Behavior
The observability agent fails with the litellm.BadRequestError shown in the Bug Description above: OpenAI rejects the follow-up request because the assistant message's tool_calls (call_YksEGtVWMvu6udpAyPNlC035 and call_Dl90RWYGFAOK9PHyhUiThoU1) never received matching tool response messages, and the chat session is left broken.
Environment
- kagent version: 0.6.3
- kind cluster, Kubernetes 1.29.12
- Grafana and Prometheus installed per https://docs.solo.io/gateway/latest/observability/metrics/
CLI Bug Report
kagent-bug-report-20250822-001548.tar.gz
Additional Context
No response
Logs
Screenshots
No response
Are you willing to contribute?
- I am willing to submit a PR to fix this issue
Hello - I was able to reproduce this consistently and the issue seems to be this log line:
mcp-server time=2025-08-22T05:12:19.276Z level=INFO msg="Using Grafana configuration" url=http://localhost:3000/ api_key_set=true
When the observability-agent was created, the GRAFANA_URL was missing, so the MCP server defaulted to http://localhost:3000, causing connection timeouts.
This was the case even when my MCPServer was correctly configured:
apiVersion: kagent.dev/v1alpha1
kind: MCPServer
metadata:
  annotations:
    meta.helm.sh/release-name: kagent
    meta.helm.sh/release-namespace: kagent
  creationTimestamp: "2025-08-26T11:31:33Z"
  generation: 4
  labels:
    app.kubernetes.io/instance: kagent
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: observability-agent
    app.kubernetes.io/part-of: kagent
    app.kubernetes.io/version: v0.6.3-3-g986a84e
    helm.sh/chart: observability-agent-v0.6.3-3-g986a84e
  name: grafana
  namespace: kagent
  resourceVersion: "53598"
  uid: 17881eb0-a59e-4ca0-be7a-631d70ae9914
spec:
  deployment:
    args:
    - --transport
    - stdio
    cmd: /app/mcp-grafana
    env:
      GRAFANA_URL: http://grafana.grafana:3000
    image: mcp/grafana:latest
    port: 3000
    secretRefs:
    - name: grafana-api-key
  transportType: stdio
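A quick way to confirm whether the controller actually propagated the env into the workload (a sketch; the deployment and container names are assumed from the manifests and logs above):

# What did the controller render into the Deployment?
kubectl get deployment grafana -n kagent \
  -o jsonpath='{.spec.template.spec.containers[*].env}'

# What does the running container actually see? (only works if the image
# ships printenv; mcp/grafana may be a minimal image)
kubectl exec -n kagent deploy/grafana -- printenv GRAFANA_URL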
The only way I was able to fix it was by running:
# Add the missing env array (http://grafana.grafana was my Grafana instance URL)
kubectl patch deployment grafana -n kagent --type='json' -p='[
  {
    "op": "add",
    "path": "/spec/template/spec/containers/0/env",
    "value": [{"name": "GRAFANA_URL", "value": "http://grafana.grafana"}]
  }
]'
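After the patch it is worth waiting for the rollout and re-checking the startup log line quoted earlier (container name assumed to be mcp-server):

kubectl rollout status deployment/grafana -n kagent
kubectl logs -n kagent deploy/grafana -c mcp-server | grep "Using Grafana configuration"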
It seems like the kmcp MCPServer controller failed to apply the env section to the container?
This was fixed in: https://github.com/kagent-dev/kmcp/pull/56/files
I'm also hitting this issue, related to tool calling when using OpenAI as the LLM.
I believe I have fixed this issue in #872, which was released in 0.6.10.