Ignore Argo Rollouts with no template
cyxou opened this issue
Describe the bug
When running `python krr.py simple` against one particular namespace I get `ERROR An unexpected error occurred`. Running against other namespaces works fine, but this one production namespace errors out. Here is the full log output in DEBUG mode:
❯ docker run -it --rm krr:latest krr.py simple -v
_____ _ _ _ _______ _____
| __ \ | | | | | |/ / __ \| __ \
| |__) |___ | |__ _ _ ___| |_ __ _ | ' /| |__) | |__) |
| _ // _ \| '_ \| | | / __| __/ _` | | < | _ /| _ /
| | \ \ (_) | |_) | |_| \__ \ || (_| | | . \| | \ \| | \ \
|_| \_\___/|_.__/ \__,_|___/\__\__,_| |_|\_\_| \_\_| \_\
Running Robusta's KRR (Kubernetes Resource Recommender) 1.6.0-dev
Using strategy: Simple
Using formatter: table
[13:23:15] DEBUG Found 3 clusters: librepod, yc-cluster-staging, yc-cluster-production __init__.py:353
DEBUG Current cluster: yc-cluster-production __init__.py:354
DEBUG Configured clusters: [] __init__.py:356
INFO Using clusters: ['yc-cluster-production'] runner.py:190
INFO Listing scannable objects in yc-cluster-production __init__.py:62
DEBUG Namespaces: * __init__.py:63
DEBUG Resources: * __init__.py:64
DEBUG Listing Deployments in yc-cluster-production __init__.py:154
DEBUG Listing Rollouts in yc-cluster-production __init__.py:154
DEBUG Listing StatefulSets in yc-cluster-production __init__.py:154
DEBUG Listing DaemonSets in yc-cluster-production __init__.py:154
DEBUG Listing Jobs in yc-cluster-production __init__.py:154
DEBUG Found 0 Job in yc-cluster-production __init__.py:166
[13:23:16] ERROR An unexpected error occurred runner.py:247
Traceback (most recent call last):
File "/app/robusta_krr/core/runner.py", line 242, in run
result = await self._collect_result()
File "/app/robusta_krr/core/runner.py", line 193, in _collect_result
scans_tasks = [
File "/app/robusta_krr/core/runner.py", line 193, in <listcomp>
scans_tasks = [
File "/app/robusta_krr/core/integrations/kubernetes/__init__.py", line 398, in list_scannable_objects
async for object in streamer:
File "/usr/local/lib/python3.9/site-packages/aiostream/stream/advanced.py", line 59, in base_combine
result = task.result()
File "/app/robusta_krr/core/integrations/kubernetes/__init__.py", line 79, in list_scannable_objects
async for object in streamer:
File "/usr/local/lib/python3.9/site-packages/aiostream/stream/advanced.py", line 59, in base_combine
result = task.result()
File "/app/robusta_krr/core/integrations/kubernetes/__init__.py", line 159, in _list_workflows
ret_multi = await loop.run_in_executor(
File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/app/robusta_krr/core/integrations/kubernetes/__init__.py", line 161, in <lambda>
lambda: all_namespaces_request(
File "/app/robusta_krr/core/integrations/kubernetes/rollout.py", line 42, in list_rollout_for_all_namespaces
return self.list_rollout_for_all_namespaces_with_http_info(**kwargs) # noqa: E501
File "/app/robusta_krr/core/integrations/kubernetes/rollout.py", line 152, in list_rollout_for_all_namespaces_with_http_info
return self.api_client.call_api(
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 348, in call_api
return self.__call_api(resource_path, method,
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 192, in __call_api
return_data = self.deserialize(response_data, response_type)
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 264, in deserialize
return self.__deserialize(data, response_type)
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 303, in __deserialize
return self.__deserialize_model(data, klass)
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 639, in __deserialize_model
kwargs[attr] = self.__deserialize(value, attr_type)
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 280, in __deserialize
return [self.__deserialize(sub_data, sub_kls)
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 280, in <listcomp>
return [self.__deserialize(sub_data, sub_kls)
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 303, in __deserialize
return self.__deserialize_model(data, klass)
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 639, in __deserialize_model
kwargs[attr] = self.__deserialize(value, attr_type)
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 303, in __deserialize
return self.__deserialize_model(data, klass)
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 641, in __deserialize_model
instance = klass(**kwargs)
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/models/v1_deployment_spec.py", line 86, in __init__
self.template = template
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/models/v1_deployment_spec.py", line 266, in template
raise ValueError("Invalid value for `template`, must not be `None`") # noqa: E501
ValueError: Invalid value for `template`, must not be `None`
Found 0 Job in yc-cluster-production __init__.py:166
To Reproduce
I was using a slightly modified Dockerfile that comes with this repo:
# Use the official Python 3.9 slim image as the base image
FROM python:3.9 as builder
# Set the working directory
WORKDIR /app
# Install kubectl
RUN mkdir -p /etc/apt/keyrings \
&& apt-get update && apt-get -y install curl jq gawk \
&& curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg \
&& echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | tee /etc/apt/sources.list.d/kubernetes.list \
&& apt-get update \
&& apt-get install --yes kubectl
# Install system dependencies required for Poetry
RUN apt-get update && \
dpkg --add-architecture arm64
COPY ./requirements.txt requirements.txt
# Install the project dependencies
RUN pip install -r requirements.txt
# Copy the rest of the application code
COPY . .
RUN mkdir -p /root/.kube
COPY kubeconfig-production /root/.kube/config
# Run the application using 'poetry run krr simple'
ENTRYPOINT ["python"]
- Build the image with
docker build -t krr .
- Run the kubectl within the built image to make sure that you can reach your cluster:
docker run -it --rm --entrypoint kubectl krr:latest get ns
- Finally run the krr within the built image:
docker run -it --rm krr:latest krr.py simple -v
Expected behavior
Should show recommendations for a particular namespace.
Are you interested in contributing a fix for this?
Maybe)
Once again, I do get recommendations for other namespaces; it fails only when scanning all namespaces or this particular one. How can I debug the issue further? I guess something is wrong with my Prometheus metrics.
Hey @cyxou
It seems that while trying to list jobs, there's one job with no template.
Can you check whether you have such a job?
If so, can you explain what it is for? Can you remove it?
I do not have any jobs running in the cluster while executing krr. The only job I have defined in manifests is the Migrator job, which runs as an ArgoCD Sync hook prior to updating the backend deployment. Here is the manifest of this job:
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    argocd.argoproj.io/hook: Sync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
    argocd.argoproj.io/sync-wave: -1
  name: core-migrator
spec:
  template:
    metadata:
      name: core-migrator
      labels:
        app: core-migrator
    spec:
      containers:
        - name: migrator
          image: core-image
          imagePullPolicy: IfNotPresent
          command: ["bin/rails"]
          args:
            [
              "db:migrate:with_lock",
              "db:seed:core",
              "db:seed:mvd",
              "db:seed:workflow",
            ]
          env:
            - name: RAILS_MASTER_KEY
              valueFrom:
                secretKeyRef:
                  name: core-secrets
                  key: rails_master_key
      restartPolicy: OnFailure
      imagePullSecrets:
        - name: gitlabregistry
I didn't read it right.
The stack trace is from `list_rollout_for_all_namespaces_with_http_info`. It tries to read Argo Rollouts (`/apis/argoproj.io/v1alpha1/rollouts`) and fails because one (or more) of them has no `template`.
Do you have Argo Rollouts without a `template`?
Do you know whether `template` is mandatory in an Argo Rollout object?
He-he, that makes sense, since I do have Argo Rollouts in my clusters and I am using the `spec.workloadRef` property instead of `spec.template.spec`.
`spec.template` is not mandatory. WorkloadRef holds a reference to a workload that provides the Pod template (e.g. a Deployment). If used, then do not use the Rollout template property. And this is exactly what I did, since I already had my deployments running.
Can we somehow ignore Rollouts without `spec.template` when generating recommendations?
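One way to picture "ignoring" such Rollouts is to filter the raw list response before it is deserialized into a typed model, since the `ValueError` in the traceback is raised during deserialization. The following is a minimal sketch, not krr's actual code: `split_rollouts` is a hypothetical helper, and the sample payload is invented to resemble the `items` array of a Rollout list response.

```python
from typing import Any, Dict, List, Tuple

Rollout = Dict[str, Any]


def split_rollouts(items: List[Rollout]) -> Tuple[List[Rollout], List[Rollout]]:
    """Separate Rollouts that embed a pod template from those that do not.

    Hypothetical helper: works on plain dicts (decoded JSON), so no
    validating model is involved.
    """
    with_template: List[Rollout] = []
    skipped: List[Rollout] = []
    for item in items:
        spec = item.get("spec") or {}
        if spec.get("template"):
            with_template.append(item)
        else:
            # e.g. Rollouts that point at a Deployment via spec.workloadRef
            skipped.append(item)
    return with_template, skipped


# Invented sample data, shaped like the `items` of a Rollout list response.
rollouts = [
    {
        "metadata": {"name": "api"},
        "spec": {"template": {"spec": {"containers": [{"name": "api"}]}}},
    },
    {
        "metadata": {"name": "backend"},
        "spec": {
            "workloadRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": "backend",
            }
        },
    },
]

scannable, skipped = split_rollouts(rollouts)
print([r["metadata"]["name"] for r in scannable])
print([r["metadata"]["name"] for r in skipped])
```

Skipping is lossy: a Rollout that uses `workloadRef` still runs pods, so a fuller fix would resolve the referenced workload rather than drop the Rollout.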
@arnoldyahad @LeaveMyYard
It looks like `template` is mandatory in the Deployment spec, but not in Argo Rollouts.
The k8s Python API throws a validation error on such rollouts.
Thoughts?
@cyxou As a temporary solution you can use the `-r` parameter to select which resources you want to scan (exclude rollouts for now).
Note: I have just tested that flag and found that it had a bug, so make sure to use the latest code from the main branch.
(cc @arikalon1) As a proper solution, we will need to create a dedicated model (currently V1DeploymentList is used) that knows that template is not mandatory.
But we will need to investigate whether there are any other differences in the rollouts spec.
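To illustrate what a model with an optional template could look like, here is a minimal standard-library sketch. `RolloutSpec`, `WorkloadRef`, and `from_dict` are hypothetical names, not part of krr, the Kubernetes client, or Argo Rollouts, and only the two fields discussed in this thread are modeled.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class WorkloadRef:
    """Reference to the workload that supplies the pod template (hypothetical model)."""
    api_version: str
    kind: str
    name: str


@dataclass
class RolloutSpec:
    """Rollout spec in which `template` may be absent (hypothetical model)."""
    template: Optional[Dict[str, Any]] = None
    workload_ref: Optional[WorkloadRef] = None

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "RolloutSpec":
        ref = raw.get("workloadRef")
        return cls(
            # A missing template is accepted instead of raising ValueError.
            template=raw.get("template"),
            workload_ref=WorkloadRef(
                api_version=ref["apiVersion"],
                kind=ref["kind"],
                name=ref["name"],
            )
            if ref
            else None,
        )


# Invented sample: a Rollout spec that uses workloadRef and has no template.
spec = RolloutSpec.from_dict(
    {"workloadRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "backend"}}
)
print(spec.template, spec.workload_ref)
```

With a model like this, the scanner could decide per Rollout whether to read containers from `template` or to follow `workload_ref` to the referenced workload; other differences between the Rollout and Deployment specs would still need checking.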
Thanks, I can confirm that it works well now when I specify resources via the `-r` flag.