robusta-dev / krr

Prometheus-based Kubernetes Resource Recommendations

Ignore Argo Rollouts with no template

cyxou opened this issue · comments

commented

Describe the bug
When I run python krr.py simple against one particular namespace, I get ERROR An unexpected error occurred. Other namespaces work fine; only this one production namespace fails. Here is the full log output in DEBUG mode:

❯ docker run -it --rm krr:latest krr.py simple -v
                                                                                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                                                  
 _____       _               _          _  _______  _____
|  __ \     | |             | |        | |/ /  __ \|  __ \
| |__) |___ | |__  _   _ ___| |_ __ _  | ' /| |__) | |__) |
|  _  // _ \| '_ \| | | / __| __/ _` | |  < |  _  /|  _  /
| | \ \ (_) | |_) | |_| \__ \ || (_| | | . \| | \ \| | \ \
|_|  \_\___/|_.__/ \__,_|___/\__\__,_| |_|\_\_|  \_\_|  \_\
                                                                                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                                                  
Running Robusta's KRR (Kubernetes Resource Recommender) 1.6.0-dev
Using strategy: Simple
Using formatter: table
                                                                                                                                                                                                                                                                                  
[13:23:15] DEBUG    Found 3 clusters: librepod, yc-cluster-staging, yc-cluster-production                                                                                                                                                                          __init__.py:353
           DEBUG    Current cluster: yc-cluster-production                                                                                                                                                                                                         __init__.py:354
           DEBUG    Configured clusters: []                                                                                                                                                                                                                        __init__.py:356
           INFO     Using clusters: ['yc-cluster-production']                                                                                                                                                                                                        runner.py:190
           INFO     Listing scannable objects in yc-cluster-production                                                                                                                                                                                              __init__.py:62
           DEBUG    Namespaces: *                                                                                                                                                                                                                                   __init__.py:63
           DEBUG    Resources: *                                                                                                                                                                                                                                    __init__.py:64
           DEBUG    Listing Deployments in yc-cluster-production                                                                                                                                                                                                   __init__.py:154
           DEBUG    Listing Rollouts in yc-cluster-production                                                                                                                                                                                                      __init__.py:154
           DEBUG    Listing StatefulSets in yc-cluster-production                                                                                                                                                                                                  __init__.py:154
           DEBUG    Listing DaemonSets in yc-cluster-production                                                                                                                                                                                                    __init__.py:154
           DEBUG    Listing Jobs in yc-cluster-production                                                                                                                                                                                                          __init__.py:154
           DEBUG    Found 0 Job in yc-cluster-production                                                                                                                                                                                                           __init__.py:166
[13:23:16] ERROR    An unexpected error occurred                                                                                                                                                                                                                     runner.py:247
                    Traceback (most recent call last):
                      File "/app/robusta_krr/core/runner.py", line 242, in run
                        result = await self._collect_result()
                      File "/app/robusta_krr/core/runner.py", line 193, in _collect_result
                        scans_tasks = [
                      File "/app/robusta_krr/core/runner.py", line 193, in <listcomp>
                        scans_tasks = [
                      File "/app/robusta_krr/core/integrations/kubernetes/__init__.py", line 398, in list_scannable_objects
                        async for object in streamer:
                      File "/usr/local/lib/python3.9/site-packages/aiostream/stream/advanced.py", line 59, in base_combine
                        result = task.result()
                      File "/app/robusta_krr/core/integrations/kubernetes/__init__.py", line 79, in list_scannable_objects
                        async for object in streamer:
                      File "/usr/local/lib/python3.9/site-packages/aiostream/stream/advanced.py", line 59, in base_combine
                        result = task.result()
                      File "/app/robusta_krr/core/integrations/kubernetes/__init__.py", line 159, in _list_workflows
                        ret_multi = await loop.run_in_executor(
                      File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
                        result = self.fn(*self.args, **self.kwargs)
                      File "/app/robusta_krr/core/integrations/kubernetes/__init__.py", line 161, in <lambda>
                        lambda: all_namespaces_request(
                      File "/app/robusta_krr/core/integrations/kubernetes/rollout.py", line 42, in list_rollout_for_all_namespaces
                        return self.list_rollout_for_all_namespaces_with_http_info(**kwargs)  # noqa: E501
                      File "/app/robusta_krr/core/integrations/kubernetes/rollout.py", line 152, in list_rollout_for_all_namespaces_with_http_info
                        return self.api_client.call_api(
                      File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 348, in call_api
                        return self.__call_api(resource_path, method,
                      File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 192, in __call_api
                        return_data = self.deserialize(response_data, response_type)
                      File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 264, in deserialize
                        return self.__deserialize(data, response_type)
                      File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 303, in __deserialize
                        return self.__deserialize_model(data, klass)
                      File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 639, in __deserialize_model
                        kwargs[attr] = self.__deserialize(value, attr_type)
                      File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 280, in __deserialize
                        return [self.__deserialize(sub_data, sub_kls)
                      File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 280, in <listcomp>
                        return [self.__deserialize(sub_data, sub_kls)
                      File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 303, in __deserialize
                        return self.__deserialize_model(data, klass)
                      File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 639, in __deserialize_model
                        kwargs[attr] = self.__deserialize(value, attr_type)
                      File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 303, in __deserialize
                        return self.__deserialize_model(data, klass)
                      File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 641, in __deserialize_model
                        instance = klass(**kwargs)
                      File "/usr/local/lib/python3.9/site-packages/kubernetes/client/models/v1_deployment_spec.py", line 86, in __init__
                        self.template = template
                      File "/usr/local/lib/python3.9/site-packages/kubernetes/client/models/v1_deployment_spec.py", line 266, in template
                        raise ValueError("Invalid value for `template`, must not be `None`")  # noqa: E501
                    ValueError: Invalid value for `template`, must not be `None`

To Reproduce
I was using a slightly modified Dockerfile that comes with this repo:

# Use the official Python 3.9 image as the base image
FROM python:3.9 as builder

# Set the working directory
WORKDIR /app

# Install kubectl
RUN mkdir -p /etc/apt/keyrings \
    && apt-get update && apt-get -y install curl jq gawk \
    && curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg \
    && echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | tee /etc/apt/sources.list.d/kubernetes.list \
    && apt-get update \
    && apt-get install --yes kubectl

# Refresh the package index and register the arm64 architecture with dpkg
RUN apt-get update && \
    dpkg --add-architecture arm64

COPY ./requirements.txt requirements.txt

# Install the project dependencies
RUN pip install -r requirements.txt

# Copy the rest of the application code
COPY . .

RUN mkdir -p /root/.kube
COPY kubeconfig-production /root/.kube/config

# Use python as the entrypoint; the script and its arguments are passed at run time
ENTRYPOINT ["python"]
  1. Build the image: docker build -t krr .
  2. Run kubectl inside the built image to confirm that the cluster is reachable: docker run -it --rm --entrypoint kubectl krr:latest get ns
  3. Run krr inside the built image: docker run -it --rm krr:latest krr.py simple -v

Expected behavior
Should show recommendations for the particular namespace.

Are you interested in contributing a fix for this?
Maybe)

Once again, I do get recommendations for other namespaces; it fails only when I scan all namespaces or this particular one. How can I debug the issue further? I guess something is wrong with my Prometheus metrics.

commented

Hey @cyxou

It seems that while listing jobs, one job turns out to have no template.
Can you check whether you have such a job?
If so, can you explain what it is for? Can you remove it?

commented

I do not have any jobs running in the cluster while executing krr. The only job defined in my manifests is the Migrator job, which runs as an ArgoCD Sync hook before the backend deployment is updated. Here is the manifest of this job:

apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    argocd.argoproj.io/hook: Sync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
    argocd.argoproj.io/sync-wave: "-1"
  name: core-migrator
spec:
  template:
    metadata:
      name: core-migrator
      labels:
        app: core-migrator
    spec:
      containers:
        - name: migrator
          image: core-image
          imagePullPolicy: IfNotPresent
          command: ["bin/rails"]
          args:
            [
              "db:migrate:with_lock",
              "db:seed:core",
              "db:seed:mvd",
              "db:seed:workflow",
            ]
          env:
            - name: RAILS_MASTER_KEY
              valueFrom:
                secretKeyRef:
                  name: core-secrets
                  key: rails_master_key
      restartPolicy: OnFailure
      imagePullSecrets:
        - name: gitlabregistry

commented

I misread the trace.
The stack trace comes from list_rollout_for_all_namespaces_with_http_info.
It reads Argo Rollouts (/apis/argoproj.io/v1alpha1/rollouts) and fails because one or more of them have no template.

Do you have Argo Rollouts without a template?
Do you know whether template is mandatory in an Argo Rollout object?
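
For readers who want to see the failure mode without a cluster, here is a minimal sketch. It does not use the real kubernetes client: StrictSpec is a hypothetical stand-in that only mimics the validating setter named in the traceback, and the sample spec is an assumed shape for a Rollout that points at an existing Deployment.

```python
class StrictSpec:
    """Hypothetical stand-in for a generated model that requires `template`."""

    def __init__(self, template=None, workload_ref=None):
        self.workload_ref = workload_ref
        self.template = template  # goes through the validating setter below

    @property
    def template(self):
        return self._template

    @template.setter
    def template(self, value):
        # Same check and message as the last frame of the traceback.
        if value is None:
            raise ValueError("Invalid value for `template`, must not be `None`")
        self._template = value


def try_parse(spec: dict) -> str:
    """Return "ok" if the strict model accepts the spec, else the error text."""
    try:
        StrictSpec(template=spec.get("template"), workload_ref=spec.get("workloadRef"))
        return "ok"
    except ValueError as exc:
        return str(exc)


# Assumed example: a Rollout spec that references a Deployment and has no template.
rollout_spec = {
    "workloadRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "core"}
}
result = try_parse(rollout_spec)
print(result)
```

A single object like this is enough to make the whole list call raise, which would explain why the scan aborts instead of skipping one item.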

commented

He-he, that makes sense: I do have Argo Rollouts in my clusters, and I am using the spec.workloadRef property instead of spec.template.

spec.template is not mandatory. workloadRef holds a reference to a workload that provides the Pod template (e.g. a Deployment). If it is used, the Rollout template property must not be set. That is exactly what I did, since my Deployments were already running.

Can we somehow skip Rollouts without spec.template when computing recommendations?
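
One way to picture the requested behaviour is a filter over the raw list response, applied before any strict model sees the data. This is only a sketch and not KRR's code: split_rollouts is a hypothetical helper, and it assumes the Rollouts arrive as plain dictionaries (for example the "items" array of the JSON returned by the rollouts endpoint).

```python
from typing import Any, Dict, List, Tuple

Rollout = Dict[str, Any]


def split_rollouts(items: List[Rollout]) -> Tuple[List[Rollout], List[Rollout]]:
    """Separate Rollouts that carry spec.template from those that do not."""
    scannable: List[Rollout] = []
    skipped: List[Rollout] = []
    for item in items:
        spec = item.get("spec") or {}
        if spec.get("template") is not None:
            scannable.append(item)
        else:
            # Typically a Rollout that uses spec.workloadRef instead.
            skipped.append(item)
    return scannable, skipped


# Assumed sample data, shaped like two entries of a Rollout list.
items = [
    {"metadata": {"name": "inline"}, "spec": {"template": {"spec": {"containers": []}}}},
    {"metadata": {"name": "by-ref"}, "spec": {"workloadRef": {"kind": "Deployment", "name": "core"}}},
]
scannable, skipped = split_rollouts(items)
print([r["metadata"]["name"] for r in scannable])
print([r["metadata"]["name"] for r in skipped])
```

Skipping is the simplest option, though the referenced Deployment would normally still be picked up by the Deployment scan.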

commented

@arnoldyahad @LeaveMyYard
It looks like template is mandatory in the Deployment spec but not in Argo Rollouts.
The Kubernetes Python client raises a validation error on such Rollouts.
Thoughts?

@cyxou As a temporary workaround, you can use the -r parameter to select which resources to scan (excluding Rollouts for now).

Note: I have just tested that flag and found a bug in it, so make sure to use the latest code from the main branch.

(cc @arikalon1) As a proper solution, we will need to create a dedicated model (currently V1DeploymentList is used) that knows template is not mandatory.

But we will need to investigate whether there are any other differences in the Rollout spec.
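
As a rough illustration of what such a dedicated model could look like, the sketch below treats template as optional and records workloadRef next to it. LenientRolloutSpec and its methods are hypothetical names; this is not the model KRR ships, and it ignores every other field of the Rollout spec.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class LenientRolloutSpec:
    """Hypothetical Rollout spec model in which `template` may be absent."""

    template: Optional[Dict[str, Any]] = None
    workload_ref: Optional[Dict[str, Any]] = None

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "LenientRolloutSpec":
        # Neither key is required, so a missing one simply becomes None.
        return cls(template=raw.get("template"), workload_ref=raw.get("workloadRef"))

    def pod_template_source(self) -> str:
        """Say where the Pod template comes from: inline, a reference, or nowhere."""
        if self.template is not None:
            return "inline"
        if self.workload_ref is not None:
            kind = self.workload_ref.get("kind", "?")
            name = self.workload_ref.get("name", "?")
            return f"{kind}/{name}"
        return "missing"


by_ref = LenientRolloutSpec.from_dict(
    {"workloadRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "core"}}
)
inline = LenientRolloutSpec.from_dict({"template": {"spec": {"containers": []}}})
print(by_ref.pod_template_source())
print(inline.pod_template_source())
```

With a model along these lines, a scanner could follow the reference to the Deployment instead of failing during deserialization.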

commented

Thanks, I can confirm now that it works well when I specify resources via the -r flag.