kubewharf / katalyst-core

Katalyst aims to provide a universal solution to help improve resource utilization and optimize overall costs in the cloud. This repository contains the core components of the Katalyst system, including multiple agents and centralized components.

A node's dynamic overcommitment ratio rises instead of falling after CPU consumption increases

flpanbin opened this issue · comments

What happened?

I tried the dynamic overcommitment feature following the dynamic overcommitment documentation, but after creating testpod1 to increase CPU consumption, the CPU overcommit ratio cpu_overcommit_ratio went up instead of down.

With no pods running, the KCNR of g-master2 looks like this:

[root@g-master1 katalyst]# kubectl describe kcnr g-master2
Name:         g-master2
Namespace:
Labels:       <none>
Annotations:  katalyst.kubewharf.io/cpu_overcommit_ratio: 1.74
              katalyst.kubewharf.io/guaranteed_cpus: 0
              katalyst.kubewharf.io/memory_overcommit_ratio: 1.15
              katalyst.kubewharf.io/overcommit_cpu_manager: none
              katalyst.kubewharf.io/overcommit_memory_manager: None
API Version:  node.katalyst.kubewharf.io/v1alpha1
Kind:         CustomNodeResource
Metadata:
  Creation Timestamp:  2024-05-27T14:02:23Z
  Generation:          1
  Resource Version:    135351666
  UID:                 78bc346b-d009-4ea8-bac1-51e2e6612d07
Spec:
  Node Resource Properties:
    Property Name:      numa
    Property Quantity:  2
    Property Name:      nbw
    Property Quantity:  10k
    Property Name:      cpu
    Property Quantity:  16
    Property Name:      memory
    Property Quantity:  32778468Ki
    Property Name:      cis
    Property Values:
      avx2
    Property Name:  topology
    Property Values:
      {"Iface":"ens192","Speed":10000,"NumaNode":0,"Enable":true,"Addr":{"IPV4":["10.6.202.112"],"IPV6":null},"NSName":"","NSAbsolutePath":""}
Events:  <none>

After creating testpod1, check the KCNR of g-master2 again:

[root@g-master1 katalyst]# kubectl describe kcnr g-master2
Name:         g-master2
Namespace:
Labels:       <none>
Annotations:  katalyst.kubewharf.io/cpu_overcommit_ratio: 1.99
              katalyst.kubewharf.io/guaranteed_cpus: 0
              katalyst.kubewharf.io/memory_overcommit_ratio: 1.41
              katalyst.kubewharf.io/overcommit_cpu_manager: none
              katalyst.kubewharf.io/overcommit_memory_manager: None
API Version:  node.katalyst.kubewharf.io/v1alpha1
Kind:         CustomNodeResource
Metadata:
  Creation Timestamp:  2024-05-27T14:02:23Z
  Generation:          1
  Resource Version:    135554723
  UID:                 78bc346b-d009-4ea8-bac1-51e2e6612d07
Spec:
  Node Resource Properties:
    Property Name:      numa
    Property Quantity:  2
    Property Name:      nbw
    Property Quantity:  10k
    Property Name:      cpu
    Property Quantity:  16
    Property Name:      memory
    Property Quantity:  32778468Ki
    Property Name:      cis
    Property Values:
      avx2
    Property Name:  topology
    Property Values:
      {"Iface":"ens192","Speed":10000,"NumaNode":0,"Enable":true,"Addr":{"IPV4":["10.6.202.112"],"IPV6":null},"NSName":"","NSAbsolutePath":""}
Events:  <none>


[root@g-master1 katalyst]# kubectl get pod -n katalyst-system
NAME                                            READY   STATUS    RESTARTS       AGE
katalyst-controller-747545d674-54d2j            1/1     Running   9 (14h ago)    6d19h
katalyst-webhook-69bdb7d5d6-jnrh5               1/1     Running   0              6d19h
overcommit-katalyst-agent-l2rdx                 1/1     Running   0              6d19h
overcommit-katalyst-agent-sb2bd                 1/1     Running   0              6d19h
overcommit-katalyst-agent-vb5wc                 1/1     Running   0              6d19h
overcommit-katalyst-scheduler-58f64f644-442lb   1/1     Running   16 (14h ago)   6d19h
testpod1                                        1/1     Running   0              12s

Katalyst version:

panbin@panbindeMacBook-Pro ~ % helm list -n katalyst-system
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /Users/panbin/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /Users/panbin/.kube/config
NAME      	NAMESPACE      	REVISION	UPDATED                             	STATUS  	CHART                    	APP VERSION
overcommit	katalyst-system	1       	2024-05-27 22:01:28.110633 +0800 CST	deployed	katalyst-overcommit-0.5.0	v0.5.0

What did you expect to happen?

After testpod1 is created, the corresponding node's CPU overcommit ratio katalyst.kubewharf.io/cpu_overcommit_ratio should decrease.

How can we reproduce it (as minimally and precisely as possible)?

Just follow this document: https://gokatalyst.io/docs/user-guide/resource-overcommitment/dynamic-overcommitment/

Software version

$ <software> version
# paste output here

@WangZzzhe Could you help take a look?

@flpanbin Could you provide some information about the node?
1. The node's total resource requests and load before the test pod is created;
2. The test pod's requests and load.

Node resource information before the pod is created:

apiVersion: v1
kind: Node
metadata:
  annotations:
    katalyst.kubewharf.io/cpu_overcommit_ratio: "2.5"
    katalyst.kubewharf.io/memory_overcommit_ratio: "2.5"
    katalyst.kubewharf.io/original_allocatable_cpu: "16"
    katalyst.kubewharf.io/original_allocatable_memory: 32676068Ki
    katalyst.kubewharf.io/original_capacity_cpu: "16"
    katalyst.kubewharf.io/original_capacity_memory: 32778468Ki
    katalyst.kubewharf.io/overcommit_allocatable_cpu: 27840m
    katalyst.kubewharf.io/overcommit_allocatable_memory: 38479337676800m
    katalyst.kubewharf.io/overcommit_capacity_cpu: 27840m
    katalyst.kubewharf.io/overcommit_capacity_memory: 38599923916800m
    katalyst.kubewharf.io/realtime_cpu_overcommit_ratio: "1.74"
    katalyst.kubewharf.io/realtime_memory_overcommit_ratio: "1.15"
    ...
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    katalyst.kubewharf.io/overcommit_node_pool: overcommit-demo
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: g-master2
    kubernetes.io/os: linux
    node-role.kubernetes.io/control-plane: ""
    ......
  name: g-master2
status:
  addresses:
  - address: g-master2
    type: Hostname
  allocatable:
    cpu: 27840m
    ephemeral-storage: "136351265362"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 38479337676800m
    pods: "180"
  capacity:
    cpu: 27840m
    ephemeral-storage: 144483Mi
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 38599923916800m
    pods: "180"

testpod1.yaml :

apiVersion: v1
kind: Pod
metadata:
  name: testpod1
  namespace: katalyst-system
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - g-master2
  containers:
  - name: testcontainer1
    image: polinux/stress:latest
    command: ["stress"]
    args: ["--cpu", "4", "--timeout", "6000"]
    resources:
      limits:
        cpu: 8
        memory: 8Gi
      requests:
        cpu: 4
        memory: 8Gi
  tolerations:
  - effect: NoSchedule
    key: test
    value: test
    operator: Equal

@flpanbin
For memory this is expected, because the memory requests increased while the load stayed the same.
In theory the CPU ratio can rise right after the pod is created, before the stress workload ramps up, but once things stabilize it should drop below the previous value. You can raise the log level to 6 and check whether the collected metrics are accurate.
https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/sysadvisor/plugin/overcommitmentaware/realtime/realtime.go#L154
https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/sysadvisor/plugin/overcommitmentaware/realtime/realtime.go#L158
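
For anyone following along, one way to raise the agent's verbosity to 6, assuming the agent accepts the standard klog --v flag, the container already has an args list, and the DaemonSet is the overcommit-katalyst-agent shown earlier (adjust names to your install):

kubectl -n katalyst-system patch daemonset overcommit-katalyst-agent \
  --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--v=6"}]'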

Thanks for the quick reply. I'll keep an eye on the logs, but I have a few questions about your answer:

  1. Why is this expected for memory?
  2. Why can the ratio rise while the stress workload has not ramped up yet?
  3. What is the algorithm behind dynamic overcommitment?

@flpanbin With the load unchanged, an increase in resource requests reduces the node's remaining allocatable resources, so the node has to overcommit more resources to reach the target load.
For the specific rules see https://github.com/kubewharf/katalyst-core/blob/main/pkg/agent/sysadvisor/plugin/overcommitmentaware/realtime/realtime.go#L286

Thanks, I'll look into it.

@WangZzzhe
After some digging, this looks like a metrics-collection problem: the usage value that goes into the overcommit ratio calculation is 0.

I0609 01:45:03.275865       1 realtime.go:335] resource cpu request: 11964, allocatable: 16000, usage: 0, targetLoad: 0.6, existLoad: 0.4, overcommitRatio: 2.24775
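
For reference, the values in this log line are consistent with a rule of roughly the following shape (a sketch only; the authoritative logic is in the realtime.go lines linked above, and treating existLoad as a floored/default value when usage is 0 is an assumption here):

package main

import "fmt"

func main() {
	// Values taken from the realtime.go:335 log line above.
	request := 11964.0     // sum of pod CPU requests on the node, in milli-cores
	allocatable := 16000.0 // original allocatable CPU, in milli-cores
	targetLoad := 0.6      // desired node load
	existLoad := 0.4       // observed load (apparently a floor/default here, since usage is 0)

	// More requests against an unchanged load push the ratio up,
	// which matches the behaviour reported in this issue.
	ratio := (allocatable*targetLoad/existLoad + request) / allocatable
	fmt.Printf("overcommitRatio: %g\n", ratio) // 2.24775, matching the log line
}

With usage stuck at 0, existLoad never grows, so the ratio can only rise as requests increase. Note also that (1.99 − 1.74) × 16000m = 4000m, exactly testpod1's CPU request, which fits the same shape.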

overcommit-katalyst-agent logs:

I0609 03:01:06.734172       1 provisioner.go:84] [malachite] heartbeat
E0609 03:01:06.738246       1 provisioner.go:111] [malachite] malachite is unhealthy: invalid http response status code 500, url: http://localhost:9002/api/v1/system/compute
I0609 03:01:06.738555       1 round_trippers.go:553] GET https://10.6.202.113:10250/stats/summary?timeout=10s 403 Forbidden in 3 milliseconds
E0609 03:01:06.739508       1 provisioner.go:65] failed to update stats/summary from kubelet: "failed to get kubelet config for summary api, error: Forbidden (user=system:serviceaccount:katalyst-system:katalyst-agent, verb=get, resource=nodes, subresource=stats)"
I0609 03:01:08.043645       1 realtime.go:155] [overcommitment-aware-realtime] sumUpPodsResources, cpu: 1845m, memory: 3715141632
E0609 03:01:08.043814       1 store_util.go:98] failed to get metric pod prometheus-insight-agent-kube-prometh-prometheus-0, container prometheus, metric cpu.usage.container, err: [MetricStore] empty map
E0609 03:01:08.044067       1 store_util.go:98] failed to get metric pod prometheus-insight-agent-kube-prometh-prometheus-0, container config-reloader, metric cpu.usage.container, err: [MetricStore] empty map

The malachite logs show errors, so it does not appear to be working properly:

panbin@panbindeMacBook-Pro ~ % kubectl logs  malachite-xk8n9 -n malachite-system -f
2024-06-09T02:03:07.481004862+00:00 - [ERROR] server/src/main.rs:187 [Panic] lib/src/cpu/processor.rs:464: called `Result::unwrap()` on an `Err` value: ParseIntError { kind: Empty }
2024-06-09T02:03:07.489192152+00:00 - [ERROR] server/src/main.rs:187 [Panic] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/once_cell-1.17.0/src/lib.rs:1276: Lazy instance has previously been poisoned
2024-06-09T02:03:11.271581881+00:00 - [ERROR] server/src/main.rs:187 [Panic] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/once_cell-1.17.0/src/lib.rs:1276: Lazy instance has previously been poisoned
2024-06-09T02:03:11.271754576+00:00 - [ERROR] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/rocket-0.5.0-rc.2/src/server.rs:56 Handler compute panicked.
2024-06-09T02:03:16.338537826+00:00 - [ERROR] server/src/main.rs:187 [Panic] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/once_cell-1.17.0/src/lib.rs:1276: Lazy instance has previously been poisoned
2024-06-09T02:03:16.338612068+00:00 - [ERROR] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/rocket-0.5.0-rc.2/src/server.rs:56 Handler compute panicked.
2024-06-09T02:03:21.407855335+00:00 - [ERROR] server/src/main.rs:187 [Panic] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/once_cell-1.17.0/src/lib.rs:1276: Lazy instance has previously been poisoned
2024-06-09T02:03:21.408025943+00:00 - [ERROR] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/rocket-0.5.0-rc.2/src/server.rs:56 Handler compute panicked.
2024-06-09T02:03:26.450034224+00:00 - [ERROR] server/src/main.rs:187 [Panic] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/once_cell-1.17.0/src/lib.rs:1276: Lazy instance has previously been poisoned
2024-06-09T02:03:26.451268751+00:00 - [ERROR] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/rocket-0.5.0-rc.2/src/server.rs:56 Handler compute panicked.
2024-06-09T02:03:31.459491370+00:00 - [ERROR] server/src/main.rs:187 [Panic] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/once_cell-1.17.0/src/lib.rs:1276: Lazy instance has previously been poisoned
2024-06-09T02:03:31.459570543+00:00 - [ERROR] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/rocket-0.5.0-rc.2/src/server.rs:56 Handler compute panicked.
2024-06-09T02:03:36.486691177+00:00 - [ERROR] server/src/main.rs:187 [Panic] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/once_cell-1.17.0/src/lib.rs:1276: Lazy instance has previously been poisoned
2024-06-09T02:03:36.486756735+00:00 - [ERROR] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/rocket-0.5.0-rc.2/src/server.rs:56 Handler compute panicked.
2024-06-09T02:03:41.575957128+00:00 - [ERROR] server/src/main.rs:187 [Panic] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/once_cell-1.17.0/src/lib.rs:1276: Lazy instance has previously been poisoned
2024-06-09T02:03:41.589261474+00:00 - [ERROR] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/rocket-0.5.0-rc.2/src/server.rs:56 Handler compute panicked.
2024-06-09T02:03:46.624823586+00:00 - [ERROR] server/src/main.rs:187 [Panic] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/once_cell-1.17.0/src/lib.rs:1276: Lazy instance has previously been poisoned
2024-06-09T02:03:46.624905589+00:00 - [ERROR] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/rocket-0.5.0-rc.2/src/server.rs:56 Handler compute panicked.
2024-06-09T02:03:51.695793619+00:00 - [ERROR] server/src/main.rs:187 [Panic] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/once_cell-1.17.0/src/lib.rs:1276: Lazy instance has previously been poisoned
2024-06-09T02:03:51.695892044+00:00 - [ERROR] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/rocket-0.5.0-rc.2/src/server.rs:56 Handler compute panicked.
2024-06-09T02:03:56.827341960+00:00 - [ERROR] server/src/main.rs:187 [Panic] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/once_cell-1.17.0/src/lib.rs:1276: Lazy instance has previously been poisoned
2024-06-09T02:03:56.827457338+00:00 - [ERROR] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/rocket-0.5.0-rc.2/src/server.rs:56 Handler compute panicked.
2024-06-09T02:04:01.853256781+00:00 - [ERROR] server/src/main.rs:187 [Panic] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/once_cell-1.17.0/src/lib.rs:1276: Lazy instance has previously been poisoned
2024-06-09T02:04:01.853372828+00:00 - [ERROR] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/rocket-0.5.0-rc.2/src/server.rs:56 Handler compute panicked.
2024-06-09T02:04:06.899599297+00:00 - [ERROR] server/src/main.rs:187 [Panic] /root/.cargo/registry/src/rsproxy.cn-8f6827c7555bfaf8/once_cell-1.17.0/src/lib.rs:1276: Lazy instance has previously been poisoned

This may be related to the Linux version. Environment information:

[root@g-master1 ~]# uname -a
Linux g-master1 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@g-master1 ~]# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Kubernetes and containerd versions:

[root@g-master1 ~]# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.5", GitCommit:"93e0d7146fb9c3e9f68aa41b2b4265b2fcdb0a4c", GitTreeState:"clean", BuildDate:"2023-08-24T00:48:26Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.5", GitCommit:"93e0d7146fb9c3e9f68aa41b2b4265b2fcdb0a4c", GitTreeState:"clean", BuildDate:"2023-08-24T00:42:11Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}
[root@g-master1 ~]# containerd -v
containerd github.com/containerd/containerd v1.7.6 091922f03c2762540fd057fba91260237ff86acb

I set up another environment with kubewharf enhanced kubernetes, and the dynamic overcommitment feature works as expected there, so it looks like there are requirements on the Linux kernel version and containerd?
Environment information:

root@ubuntu:~/katalyst# uname -a
Linux ubuntu 5.4.0-125-generic #141-Ubuntu SMP Wed Aug 10 13:42:03 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu:~/katalyst# kubectl get nodes
NAME           STATUS   ROLES           AGE   VERSION
10.6.202.170   Ready    control-plane   26m   v1.24.6-kubewharf.8

root@ubuntu:~/katalyst# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.6-kubewharf.8", GitCommit:"443c2773bbac8eeb5648f22f2b262d05e985595c", GitTreeState:"clean", BuildDate:"2024-01-04T03:56:31Z", GoVersion:"go1.18.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.6-kubewharf.8", GitCommit:"443c2773bbac8eeb5648f22f2b262d05e985595c", GitTreeState:"clean", BuildDate:"2024-01-04T03:51:02Z", GoVersion:"go1.18.6", Compiler:"gc", Platform:"linux/amd64"}

root@ubuntu:~/katalyst# containerd -v
containerd github.com/containerd/containerd v1.4.12 7b11cfaabd73bb80907dd23182b9347b4245eb5d

@flpanbin malachite depends on eBPF, so a 3.10 kernel probably won't work. 4.19+ should be fine.
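
For a quick check of whether a given kernel was built with BPF support (a rough heuristic only; having these options set does not guarantee malachite's probes will load on an older kernel):

grep -E 'CONFIG_BPF=|CONFIG_BPF_SYSCALL=' /boot/config-$(uname -r)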