[install error] katalyst-agent CrashLoopBackOff
googs1025 opened this issue
What happened?
root@VM-0-15-ubuntu:/home/ubuntu# kubectl get pods -nkatalyst-system
NAME                                   READY   STATUS             RESTARTS         AGE
katalyst-agent-4qx2t                   0/1     CrashLoopBackOff   10 (31s ago)     26m
katalyst-agent-jdl97                   0/1     CrashLoopBackOff   10 (22s ago)     26m
katalyst-agent-pwm7l                   0/1     Error              10 (5m11s ago)   26m
katalyst-controller-845ccf946b-ftxgx   1/1     Running            0                26m
katalyst-controller-845ccf946b-lm9bm   1/1     Running            0                26m
katalyst-metric-765c44bbb5-48ws6       1/1     Running            0                26m
katalyst-scheduler-5746f9bd4c-swgc4    1/1     Running            0                26m
katalyst-scheduler-5746f9bd4c-x2vct    1/1     Running            0                26m
katalyst-webhook-68fcf99cd8-26c8g      1/1     Running            0                26m
katalyst-webhook-68fcf99cd8-7fs78      1/1     Running            0                26m
root@VM-0-15-ubuntu:/home/ubuntu# kubectl logs katalyst-agent-4qx2t -nkatalyst-system
W0502 08:03:20.626350 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2024/05/02 08:03:20 <nil>
I0502 08:03:20.626831 1 otel_prom_metrics_mux.go:94] [katalyst-core/pkg/metrics/metrics-pool.(*openTelemetryPrometheusMetricsEmitterPool).GetMetricsEmitter] add path /metrics to metric emitter
W0502 08:03:20.636464 1 info.go:53] Couldn't collect info from any of the files in "/etc/machine-id,/var/lib/dbus/machine-id"
I0502 08:03:20.636778 1 network_linux.go:80] [katalyst-core/pkg/util/machine.GetExtraNetworkInfo] namespace list: []
W0502 08:03:20.637199 1 network_linux.go:178] [katalyst-core/pkg/util/machine.getNSNetworkHardwareTopology] skip nic: eth0 with devPath: /sys/devices/virtual/net/eth0 which isn't pci device
W0502 08:03:20.637248 1 network_linux.go:178] [katalyst-core/pkg/util/machine.getNSNetworkHardwareTopology] skip nic: kube-ipvs0 with devPath: /sys/devices/virtual/net/kube-ipvs0 which isn't pci device
W0502 08:03:20.637281 1 network_linux.go:178] [katalyst-core/pkg/util/machine.getNSNetworkHardwareTopology] skip nic: lo with devPath: /sys/devices/virtual/net/lo which isn't pci device
W0502 08:03:20.637311 1 network_linux.go:178] [katalyst-core/pkg/util/machine.getNSNetworkHardwareTopology] skip nic: veth064d18ee with devPath: /sys/devices/virtual/net/veth064d18ee which isn't pci device
W0502 08:03:20.637339 1 network_linux.go:178] [katalyst-core/pkg/util/machine.getNSNetworkHardwareTopology] skip nic: veth06d57915 with devPath: /sys/devices/virtual/net/veth06d57915 which isn't pci device
W0502 08:03:20.637365 1 network_linux.go:178] [katalyst-core/pkg/util/machine.getNSNetworkHardwareTopology] skip nic: veth5290716c with devPath: /sys/devices/virtual/net/veth5290716c which isn't pci device
W0502 08:03:20.637396 1 network_linux.go:178] [katalyst-core/pkg/util/machine.getNSNetworkHardwareTopology] skip nic: veth6f37d282 with devPath: /sys/devices/virtual/net/veth6f37d282 which isn't pci device
W0502 08:03:20.637428 1 network_linux.go:178] [katalyst-core/pkg/util/machine.getNSNetworkHardwareTopology] skip nic: veth87922afb with devPath: /sys/devices/virtual/net/veth87922afb which isn't pci device
W0502 08:03:20.637457 1 network_linux.go:178] [katalyst-core/pkg/util/machine.getNSNetworkHardwareTopology] skip nic: veth8dccdf2e with devPath: /sys/devices/virtual/net/veth8dccdf2e which isn't pci device
I0502 08:03:20.638040 1 file.go:239] [GetUniqueLock] get lock successfully
I0502 08:03:20.638069 1 agent.go:85] initializing "katalyst-agent-reporter"
W0502 08:03:20.638121 1 manager.go:400] failed to retrieve checkpoint for "reporter_manager_checkpoint": checkpoint is not found
I0502 08:03:20.638136 1 manager.go:258] registered plugin name system-reporter-plugin
I0502 08:03:20.638153 1 manager.go:239] plugin system-reporter-plugin run success
I0502 08:03:20.638171 1 manager.go:258] registered plugin name kubelet-reporter-plugin
I0502 08:03:20.638210 1 util_unix.go:104] "Using this endpoint is deprecated, please consider using full URL format" endpoint="/var/lib/kubelet/pod-resources/kubelet.sock" URL="unix:///var/lib/kubelet/pod-resources/kubelet.sock"
F0502 08:03:20.638341 1 kubeletplugin.go:110] run topology status adapter failed
What did you expect to happen?
All pods start normally
How can we reproduce it (as minimally and precisely as possible)?
None
Software version
Environment:
Kubernetes version (use kubectl version): 1.28
OS version: Ubuntu 22.04
Kernel version:
Cgroup driver: cgroupfs/systemd
/kind bug
There may be errors when running the topology status adapter. We have added error messages to the fatal log in https://github.com/kubewharf/katalyst-core/pull/573
It has been solved now. If there are still problems, I will reopen it.