k8snetworkplumbingwg / sriov-network-operator

Operator for provisioning and configuring SR-IOV CNI plugin and device plugin

Improve the `config-daemon` observability with k8s `v1.Events`

zeeke opened this issue

When debugging issues in large clusters, it can be difficult to work out what went wrong with the config daemon from the logs alone, because they can be lost when the config-daemon restarts.

The idea here is to leverage Kubernetes `v1.Event` objects, as described in the doc, to mark important events in the operator lifecycle, such as:
a. The request of the operator to drain a node
b. A node reboot requested by the config-daemon
c. Each transition of any SriovNetworkNodeState status (e.g., Succeeded -> InProgress)
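As a rough illustration of the three cases above, here is a minimal Go sketch using only the standard library. It is not the operator's actual code. The `eventSink` interface is a hypothetical, trimmed-down stand-in for client-go's `record.EventRecorder` (whose real `Eventf` also takes the object the event refers to). The reason strings, the helper names, and the assumption that a `Failed` status exists and should map to a `Warning` event are all made up for the example.

```go
package main

import "fmt"

// Event type strings. These mirror the values of corev1.EventTypeNormal and
// corev1.EventTypeWarning in k8s.io/api/core/v1.
const (
	eventTypeNormal  = "Normal"
	eventTypeWarning = "Warning"
)

// eventSink is a hypothetical, simplified stand-in for client-go's
// record.EventRecorder, so the sketch compiles without external modules.
type eventSink interface {
	Eventf(eventType, reason, messageFmt string, args ...interface{})
}

// recordedEvent is what the in-memory sink stores for each emitted event.
type recordedEvent struct {
	Type    string
	Reason  string
	Message string
}

// memorySink keeps events in a slice; a real daemon would send them to the API server.
type memorySink struct {
	events []recordedEvent
}

func (m *memorySink) Eventf(eventType, reason, messageFmt string, args ...interface{}) {
	m.events = append(m.events, recordedEvent{
		Type:    eventType,
		Reason:  reason,
		Message: fmt.Sprintf(messageFmt, args...),
	})
}

// reportDrainRequested covers case (a). The reason string is an assumption.
func reportDrainRequested(sink eventSink, nodeName string) {
	sink.Eventf(eventTypeNormal, "DrainRequested", "drain requested for node %s", nodeName)
}

// reportRebootRequested covers case (b). The reason string is an assumption.
func reportRebootRequested(sink eventSink, nodeName string) {
	sink.Eventf(eventTypeWarning, "RebootRequested", "config-daemon requested a reboot of node %s", nodeName)
}

// reportSyncStatusChange covers case (c). It emits an event only when the
// status actually changes and reports whether an event was emitted.
// Treating "Failed" as a Warning is an assumption of this sketch.
func reportSyncStatusChange(sink eventSink, oldStatus, newStatus string) bool {
	if oldStatus == newStatus {
		return false
	}
	eventType := eventTypeNormal
	if newStatus == "Failed" {
		eventType = eventTypeWarning
	}
	sink.Eventf(eventType, "SyncStatusChanged",
		"SriovNetworkNodeState sync status changed: %s -> %s", oldStatus, newStatus)
	return true
}
```

Keeping the helpers behind a small interface would make it straightforward to swap the in-memory sink for a real recorder and to unit-test which events are emitted for each transition.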

I think this would be a good improvement.

In addition to the events mentioned, I suggest we also emit config-daemon start/stop events.

@zeeke can you assign this issue to me?

> In addition to the events mentioned, I suggest we also emit config-daemon start/stop events.

That sounds good to me. I'll also bring this issue up at the next community meeting.