opensearch-project / opensearch-ci

Enables continuous integration across OpenSearch, OpenSearch Dashboards, and plugins.

[Bug]: Remove alarm `MainNodeCloudwatchEvents`.

prudhvigodithi opened this issue

Describe the bug

The alarm `MainNodeCloudwatchEvents`, added as part of the CI monitoring setup, is supposed to alert when the CloudWatch agent fails to send data. As configured, however, it only checks `mem_used_percent`, which is not right; it should instead look at something like `FailedInvocations`.

Another way to look at it: the other existing alarms, such as `MainNodeJenkinsProcessNotFound` and `AverageMainNodeCpuUtilization`, also depend on the CloudWatch agent, since they get their metrics from a working CloudWatch agent process. If the agent fails, those alarms will not work either, so there is no need to create a separate alarm just to monitor the CloudWatch agent process.
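To illustrate why a plain threshold check on `mem_used_percent` does not by itself detect a silent agent, here is a minimal TypeScript sketch of a single-datapoint alarm evaluation. It is a simplified model and not the code in this repository: the function name `evaluateAlarm` and the threshold of 90 are hypothetical, and real CloudWatch alarms evaluate several periods rather than one datapoint. The four missing-data modes mirror CloudWatch's "treat missing data" options.

```typescript
// Simplified, hypothetical model of how an alarm reacts when the agent
// stops publishing a metric (datapoint === null). Not the repository's code.
type AlarmState = "OK" | "ALARM" | "INSUFFICIENT_DATA";
type TreatMissingData = "missing" | "notBreaching" | "breaching" | "ignore";

function evaluateAlarm(
  datapoint: number | null,          // e.g. mem_used_percent, or null if nothing was reported
  threshold: number,                 // hypothetical threshold, e.g. 90
  treatMissingData: TreatMissingData,
  previousState: AlarmState,
): AlarmState {
  if (datapoint !== null) {
    // Agent is reporting: the alarm only reflects the metric value.
    return datapoint > threshold ? "ALARM" : "OK";
  }
  // Agent is silent: the outcome depends entirely on the missing-data mode.
  if (treatMissingData === "breaching") return "ALARM";
  if (treatMissingData === "notBreaching") return "OK";
  if (treatMissingData === "ignore") return previousState;
  return "INSUFFICIENT_DATA";
}

// Agent healthy, memory below threshold.
console.log(evaluateAlarm(42, 90, "missing", "OK"));
// Agent silent: every agent-backed alarm in "missing" mode loses its data.
console.log(evaluateAlarm(null, 90, "missing", "OK"));
// Agent silent, but missing data is treated as breaching.
console.log(evaluateAlarm(null, 90, "breaching", "OK"));
```

Under this model, any alarm fed by the agent changes state when the agent goes quiet, which is the reasoning behind not needing a dedicated alarm for the agent process.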

To reproduce

N/A

Expected behavior

No response

Screenshots

If applicable, add screenshots to help explain your problem.

Host / Environment

No response

Additional context

No response

Relevant log output

No response

Closing this issue, as removing the misconfigured alarm `MainNodeCloudwatchEvents` is not required at this time.