The Alarm Context Tool (ACT) enhances AWS CloudWatch Alarms by providing additional context to aid in troubleshooting and analysis. By leveraging AWS services such as Lambda, CloudWatch, X-Ray, and Amazon Bedrock, this solution aggregates and analyzes metrics, logs, and traces to generate meaningful insights. Using generative AI capabilities from Amazon Bedrock, it summarizes findings, identifies potential root causes, and offers relevant documentation links to help operators resolve issues more efficiently. The implementation is designed for easy deployment and integration into existing observability pipelines, significantly reducing response times and improving root cause analysis.
- Dependencies
- Prerequisites
- Setup
- Deployment
- Usage
- Creating a New Handler
- Testing
- Environment Variables
- Available Functions
- Security
- License
- AWS CLI configured with appropriate permissions.
- Python 3.12 or later if you plan to use your IDE to detect problems in the code.
- AWS SAM CLI for deployment.
- Access to Anthropic Bedrock foundation models
- Supports Anthropic Claude Models:
- Anthropic Claude Instant v1.2
- Anthropic Claude 2 v2
- Anthropic Claude 2 v2.1
- Anthropic Claude 3 Sonnet
- Anthropic Claude 3 Haiku
- Anthropic Claude 3 Opus
- Supports Anthropic Claude Models:
- Verified identity in Amazon SES
-
Clone the repository:
git clone https://github.com/aws-samples/alarm-context-tool cd alarm-context-tool
-
Install dependencies if you plan to use your IDE to detect problems in the code:
pip install -r ./dependencies_layer/requirements.txt pip install aws_lambda_powertools
-
For some regions, you may need to change the layer version for Lambda Insights after the colon in template.yaml. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Lambda-Insights-extension-versionsx86-64.html.
- !Sub arn:aws:lambda:${AWS::Region}:580247275435:layer:LambdaInsightsExtension:49
-
Edit the template.yaml file with the recipient email address and sender address.
Resources:
AlarmContextFunction:
Type: AWS::Serverless::Function
Handler: lambda_function.alarm_handler
Runtime: python3.12
Environment:
Variables:
RECIPIENT: alias@domain.com
SENDER: Name <alias@domain.com>
-
Update additional Environment Variables if required
-
Update your SNS Topics that receive notifications from CloudWatch alarms:
- Protocol: AWS Lambda
- Endpoint: ARN of your Lambda function
-
Use a guided deployment to start with:
sam build sam deploy --guided
-
Subsequently, you can build, deploy and test using the following command: The test-event must be shared. See Testing
sam build; sam deploy --no-confirm-changeset; sam remote invoke --stack-name alarm-context-tool --region <aws-region> --test-event-name <test-event>
Once deployed, the Lambda function will be triggered by SNS topics subscribed to CloudWatch Alarms. The function will enhance the alarm message with additional context such as related metrics, logs, and traces. It uses Amazon Bedrock to analyze the gathered data and generate actionable insights.
To create a new handler for a different AWS service, follow these steps:
-
Create a new handler file: Create a new Python file in the
handlers
directory. For example,new_service_handler.py
. -
Define the handler function: Implement the handler function similar to existing handlers. Here's a template:
import boto3 import botocore from aws_lambda_powertools import Logger, Tracer logger = Logger() tracer = Tracer() @tracer.capture_method def process_new_service(dimensions, region, account_id, namespace, change_time, annotation_time, start_time, end_time, start, end): # Your implementation here pass
-
Add the handler to the Lambda function: Update
lambda_function.py
to import and call your new handler based on the trigger. -
Update the template: Modify
template.yaml
to include your new handler and update necessary permissions.Resources: AlarmContextFunction: Type: AWS::Serverless::Function Handler: lambda_function.alarm_handler Runtime: python3.12 Policies: - Statement: - Effect: Allow Action: - new-service:Describe* Resource: "*"
-
Add necessary permissions: Ensure that your new handler has the required permissions by updating the
template.yaml
file as shown above.
-
Trigger an Alarm: Manually trigger an alarm using the following command, replacing <alarm_name> with the name of your alarm:
aws cloudwatch set-alarm-state --state-value ALARM --state-reason "Testing" --alarm-name "<alarm_name>"
-
Use the test cases generated in the logs: The main Lambda function generates a test case that can be used in the Lambda console. See Testing Lambda functions in the console or by using
sam remote invoke
. -
Open the CloudWatch console
-
In the navigation pane, choose Logs, and then choose Logs Insights.
-
In the Select log group(s) drop down, choose /aws/lambda/alarm-context-tool-AlarmContextFunction-xxxxxxxxxxxx
-
Enter the following query, replacing <alarm_name> with the name of your alarm:
fields @timestamp, @message, @logStream, @log | filter message = "test_case" AND Records.0.Sns.Message like /<alarm_name>/
-
Choose Run query
-
Expand a log entry and copy the entire @message field.
-
You can then use this to test your Lambda function on demand.
The following environment variables can be configured for the Lambda function:
AWS_LAMBDA_LOG_LEVEL
: Sets the log level for AWS Lambda logs (e.g., INFO, DEBUG). Default isINFO
.ANTHROPIC_VERSION
: Specifies the version of the Anthropic model to be used. Default isbedrock-2023-05-31
.BEDROCK_MODEL_ID
: The ID of the Amazon Bedrock model to use. Default isanthropic.claude-3-sonnet-20240229-v1:0
.BEDROCK_REGION
: The AWS region where the Bedrock model is deployed. Default isus-east-1
.BEDROCK_MAX_TOKENS
: The maximum number of tokens to be used by the Bedrock model. Default is4000
.METRIC_ROUNDING_PRECISION_FOR_BEDROCK
: The precision for rounding metrics before sending to Bedrock. Default is3
.POWERTOOLS_LOG_LEVEL
: Sets the log level for AWS Lambda Powertools logs (e.g., INFO, DEBUG). Default isINFO
.POWERTOOLS_LOGGER_LOG_EVENT
: Enables logging of the full event in Lambda Powertools logs. Default isTrue
.POWERTOOLS_SERVICE_NAME
: The name of the service to be used in Lambda Powertools. Default isAlarm
.POWERTOOLS_TRACER_CAPTURE_RESPONSE
: Controls whether to capture the response in tracing. Default isFalse
.RECIPIENT
: The email address to receive notifications.SENDER
: The sender's email address for notifications.USE_BEDROCK
: Enables or disables the use of Amazon Bedrock for generative AI. Default isTrue
.
To configure these variables, update the template.yaml
file:
Resources:
AlarmContextFunction:
Type: AWS::Serverless::Function
Handler: lambda_function.alarm_handler
Runtime: python3.12
Environment:
Variables:
AWS_LAMBDA_LOG_LEVEL: INFO
ANTHROPIC_VERSION: bedrock-2023-05-31
BEDROCK_MODEL_ID: anthropic.claude-3-sonnet-20240229-v1:0
BEDROCK_REGION: us-east-1
BEDROCK_MAX_TOKENS: 4000
METRIC_ROUNDING_PRECISION_FOR_BEDROCK: 3
POWERTOOLS_LOG_LEVEL: INFO
POWERTOOLS_LOGGER_LOG_EVENT: "True"
POWERTOOLS_SERVICE_NAME: Alarm
POWERTOOLS_TRACER_CAPTURE_RESPONSE: "False"
RECIPIENT: alias@domain.com
SENDER: Name <alias@domain.com>
USE_BEDROCK: "True"
- get_log_insights_link(log_group_name, start_time, end_time, query)
- Generates a CloudWatch Logs Insights query link.
- Parameters:
log_group_name
(str): The name of the log group.start_time
(str): The start time for the query.end_time
(str): The end time for the query.query
(str): The Logs Insights query.
- build_dashboard(dashboard_metrics, annotation_time, start, end, region)
- Builds a dashboard with the specified metrics.
- Parameters:
dashboard_metrics
(list): The list of metrics for the dashboard.annotation_time
(str): The annotation time for the dashboard.start
(str): The start time for the dashboard.end
(str): The end time for the dashboard.region
(str): The AWS region.
- process_traces(trace_ids, start_time, end_time, region)
- Processes X-Ray traces and retrieves trace summaries and details.
- Parameters:
trace_ids
(list): The list of trace IDs to process.start_time
(str): The start time for the trace processing.end_time
(str): The end time for the trace processing.region
(str): The AWS region.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.