martins-jean / Real-time-data-processing-in-AWS

Leveraged AWS cloud services to create an anomaly detection system that alerts maintenance teams in real time when wind farm sensors detect abnormally high wind speeds.


Real-time data processing in AWS

Contextual overview

The administration of a city would like to improve the efficiency of its wind farms and anticipate possible defects in wind turbines due to excessively high wind speeds. They would like to know when wind speeds get severe so that they can quickly alert maintenance teams and dispatch them where needed.

Architecture diagram

(Architecture diagram: wind turbine simulator on EC2 → Kinesis Data Streams → Kinesis Data Analytics for Apache Flink → Lambda → DynamoDB → Lambda → SNS email notifications.)

Project objectives

1. To collect and process large streams of wind speed sensor data in real time, we will use Kinesis Data Streams.
2. To analyze the data, we will use Kinesis Data Analytics for Apache Flink, with the Flink application written in Java or Scala.
3. To perform anomaly detection, we will use the Random Cut Forest (RCF) algorithm. The process assigns an anomaly score to each record based on the values in its numeric columns; a record is considered anomalous if it is distant from other records.
4. To configure an external destination, we will use a Lambda function. The function code takes the processed data and parses it into records in an Amazon DynamoDB table. The data includes the wind farm location, the wind speed and the assigned anomaly score.
5. To scan the DynamoDB table and filter for anomaly scores greater than or equal to 2, we will use a second Lambda function. For each discovered anomaly, the function publishes a notification message to an SNS topic.
6. Subscribers to the SNS topic receive a notification email each time an anomaly is identified, so that maintenance teams can be alerted and dispatched as soon as possible to the affected wind farm.
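The filtering logic described in objectives 5 and 6 can be sketched in plain Python. This is an illustration only: the attribute names (location, wind_speed, anomaly_score) are assumptions, since the actual names depend on what the Flink application writes to DynamoDB.

```python
# Minimal sketch of the anomaly-filtering logic from objectives 5-6.
# Attribute names (location, wind_speed, anomaly_score) are assumptions.
ANOMALY_THRESHOLD = 2.0

def build_alerts(items):
    """Return one human-readable alert per item whose anomaly score is >= 2."""
    return [
        f"High wind speed of {item['wind_speed']} m/s at {item['location']} "
        f"(anomaly score {item['anomaly_score']})"
        for item in items
        if float(item["anomaly_score"]) >= ANOMALY_THRESHOLD
    ]
```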

Reproducibility guidelines

Required setup
  1. Create an S3 bucket named kinesis-flink for the Apache Flink application and upload the AnomalyDetection.jar file to it.
  2. Create an EC2 instance called "Wind Turbine Simulator" with a boto3 script that generates wind speed data.
  3. Create an IAM role for Kinesis Data Analytics.
  4. Create several AWS Lambda functions using the boto3 scripts I provided.
  5. Create a table in DynamoDB named WindDataTable.
  6. Create an AnomalyNotification topic in the SNS console.
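The one-off resources above can also be provisioned with boto3. The sketch below is a hedged illustration: the resource names come from the steps above, but the DynamoDB key schema is an assumption, since the steps do not specify one.

```python
# Hypothetical boto3 sketch of part of the setup above.
# The key schema for WindDataTable is an assumption (the steps do not specify it).
WIND_TABLE_SPEC = {
    "TableName": "WindDataTable",
    "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],
    "AttributeDefinitions": [{"AttributeName": "id", "AttributeType": "S"}],
    "BillingMode": "PAY_PER_REQUEST",
}

def create_resources(region="us-east-1"):
    """Create the DynamoDB table and SNS topic; returns the topic ARN."""
    import boto3  # deferred so the spec above can be inspected without the SDK

    dynamodb = boto3.client("dynamodb", region_name=region)
    sns = boto3.client("sns", region_name=region)
    dynamodb.create_table(**WIND_TABLE_SPEC)
    topic = sns.create_topic(Name="AnomalyNotification")
    return topic["TopicArn"]
```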
Deploy a Kinesis Data Stream to ingest streaming data from the wind speed sensors
  1. Navigate to S3 and inside your kinesis-flink bucket, copy the name of the anomaly detection .jar file and paste it in a text editor.
  2. Navigate to the Amazon EC2 dashboard and click on instances (running) and copy the public IPv4 address of the EC2 instance you created earlier.
  3. In a new browser tab, paste the address and add /kinesis to it at the end. This opens the wind turbine data simulator.
  4. Navigate to Amazon Kinesis and create a provisioned Data Stream named "WindDataStream".
  5. Return to the Wind Turbine Data Simulator, type the name of your data stream and start sending the data.
  6. In the test data section, confirm that data is being generated.
  7. Return to the data stream page and click on the data viewer option.
  8. Choose the only available shard, latest starting position and click get records. To view incoming data, click next records. If you don't see any records, wait for a few seconds and try again.
  9. Create another provisioned Data Stream named "AnomalyDetectionStream".
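A simulator along the lines of the EC2 boto3 script might generate readings and push them into the stream like this. The reading format is an assumption; only the stream name and region come from the steps above.

```python
import json
import random
import time

def make_reading(anomaly=False):
    """Generate one synthetic wind speed reading; the field names are assumptions."""
    speed = random.uniform(60.0, 90.0) if anomaly else random.uniform(5.0, 30.0)
    return {"location": "wind-farm-1", "wind_speed": round(speed, 2)}

def stream_readings(stream_name="WindDataStream", region="us-east-1", count=10):
    """Send `count` readings to the Kinesis data stream, one per second."""
    import boto3  # deferred so make_reading stays usable without the SDK

    kinesis = boto3.client("kinesis", region_name=region)
    for _ in range(count):
        reading = make_reading()
        kinesis.put_record(
            StreamName=stream_name,
            Data=json.dumps(reading),
            PartitionKey=reading["location"],
        )
        time.sleep(1)
```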
Create a Kinesis Data Analytics for Apache Flink application to process the incoming data
  1. On the Kinesis console, click Managed Apache Flink and then create a streaming application:
  • Name: AnomalyDetection.
  • Access to application resources: Choose from IAM roles that Kinesis Data Analytics can assume.
  • Service role: choose the IAM role you created earlier.
  • Templates: Development.

  2. At the top of the application page, click Configure:
  • Amazon S3 bucket: click Browse and choose the kinesis-flink bucket you created earlier.

  • Path to S3 object: AnomalyDetection.jar.

  • Access to application resources: Choose from IAM roles that Kinesis Data Analytics can assume.

  • Service role: choose the IAM role you created earlier.

  • Under Runtime properties: click add item:

    • Group ID: project.
    • Key: inputStreamName.
    • Value: WindDataStream.

  • Add another item:

    • Group ID: project.
    • Key: outputStreamName.
    • Value: AnomalyDetectionStream.

  • Add another item:

    • Group ID: project.
    • Key: region.
    • Value: us-east-1.
  • Click run to start the application with the latest snapshot.

  3. Return to the Wind Turbine Data Simulator and, under "Wind Speed Data Set", click start and verify that data is being generated.
  4. Click on the AnomalyDetectionStream on the Kinesis page.
  5. Under the data viewer, choose the only available shard and the latest starting position, then click get records and next records to review the data.
  6. Start the "Wind Speed Anomaly Data Set" and review it to ensure the simulator is producing anomaly data.
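The "Runtime properties" console form configured above corresponds to the EnvironmentProperties structure in the boto3 kinesisanalyticsv2 API (create_application / update_application). The sketch below shows that mapping; the key names must match exactly what the AnomalyDetection.jar reads at startup.

```python
# The "Runtime properties" console form maps to this structure in the
# boto3 kinesisanalyticsv2 API. The property keys must match exactly
# what the AnomalyDetection.jar application reads at startup.
FLINK_RUNTIME_PROPERTIES = {
    "PropertyGroups": [
        {
            "PropertyGroupId": "project",
            "PropertyMap": {
                "inputStreamName": "WindDataStream",
                "outputStreamName": "AnomalyDetectionStream",
                "region": "us-east-1",
            },
        }
    ]
}
```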
Use a Lambda function to write application output data to a DynamoDB table
  1. Go to the AWS Lambda console and click on the AnalyticsDestinationFunction. The function accepts the wind data from the analytics application's destination stream in JSON format and parses it for storage in a DynamoDB table.
  2. In the function overview section, click add trigger:
  • Source: Kinesis.
  • Select the AnomalyDetectionStream in the drop-down menu.
  • Verify that "Activate trigger" is checked and click add.
  3. Navigate to the DynamoDB console and, under tables, choose the WindDataTable and click explore table items.
  4. In the items returned section, click the expand option.
  5. In the information alert, click on retrieve next page.
  6. Click the anomaly score column header to sort it in descending order, and verify that three anomalies are listed at the top.
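A handler in the spirit of the AnalyticsDestinationFunction might look roughly like this. It is a sketch, not the repository's actual code: Kinesis-triggered Lambda events carry base64-encoded payloads, which are decoded and written to the table. Floats are parsed as Decimal because the DynamoDB resource API rejects Python floats.

```python
import base64
import json
from decimal import Decimal

def parse_records(event):
    """Decode the base64-encoded payloads of a Kinesis-triggered Lambda event.

    Floats are parsed as Decimal because the DynamoDB resource API
    rejects Python floats.
    """
    return [
        json.loads(base64.b64decode(record["kinesis"]["data"]), parse_float=Decimal)
        for record in event["Records"]
    ]

def lambda_handler(event, context):
    import boto3  # deferred so parse_records stays testable without the SDK

    table = boto3.resource("dynamodb").Table("WindDataTable")
    items = parse_records(event)
    for item in items:
        table.put_item(Item=item)
    return {"stored": len(items)}
```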
Use another Lambda function to filter the DynamoDB table for anomalies and publish them to an SNS topic
  1. Navigate to the SNS console and, under topics, click on AnomalyNotification.
  2. Scroll to the subscriptions tab and create a subscription:
  • Protocol: email.
  • Endpoint: type a valid email address you can access. You will receive an email to confirm the subscription.
  3. Navigate to the Lambda console and click on the AnomalyMessageDeliveryFunction. This function runs a scan on the WindDataTable and filters the results by anomaly score. If the score is greater than or equal to 2, it adds the location and wind speed for that item to an SNS message and publishes it to the AnomalyNotification SNS topic.
  4. To create a test event on the source code page, click Test:
  • Event name: AnomalyNotification.
  5. Click save, review the success alert, then click test again to see the results.
  6. Check the email address you subscribed and verify that you received three notifications.
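A function along the lines of the AnomalyMessageDeliveryFunction could be sketched as below. This is an illustration, not the repository's actual code: the topic ARN is a placeholder and the attribute names are assumptions. Note that DynamoDB scans are paginated, so the helper follows LastEvaluatedKey until the whole table has been read.

```python
def scan_all(scan_fn, **kwargs):
    """Yield every item from a DynamoDB scan, following LastEvaluatedKey pagination."""
    response = scan_fn(**kwargs)
    yield from response["Items"]
    while "LastEvaluatedKey" in response:
        response = scan_fn(ExclusiveStartKey=response["LastEvaluatedKey"], **kwargs)
        yield from response["Items"]

def lambda_handler(event, context):
    import boto3  # deferred so scan_all stays testable without the SDK
    from boto3.dynamodb.conditions import Attr

    table = boto3.resource("dynamodb").Table("WindDataTable")
    sns = boto3.client("sns")
    topic_arn = "arn:aws:sns:us-east-1:123456789012:AnomalyNotification"  # placeholder
    count = 0
    for item in scan_all(table.scan, FilterExpression=Attr("anomaly_score").gte(2)):
        sns.publish(
            TopicArn=topic_arn,
            Subject="Wind speed anomaly detected",
            Message=(
                f"{item['location']}: wind speed {item['wind_speed']}, "
                f"anomaly score {item['anomaly_score']}"
            ),
        )
        count += 1
    return {"published": count}
```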


License: MIT


Languages

Python 100.0%