jeff1evesque / kinesis-analytics-demo

kinesis analytics streaming demonstration

Getting Started

Install PyCharm

  1. First, install Miniconda for Python 3.8.

  2. Then, create and activate a kinesis_analytics environment (conda init only needs to run once; open a new shell afterwards so that conda activate works):

conda create --name kinesis_analytics python=3.8
conda init
conda activate kinesis_analytics
  3. Go to the JetBrains website to download PyCharm; use the Community edition because it's free!

  4. Select your operating system, download the installer, then run it.

  5. Open PyCharm.

Set up PyCharm with Conda

  1. Clone this GitHub repository locally.

  2. When you first open the IDE, select New Project, then set the Location to the local path of the cloned repository.

(Screenshot: PyCharm New Project dialog)

AWS Resources for local development

  1. In your AWS account, create an input Kinesis Data Stream and an output Kinesis Data Stream; one shard each is fine!

  2. Ensure your local IAM user has permission to read from and write to these streams. If not, you can simulate Kinesis locally with a local engine such as Kinesalite.

  3. If the application runs locally, ensure the IAM user running datagen/stock.py has the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "kinesis:DescribeStream",
            "kinesis:PutRecord",
            "kinesis:PutRecords",
            "kinesis:GetShardIterator",
            "kinesis:GetRecords",
            "kinesis:ListShards",
            "kinesis:DescribeStreamSummary",
            "kinesis:RegisterStreamConsumer"
        ],
        "Resource": "*"
    }]
}

Note: following least-privilege best practice, do not leave the above permissions attached permanently; remove them once you no longer need them.
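Step 1 can also be scripted. The sketch below uses boto3 and is only an example under stated assumptions: the stream names, region and account id in the usage note are placeholders, not values from this project. create_streams creates each stream with one shard and waits on boto3's stream_exists waiter until it is ACTIVE; scoped_policy returns the same actions as the policy above, but restricted to the two stream ARNs instead of "Resource": "*".

```python
# Sketch only: stream names, region and account id are placeholders (assumptions).

# Actions copied from the IAM policy above.
KINESIS_ACTIONS = [
    "kinesis:DescribeStream",
    "kinesis:PutRecord",
    "kinesis:PutRecords",
    "kinesis:GetShardIterator",
    "kinesis:GetRecords",
    "kinesis:ListShards",
    "kinesis:DescribeStreamSummary",
    "kinesis:RegisterStreamConsumer",
]


def create_streams(client, stream_names, shard_count=1):
    """Create each stream (one shard by default) and block until it is ACTIVE."""
    waiter = client.get_waiter("stream_exists")
    for name in stream_names:
        client.create_stream(StreamName=name, ShardCount=shard_count)
        waiter.wait(StreamName=name)


def scoped_policy(region, account_id, stream_names):
    """Same actions as the policy above, limited to the given stream ARNs."""
    arns = [
        f"arn:aws:kinesis:{region}:{account_id}:stream/{name}"
        for name in stream_names
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(KINESIS_ACTIONS), "Resource": arns}
        ],
    }
```

For example, create_streams(boto3.client("kinesis", region_name="us-east-1"), ["stock-input-stream", "stock-output-stream"]) would create both hypothetical streams, and json.dumps(scoped_policy("us-east-1", "123456789012", [...]), indent=2) prints a policy document you could attach instead of the wildcard one.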

Set up local environment

  1. Copy flink/application_properties.json.replace to flink/application_properties.json, change input.stream.name to the name of your input Kinesis stream, optionally remove the producer configuration from the same file, then save.

  2. Next, open flink/sliding_window.py, right-click within the code, and click Modify Run Configuration.

This opens a dialog box where we can define our IS_LOCAL environment variable. We need it because I've written the script to use a local application_properties.json file when this switch is enabled. Without it, the application looks for /etc/flink/application_properties.json, which is where KDA (Kinesis Data Analytics) places the file.

Click the icon next to Environment variables, which opens another dialog box. At its bottom right is a + sign; use it to add an IS_LOCAL environment variable (for example, with the value true).

Hit OK, then Apply and OK.

  3. Ensure that Java is installed locally. With Homebrew on macOS:
brew install java11
sudo ln -sfn /usr/local/opt/openjdk@11/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk-11.jdk
On Apple Silicon Macs, Homebrew installs under /opt/homebrew, so replace /usr/local in the path above accordingly.
  4. Download and set up your Kinesis connector.

Place a flink-sql-connector jar file in the flink/ directory. This is the Amazon Kinesis SQL Connector for Flink, and its version should match your Flink version (the run output below shows flink-dist 1.13.2). The jar must be bundled with your application on deploy (already automated in the associated CodeBuild project), and its filename must match the jarfile entry in application_properties.json. Lastly, make sure the jar file is listed in .gitignore, because committing jar files to git is not good practice.
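The local setup steps above can be sketched in Python as follows. These are hypothetical helpers, not the code in flink/sliding_window.py, and they rest on assumptions: that application_properties.json follows the Kinesis Data Analytics layout (a JSON list of PropertyGroupId/PropertyMap objects), that producer groups have ids starting with "producer", and that the connector jar is referenced by a "jarfile" key relative to flink/.

```python
# Hypothetical helpers; the file layout and key names are assumptions.
import json
import os
from pathlib import Path

KDA_PROPERTIES = "/etc/flink/application_properties.json"  # where KDA places the file
LOCAL_PROPERTIES = "application_properties.json"  # used when IS_LOCAL is set


def properties_path(environ=None):
    """Pick the local file if IS_LOCAL has any non-empty value."""
    environ = os.environ if environ is None else environ
    return LOCAL_PROPERTIES if environ.get("IS_LOCAL") else KDA_PROPERTIES


def load_groups(path):
    """Return {PropertyGroupId: PropertyMap} from a properties file."""
    with open(path) as f:
        return {g["PropertyGroupId"]: g["PropertyMap"] for g in json.load(f)}


def write_local_properties(template, target, input_stream, drop_producer=False):
    """Copy the .replace template, set input.stream.name, optionally drop producer groups."""
    with open(template) as f:
        groups = json.load(f)
    if drop_producer:
        # Assumption: producer group ids start with "producer".
        groups = [g for g in groups if not g["PropertyGroupId"].startswith("producer")]
    for g in groups:
        if "input.stream.name" in g["PropertyMap"]:
            g["PropertyMap"]["input.stream.name"] = input_stream
    with open(target, "w") as f:
        json.dump(groups, f, indent=2)


def missing_jars(groups, flink_dir):
    """List configured jarfile entries that do not exist under flink_dir."""
    jars = [m["jarfile"] for m in groups.values() if "jarfile" in m]
    return [j for j in jars if not (Path(flink_dir) / j).is_file()]
```

Running missing_jars(load_groups(properties_path()), "flink") from the repository root (with IS_LOCAL set and the path adjusted) is one way to catch a jar filename mismatch before starting the job.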

Run Sliding Window

  1. To go from Kinesis (source) to Kinesis (sink), replace the create_print_table invocation in the main function with create_table. For local development, no further changes are required.

  2. Now, right-click in the code (i.e. sliding_window.py) and click Run 'sliding_window' to start execution.

  3. Finally, send data to the source Kinesis Data Stream by running the sample producer provided in this project, datagen/stock.py.
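For step 1, a print sink and a Kinesis sink can share the same columns and differ mainly in their WITH clause. The sketch below is a hypothetical helper (sink_ddl is not the repository's create_print_table or create_table): the columns and 20-second watermark are copied from the schema printed in the run output, and the Kinesis options are those documented for the Flink 1.13 Kinesis SQL connector.

```python
# Hypothetical helper contrasting a print sink with a Kinesis sink.


def sink_ddl(table_name, stream_name=None, region="us-east-1"):
    """Return CREATE TABLE DDL: print connector if stream_name is None, else Kinesis."""
    if stream_name is None:
        # Rows are written to stdout, as in the local run output.
        options = "'connector' = 'print'"
    else:
        # Kinesis SQL connector options; JSON rows with ISO-8601 timestamps.
        options = (
            "'connector' = 'kinesis', "
            f"'stream' = '{stream_name}', "
            f"'aws.region' = '{region}', "
            "'format' = 'json', "
            "'json.timestamp-format.standard' = 'ISO-8601'"
        )
    return (
        f"CREATE TABLE {table_name} (\n"
        "  ticker VARCHAR(6),\n"
        "  price DOUBLE,\n"
        "  utc TIMESTAMP(3),\n"
        "  WATERMARK FOR utc AS utc - INTERVAL '20' SECOND\n"
        f") WITH ({options})"
    )
```

With a PyFlink TableEnvironment named table_env, the DDL could be registered via table_env.execute_sql(sink_ddl("sink_table", "your-output-stream")); the table and stream names are placeholders.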

After a few seconds of sending data, you should see the print statements come through the console of the IDE in the sliding_window tab.

/Users/jeff1evesque/opt/miniconda3/envs/kinesis-analytics-demo/bin/python /Users/jeff1evesque/application/kinesis-analytics-demo/flink/sliding_window.py
is_local: True

Source Schema
(
  `ticker` VARCHAR(6),
  `price` DOUBLE,
  `utc` TIMESTAMP(3) *ROWTIME*,
  WATERMARK FOR `utc`: TIMESTAMP(3) AS `utc` - INTERVAL '20' SECOND
)

Sink Schema
(
  `ticker` VARCHAR(6),
  `price` DOUBLE,
  `utc` TIMESTAMP(3) *ROWTIME*,
  WATERMARK FOR `utc`: TIMESTAMP(3) AS `utc` - INTERVAL '20' SECOND
)
sliding_window_over: 2.minutes
sliding_window_every: 1.minutes
sliding_window_on: utc

sliding_window_table
(
  `ticker` VARCHAR(6),
  `price` DOUBLE,
  `utc` TIMESTAMP(3)
)

creating temporary view for sliding window table to access within SQL
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.flink.api.java.ClosureCleaner (file:/Users/jeff1evesque/opt/miniconda3/envs/kinesis-analytics-demo/lib/python3.8/site-packages/pyflink/lib/flink-dist_2.11-1.13.2.jar) to field java.util.Collections$SingletonList.serialVersionUID
WARNING: Please consider reporting this to the maintainers of org.apache.flink.api.java.ClosureCleaner
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
+I[AMZN, 0.75, 2022-06-06T19:30]
+I[TSLA, 0.59, 2022-06-06T19:30]
+I[AAPL, 0.52, 2022-06-06T19:30]
+I[MSFT, 0.26, 2022-06-06T19:30]
+I[AMZN, 0.26, 2022-06-06T19:31]
+I[MSFT, 0.26, 2022-06-06T19:31]
+I[TSLA, 0.26, 2022-06-06T19:31]
+I[AAPL, 0.01, 2022-06-06T19:31]
+I[AMZN, 0.11, 2022-06-06T19:32]
+I[MSFT, 0.17, 2022-06-06T19:32]
+I[AAPL, 0.01, 2022-06-06T19:32]
+I[TSLA, 0.03, 2022-06-06T19:32]
+I[TSLA, 0.03, 2022-06-06T19:33]
+I[AAPL, 0.06, 2022-06-06T19:33]
+I[AMZN, 0.02, 2022-06-06T19:33]
+I[MSFT, 0.16, 2022-06-06T19:33]
+I[AAPL, 0.06, 2022-06-06T19:34]
+I[MSFT, 0.01, 2022-06-06T19:34]
+I[TSLA, 0.03, 2022-06-06T19:34]
+I[AMZN, 0.02, 2022-06-06T19:34]
+I[MSFT, 0.01, 2022-06-06T19:35]
+I[AAPL, 0.18, 2022-06-06T19:35]
+I[TSLA, 0.03, 2022-06-06T19:35]
+I[AMZN, 0.03, 2022-06-06T19:35]
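The sliding window above is configured with sliding_window_over 2 minutes and sliding_window_every 1 minute. The plain-Python illustration below (not Flink code; sliding_windows is a hypothetical helper) shows the window assignment this implies: with windows aligned to the epoch, each event falls into size / slide = 2 overlapping windows, and a new window closes every minute, which is consistent with the output arriving in one-minute steps.

```python
# Plain-Python illustration of sliding (hopping) window assignment; not Flink code.
from datetime import datetime, timedelta


def sliding_windows(ts, size=timedelta(minutes=2), slide=timedelta(minutes=1)):
    """Return sorted (start, end) pairs of every window [start, end) containing ts.

    Window starts are aligned to multiples of `slide` since the Unix epoch.
    """
    epoch = datetime(1970, 1, 1)
    offset = (ts - epoch) % slide  # time elapsed since the latest aligned start
    start = ts - offset            # most recent window start <= ts
    windows = []
    while start > ts - size:       # window [start, start + size) still contains ts
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)
```

For an event at 19:30:30, this yields the windows [19:29, 19:31) and [19:30, 19:32).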

Run Tumbling Window

  1. Now, right-click in the code (i.e. tumbling_window.py) and click Run 'tumbling_window' to start execution.

  2. Finally, send data to the source Kinesis Data Stream by running the sample producer provided in this project, datagen/stock.py.
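To see roughly what a producer for this demo involves, here is a sketch in the spirit of datagen/stock.py. It is not the repository's script: the four tickers come from the sample output, while the JSON field layout, the 0 to 100 price range and the ISO-8601 millisecond timestamp are assumptions chosen to fit the source schema (ticker, price, utc). The client argument is expected to be a boto3 Kinesis client.

```python
# Sketch of a stock-tick producer; field layout and price range are assumptions.
import datetime
import json
import random
import time

TICKERS = ["AAPL", "AMZN", "MSFT", "TSLA"]  # tickers seen in the sample output


def make_record(rng=random):
    """Build one tick: random ticker, price between 0 and 100 rounded to cents."""
    now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    return {
        "ticker": rng.choice(TICKERS),
        "price": round(rng.random() * 100, 2),
        "utc": now.isoformat(timespec="milliseconds"),  # e.g. 2022-06-06T19:30:00.000
    }


def send(client, stream_name, count, delay=0.0):
    """Put `count` JSON records onto the stream, partitioned by ticker."""
    for _ in range(count):
        record = make_record()
        client.put_record(
            StreamName=stream_name,
            Data=json.dumps(record).encode("utf-8"),
            PartitionKey=record["ticker"],
        )
        time.sleep(delay)
```

For example, send(boto3.client("kinesis"), "your-input-stream", 1000, delay=0.1) would emit about ten records per second to a placeholder stream name.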

After a few seconds of sending data, you should see the print statements come through the console of the IDE in the tumbling_window tab.

/Users/jeff1evesque/opt/miniconda3/envs/kinesis-analytics-demo/bin/python /Users/jeff1evesque/application/kinesis-analytics-demo/flink/sliding_window.py
is_local: True

Source Schema
(
  `ticker` VARCHAR(6),
  `price` DOUBLE,
  `utc` TIMESTAMP(3) *ROWTIME*,
  WATERMARK FOR `utc`: TIMESTAMP(3) AS `utc` - INTERVAL '20' SECOND
)

Sink Schema
(
  `ticker` VARCHAR(6),
  `window_start` TIMESTAMP(3),
  `window_end` TIMESTAMP(3),
  `first_price` DOUBLE,
  `last_price` DOUBLE,
  `min_price` DOUBLE,
  `max_price` DOUBLE
)
tumbling_window_over: '8' HOURS
tumbling_window_every: '1' MINUTE
tumbling_window_on: utc

tumbling_window_table
(
  `ticker` VARCHAR(6),
  `window_start` TIMESTAMP(3) *ROWTIME*,
  `window_end` TIMESTAMP(3) *ROWTIME*,
  `first_price` DOUBLE,
  `last_price` DOUBLE,
  `min_price` DOUBLE,
  `max_price` DOUBLE
)

creating temporary view for sliding window table to access within SQL
+I[AMZN, 2022-07-20T20:56, 2022-07-20T20:57, 82.64, 34.95, 0.05, 99.81]
+I[TSLA, 2022-07-20T20:56, 2022-07-20T20:57, 54.89, 93.62, 0.11, 99.91]
+I[MSFT, 2022-07-20T20:56, 2022-07-20T20:57, 43.12, 76.65, 0.69, 99.79]
+I[AAPL, 2022-07-20T20:56, 2022-07-20T20:57, 65.29, 93.06, 0.0, 99.71]
+I[AAPL, 2022-07-20T20:57, 2022-07-20T20:58, 9.86, 10.97, 0.25, 99.94]
+I[MSFT, 2022-07-20T20:57, 2022-07-20T20:58, 80.06, 64.48, 0.01, 99.86]
+I[AMZN, 2022-07-20T20:57, 2022-07-20T20:58, 30.36, 37.71, 0.62, 99.97]
+I[TSLA, 2022-07-20T20:57, 2022-07-20T20:58, 84.05, 38.65, 0.02, 100.0]
+I[MSFT, 2022-07-20T20:58, 2022-07-20T20:59, 48.8, 39.57, 0.2, 99.89]
+I[TSLA, 2022-07-20T20:58, 2022-07-20T20:59, 25.3, 82.68, 0.15, 99.93]
+I[AAPL, 2022-07-20T20:58, 2022-07-20T20:59, 15.78, 86.46, 0.12, 99.98]
+I[AMZN, 2022-07-20T20:58, 2022-07-20T20:59, 15.77, 10.04, 0.22, 99.96]
+I[AMZN, 2022-07-20T20:59, 2022-07-20T21:00, 12.63, 46.2, 0.06, 99.75]
+I[MSFT, 2022-07-20T20:59, 2022-07-20T21:00, 36.7, 8.63, 0.1, 99.95]
+I[AAPL, 2022-07-20T20:59, 2022-07-20T21:00, 91.41, 44.91, 0.01, 99.55]
+I[TSLA, 2022-07-20T20:59, 2022-07-20T21:00, 94.23, 6.09, 0.36, 99.81]
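The tumbling output above can likewise be reproduced conceptually in plain Python. tumbling_aggregate is a hypothetical illustration, not the repository's Flink job: it groups (ticker, price, utc) events into fixed, non-overlapping 1-minute windows per ticker and computes first, last, min and max price in event-time order, producing rows shaped like the sink schema.

```python
# Plain-Python illustration of a tumbling-window aggregate; not Flink code.
from collections import defaultdict
from datetime import datetime, timedelta


def tumbling_aggregate(events, size=timedelta(minutes=1)):
    """events: iterable of (ticker, price, utc).

    Returns (ticker, window_start, window_end, first, last, min, max) rows,
    sorted by ticker and window start.
    """
    epoch = datetime(1970, 1, 1)
    buckets = defaultdict(list)
    # Process in event-time order so first/last reflect timestamps, not arrival.
    for ticker, price, utc in sorted(events, key=lambda e: e[2]):
        start = utc - (utc - epoch) % size  # align window start to the epoch
        buckets[(ticker, start)].append(price)
    rows = []
    for (ticker, start), prices in sorted(buckets.items()):
        rows.append(
            (ticker, start, start + size, prices[0], prices[-1], min(prices), max(prices))
        )
    return rows
```

Unlike the sliding case, each event lands in exactly one window here, so every row summarizes a disjoint minute of data.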
