aws-msk-labs

Complete MSK workshop

Blogs that need to be incorporated

Prerequisites

  • Two AWS Accounts
    • Account A = AWS MSK Cluster, Kafka Consumer and Producer EC2 instances, Cloud9 Bastion host, Prometheus and Grafana EC2 instance
    • Account B = AWS Glue Schema Registry Account (optional)

Let's get going

  • Log in to Account A
    • Go to the EC2 console and create a keypair so you can SSH into the Kafka and Prometheus EC2 instances that will be created during the workshop

    • Go to the AWS MSK console and create an MSK Cluster Configuration with the following properties

        * auto.create.topics.enable=true
        * delete.topic.enable=true
        * log.retention.hours=8
        * default.replication.factor=3
        * min.insync.replicas=2
        * num.io.threads=8
        * num.network.threads=5
        * num.partitions=1
        * num.replica.fetchers=2
        * replica.lag.time.max.ms=30000
        * socket.receive.buffer.bytes=102400
        * socket.request.max.bytes=104857600
        * socket.send.buffer.bytes=102400
        * unclean.leader.election.enable=true
        * zookeeper.session.timeout.ms=18000
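
      If you prefer the CLI to the console, the same configuration can be created with the AWS CLI. A minimal sketch (the configuration name and file name are placeholders):

        # Save the properties above to msk-config.txt, then create the
        # configuration and print its ARN
        aws kafka create-configuration \
          --name workshop-cluster-config \
          --server-properties fileb://msk-config.txt \
          --query 'Arn' --output text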
      
    • Copy the Cluster Configuration ARN to a notepad to use in subsequent steps

    • Go to CloudFormation console

      • Create a VPC stack = VPC Stack
        • VPC with 1 public subnet and 3 private subnets + NAT Gateways
      • Create a Kafka clients stack = Kafka Clients Stack
        • 3 EC2 instances for a Kafka Producer, Consumer and Prometheus & Grafana
        • Kafka Producer & Consumer instances in separate Private subnets
        • Prometheus & Grafana instance in a Public subnet
        • Cloud9 environment to be used as a Bastion Host
        • CloudFormation parameters
          • GlueSchemaRegistryAccountId = you can use the Account A ID if you don't have a second AWS account
          • KeyName = EC2 keypair from Account A
          • VPCStackNAme = name of the VPC stack that you created prior to this stack
          • YouIPAddress = your laptop's public IP address (you can use https://checkip.amazonaws.com/ to find it)
      • Create a MSK Cluster stack = AWS MSK Cluster Stack
        • CloudFormation parameters
          • ClusterConfigARN = AWS MSK Cluster Configuration ARN that you created in the previous step
          • ClusterConfigRevisionNumber = 1
          • KafkaClientStack = Kafka Clients CloudFormation stack name, the stack that you created prior to this
          • MSKKafkaVersion = 2.7
          • PCAARN = Leave it blank
          • TLSMutualAuthentication = false
          • VPCStack = VPC CloudFormation Stack Name, the first stack that you created
    • Once the cluster is up and running proceed with the next steps
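
      You can also watch the cluster state from the CLI rather than the console; a sketch (substitute the cluster ARN shown in the MSK console or by aws kafka list-clusters):

        # Re-run until the state reports ACTIVE
        aws kafka describe-cluster --cluster-arn <cluster ARN> \
          --query 'ClusterInfo.State' --output text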


Cloud9 Bastion Host steps

  • Open a Cloud9 terminal and upload the SSH keypair file that you created in Account A into Cloud9 (upload local files)

  • Change keypair.pem file permissions

      chmod 0400 <keypair.pem>
    
  • Create ssh scripts to be able to ssh into Prometheus, Producer and Consumer instances

    • producer.sh

        ssh -i <ec2 keypair.pem> ec2-user@<private ip of Producer EC2 instance>
      
    • consumer.sh

        ssh -i <ec2 keypair.pem> ec2-user@<private ip of Consumer EC2 instance>
      
    • prometheus.sh

        ssh -i <ec2 keypair.pem> ec2-user@<private ip of prometheus EC2 instance>
      
    • Make the shell scripts executable

        chmod +x producer.sh prometheus.sh consumer.sh
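
    The private IPs referenced in these scripts are shown in the EC2 console; you can also pull them with the CLI. A convenience sketch (the Name-tag column depends on what the Kafka Clients stack named each instance):

        aws ec2 describe-instances \
          --filters "Name=instance-state-name,Values=running" \
          --query 'Reservations[].Instances[].[Tags[?Key==`Name`]|[0].Value,PrivateIpAddress]' \
          --output table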
      

Note: Wait for MSK Cluster stack to complete before proceeding to the next steps


Prometheus EC2 instance Basic Setup

  • From Cloud9 terminal ssh into Prometheus EC2 instance

      ./prometheus.sh
    
  • Once SSHed into the Prometheus instance, configure the region to ap-southeast-2

      aws configure set region ap-southeast-2       
    
  • Set the environment variables by running the commands below, which append them to .bash_profile

    • CLUSTER_ARN = AWS MSK Cluster ARN
    • BS = AWS MSK Cluster Broker nodes endpoint
    • ZK = AWS MSK Cluster Zookeeper nodes endpoint
sed -i 's|HOME/bin|HOME/bin:~/kafka/bin|' .bash_profile
cat << 'EOF' >> .bash_profile
export CLUSTER_ARN=$(aws kafka list-clusters|grep ClusterArn|cut -d ':' -f 2-|cut -d ',' -f 1 | sed -e 's/\"//g')
export BS=$(aws kafka get-bootstrap-brokers --cluster-arn $CLUSTER_ARN|grep BootstrapBrokerString|grep 9092| cut -d ':' -f 2- | sed -e 's/\"//g' | sed -e 's/,$//')
export ZK=$(aws kafka describe-cluster --cluster-arn $CLUSTER_ARN|grep ZookeeperConnectString|grep -v Tls|cut -d ':' -f 2-|sed 's/,$//g'|sed -e 's/\"//g')
EOF

source .bash_profile

# verify environment variables values
echo $CLUSTER_ARN
echo $BS
echo $ZK

Producer EC2 instance Basic Setup

  • From Cloud9 terminal, ssh into the Producer EC2 instance

      ./producer.sh
    
  • Once SSHed into the Producer instance, configure the region to ap-southeast-2

      aws configure set region ap-southeast-2
    
  • Set the environment variables by running the commands below, which append them to .bash_profile

    • CLUSTER_ARN = AWS MSK Cluster ARN
    • BS = AWS MSK Cluster Broker nodes endpoint
    • ZK = AWS MSK Cluster Zookeeper nodes endpoint
sed -i 's|HOME/bin|HOME/bin:~/kafka/bin|' .bash_profile
cat << 'EOF' >> .bash_profile
export CLUSTER_ARN=$(aws kafka list-clusters|grep ClusterArn|cut -d ':' -f 2-|cut -d ',' -f 1 | sed -e 's/\"//g')
export BS=$(aws kafka get-bootstrap-brokers --cluster-arn $CLUSTER_ARN|grep BootstrapBrokerString|grep 9092| cut -d ':' -f 2- | sed -e 's/\"//g' | sed -e 's/,$//')
export ZK=$(aws kafka describe-cluster --cluster-arn $CLUSTER_ARN|grep ZookeeperConnectString|grep -v Tls|cut -d ':' -f 2-|sed 's/,$//g'|sed -e 's/\"//g')
EOF

source .bash_profile

# verify environment variables values
echo $CLUSTER_ARN
echo $BS
echo $ZK

Consumer EC2 instance Basic Setup

  • From Cloud9 terminal, ssh into the Consumer EC2 instance

      ./consumer.sh
    
  • Once SSHed into the Consumer instance, configure the region to ap-southeast-2

      aws configure set region ap-southeast-2
    
  • Set the environment variables by running the commands below, which append them to .bash_profile

    • CLUSTER_ARN = AWS MSK Cluster ARN
    • BS = AWS MSK Cluster Broker nodes endpoint
    • ZK = AWS MSK Cluster Zookeeper nodes endpoint
sed -i 's|HOME/bin|HOME/bin:~/kafka/bin|' .bash_profile
cat << 'EOF' >> .bash_profile
export CLUSTER_ARN=$(aws kafka list-clusters|grep ClusterArn|cut -d ':' -f 2-|cut -d ',' -f 1 | sed -e 's/\"//g')
export BS=$(aws kafka get-bootstrap-brokers --cluster-arn $CLUSTER_ARN|grep BootstrapBrokerString|grep 9092| cut -d ':' -f 2- | sed -e 's/\"//g' | sed -e 's/,$//')
export ZK=$(aws kafka describe-cluster --cluster-arn $CLUSTER_ARN|grep ZookeeperConnectString|grep -v Tls|cut -d ':' -f 2-|sed 's/,$//g'|sed -e 's/\"//g')
EOF

source .bash_profile

# verify environment variables values
echo $CLUSTER_ARN
echo $BS
echo $ZK

Let's Produce & Consume Messages to a Kafka Topic for further use cases

Note: Start the consumer first

  • Consume messages

    • ssh into Consumer EC2 instance from Cloud9 terminal

        ./consumer.sh
      
    • List existing topics in your MSK Cluster

      kafka-topics.sh --bootstrap-server $BS --list
      
    • Create topic workshop-topic

      kafka-topics.sh --bootstrap-server $BS --topic workshop-topic \
      --create --partitions 3 \
      --replication-factor 2
      
      kafka-topics.sh --bootstrap-server $BS \
      --topic workshop-topic --describe
      
    • Consume message

        kafka-console-consumer.sh --bootstrap-server $BS \
        --topic workshop-topic --group workshop-app \
        --from-beginning
      
  • Produce messages

    • ssh into Producer EC2 instance from Cloud9 terminal

        ./producer.sh
      
    • Generate / Produce traffic for workshop-topic topic

        kafka-producer-perf-test.sh \
        --producer-props bootstrap.servers="$BS" \
        acks=all --throughput 100 \
        --num-records 9999 \
        --topic workshop-topic \
        --record-size 1000
      
  • Kill the consumer that you started earlier, before the producer finishes sending all the messages. We want to keep some lag so that we can check the consumer lag metrics later

    • In a consumer terminal

        CTRL + C
      
    • Get list of consumer groups

        kafka-consumer-groups.sh --bootstrap-server $BS --list
      
    • Describe workshop-app consumer group

        kafka-consumer-groups.sh --bootstrap-server $BS --describe --group workshop-app
      
    • Note the following fields to understand the consumer lag (see the example after this list)

      • CURRENT-OFFSET
      • LOG-END-OFFSET
      • LAG
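
      LAG is simply LOG-END-OFFSET minus CURRENT-OFFSET per partition. A hypothetical describe output (values are illustrative only) might look like:

        GROUP         TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
        workshop-app  workshop-topic  0          1200            3333            2133
        workshop-app  workshop-topic  1          1150            3333            2183
        workshop-app  workshop-topic  2          1180            3333            2153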

Open Monitoring

Install Prometheus ==> on Prometheus EC2 Instance

  • Create user

          sudo useradd -M -r -s /bin/false prometheus
    
  • Create required directories

          sudo mkdir /etc/prometheus /var/lib/prometheus
    
  • Download prometheus tar

          wget https://github.com/prometheus/prometheus/releases/download/v2.29.1/prometheus-2.29.1.linux-amd64.tar.gz
    
          tar xvfz prometheus-2.29.1.linux-amd64.tar.gz
    
          rm prometheus-2.29.1.linux-amd64.tar.gz
    
  • Copy binary files to /usr/local/bin

          sudo cp prometheus-2.29.1.linux-amd64/{prometheus,promtool} /usr/local/bin/
    
  • Change ownership of Prometheus binary files to user prometheus

          sudo chown prometheus:prometheus /usr/local/bin/{prometheus,promtool}
    
          sudo cp -r prometheus-2.29.1.linux-amd64/{consoles,console_libraries} /etc/prometheus/
    
          sudo cp prometheus-2.29.1.linux-amd64/prometheus.yml /etc/prometheus/prometheus.yml
    
          sudo chown -R prometheus:prometheus /etc/prometheus
    
          sudo chown prometheus:prometheus /var/lib/prometheus
    
  • Run Prometheus and verify if it works

          prometheus --config.file=/etc/prometheus/prometheus.yml
    
  • Last line in the logs should be

          msg="Server is ready to receive web requests."
    
  • Stop the Prometheus server

          CTRL+C
    
  • Configure Prometheus service in systemd

          sudo vi /etc/systemd/system/prometheus.service
    
  • Add the following content in prometheus.service

          [Unit]
          Description=Prometheus Time Series Collection and Processing Server
          Wants=network-online.target
          After=network-online.target
          
          [Service]
          User=prometheus
          Group=prometheus
          Type=simple
          ExecStart=/usr/local/bin/prometheus \
          --config.file /etc/prometheus/prometheus.yml \
          --storage.tsdb.path /var/lib/prometheus/ \
          --web.console.templates=/etc/prometheus/consoles \
          --web.console.libraries=/etc/prometheus/console_libraries
          
          [Install]
          WantedBy=multi-user.target
    
  • Refresh the systemd content

          sudo systemctl daemon-reload
    
  • Start Prometheus service on the Prometheus EC2 instance

          sudo systemctl start prometheus
    
          sudo systemctl status prometheus
              
          ## Look for "Active: active (running)" to ensure service is configured and started correctly
    
  • Enable the Prometheus service so that it starts automatically every time the Prometheus EC2 instance restarts

          sudo systemctl enable prometheus
    
  • Let's use curl to check if the prometheus service that we configured in the previous step is responding

  • From the same terminal in Prometheus EC2 instance run the curl command

          curl localhost:9090
    
          You should see the following response <a href="/graph">Found</a>
    
  • Lets check from browser as well. Open browser on your Laptop and enter the following URL to see if you get Prometheus Expression Browser

            <public ip address of prometheus server>:9090
    
  • Enter up in the expression bar and click Execute. You should see

              up{instance="localhost:9090", job="prometheus"}    1
    
  • Let's change the configuration of Prometheus server and see if changes take effect

  • Update prometheus.yml

          sudo vi /etc/prometheus/prometheus.yml
    
  • Change the scrape frequency to 10s
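
      For reference, the scrape frequency lives in the global block at the top of prometheus.yml; after the edit it might look like this (a sketch of the default file, other settings omitted):

          global:
            scrape_interval: 10s      # how often Prometheus scrapes targets
            evaluation_interval: 15s  # default, unchanged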

  • Reload the Prometheus configuration (SIGHUP tells Prometheus to re-read its config file)

          sudo killall -HUP prometheus
    
  • Check the Prometheus config to confirm the scrape frequency change is in place. From your Prometheus EC2 terminal window, run the following

          curl localhost:9090/api/v1/status/config
              
          you should see "scrape_interval: 10s" in the output
    

Install Node Exporter ==> on Prometheus EC2 instance

  • We are going to configure Node Exporter on this Prometheus EC2 instance to explore how the Prometheus setup on this instance works.

  • We are still SSHed into the Prometheus EC2 instance

  • Add user node_exporter

          sudo useradd -M -r -s /bin/false node_exporter
    
  • Download node_exporter, move files to their appropriate locations and change ownership of directories/files

          wget https://github.com/prometheus/node_exporter/releases/download/v1.2.2/node_exporter-1.2.2.linux-amd64.tar.gz
    
          tar xvfz node_exporter-1.2.2.linux-amd64.tar.gz
    
          rm node_exporter-1.2.2.linux-amd64.tar.gz
      
          sudo cp node_exporter-1.2.2.linux-amd64/node_exporter /usr/local/bin/
          
          sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
    
  • Configure Node Exporter service in systemd

          sudo vi /etc/systemd/system/node_exporter.service
    
  • Add the following content in node_exporter.service

          [Unit]
          Description=Prometheus Node Exporter
          Wants=network-online.target
          After=network-online.target
    
          [Service]
          User=node_exporter
          Group=node_exporter
          Type=simple
          ExecStart=/usr/local/bin/node_exporter
          
          [Install]
          WantedBy=multi-user.target
    
  • Refresh the systemd content

          sudo systemctl daemon-reload
    
  • Start Node Exporter service on the Prometheus EC2 instance

          sudo systemctl start node_exporter
    
          sudo systemctl status node_exporter
    
          ## Look for "Active: active (running)" to ensure service is configured and started correctly
    
  • Enable the Node Exporter service so that it starts automatically every time the Prometheus EC2 instance restarts

          sudo systemctl enable node_exporter
    
  • Let's use curl to check if the Node Exporter service that we configured in the previous step is responding

  • From the same terminal in Prometheus EC2 instance run the curl command

          curl localhost:9100/metrics
    

Configure Prometheus to get Metrics from Node Exporter

  • Update prometheus.yml file

          sudo vi /etc/prometheus/prometheus.yml
    
  • Add the following in prometheus.yml under scrape_configs:

          - job_name: "Prometheus Linux Server"
            static_configs:
              - targets: ["<private ip of prometheus server>:9100"]
    
  • Reload the Config changes

          sudo killall -HUP prometheus
    
  • Let's verify if Prometheus Server is being monitored. Open browser on your laptop and access Prometheus expression browser

          http://<public ip address of Prometheus Server>:9090/
    
  • In the query field

          up
    
  • Press "Execute"

          You should see the following results. The first shows that the Prometheus application is up and running; the second shows that node_exporter is up and running and collecting metrics for the Prometheus EC2 instance
          
          up{instance="10.0.0.237:9090", job="prometheus"}                     1
          up{instance="10.0.0.237:9100", job="Prometheus Linux Server"}        1 
    
  • Awesome!! You have configured the Prometheus instance and verified that it works with node_exporter installed locally on the Prometheus EC2 instance

Let's get real!!

Get Node metrics and JMX metrics from Your MSK Cluster

  • We are still SSHed into the Prometheus EC2 instance

  • Get Your MSK Cluster Brokers details

      echo $BS 
    
  • Copy the broker details and have them ready in the following format

    • 11001 is the port where the JMX Exporter runs on the MSK brokers

    • 11002 is the port where the Node Exporter runs on the MSK brokers

        "b-1.mskcluster-msk.xxxxxxxxxxx.ap-southeast-2.amazonaws.com:11001",
        "b-2.mskcluster-msk.xxxxxxxxxxx.ap-southeast-2.amazonaws.com:11001",
        "b-3.mskcluster-msk.xxxxxxxxxxx.ap-southeast-2.amazonaws.com:11001"
      
        "b-1.mskcluster-msk.xxxxxxxxxxx.ap-southeast-2.amazonaws.com:11002",
        "b-2.mskcluster-msk.xxxxxxxxxxx.ap-southeast-2.amazonaws.com:11002",
        "b-3.mskcluster-msk.xxxxxxxxxxx.ap-southeast-2.amazonaws.com:11002"
      
  • Configure prometheus.yml file with MSK Brokers JMX and Node exporter details

          sudo vi /etc/prometheus/prometheus.yml
    
  • Add the following under scrape_configs

Note: replace the broker details below with your broker strings that you prepared in the previous step

      - job_name: "jmx"
        static_configs:
          - targets: [
            "b-1.mskcluster-msk.xxxx.c2.kafka.ap-southeast-2.amazonaws.com:11001",
            "b-2.mskcluster-msk.xxxx.c2.kafka.ap-southeast-2.amazonaws.com:11001",
            "b-3.mskcluster-msk.xxxx.c2.kafka.ap-southeast-2.amazonaws.com:11001"
            ]
      - job_name: "node"
        static_configs:
          - targets: [
            "b-1.mskcluster-msk.xxxx.c2.kafka.ap-southeast-2.amazonaws.com:11002",
            "b-2.mskcluster-msk.xxxx.c2.kafka.ap-southeast-2.amazonaws.com:11002",
            "b-3.mskcluster-msk.xxxx.c2.kafka.ap-southeast-2.amazonaws.com:11002"
            ]

  • Reload the config changes

          sudo killall -HUP prometheus
    
  • Access Prometheus expression browser and type up in the Expression query field and click Execute

      http://<public ip of prometheus server>:9090
    
      you should see the following results (the first two are from the configuration on the Prometheus EC2 instance, 
      the remaining 6 are for the MSK brokers: 3 each for JMX and Node Exporter)
    
      up{instance="10.0.0.237:9090", job="prometheus"}                                                          1
      up{instance="10.0.0.237:9100", job="Prometheus Linux Server"}                                             1
      up{instance="b-1.mskcluster-msk.xxxxxx.c2.kafka.ap-southeast-2.amazonaws.com:11001", job="jmx"}           1
      up{instance="b-1.mskcluster-msk.xxxxxx.c2.kafka.ap-southeast-2.amazonaws.com:11002", job="node"}          1
      up{instance="b-2.mskcluster-msk.xxxxxx.c2.kafka.ap-southeast-2.amazonaws.com:11001", job="jmx"}           1
      up{instance="b-2.mskcluster-msk.xxxxxx.c2.kafka.ap-southeast-2.amazonaws.com:11002", job="node"}          1
      up{instance="b-3.mskcluster-msk.xxxxxx.c2.kafka.ap-southeast-2.amazonaws.com:11001", job="jmx"}           1
      up{instance="b-3.mskcluster-msk.xxxxxx.c2.kafka.ap-southeast-2.amazonaws.com:11002", job="node"}          1
    
  • Type the following in the Expression query field

      kafka_controller_KafkaController_Value{name="OfflinePartitionsCount"}
      
      OR
    
      kafka_consumer_group_ConsumerLagMetrics_Value
    

Configure Grafana Service - on Prometheus EC2 Instance

  • We are still in Prometheus EC2 instance terminal

  • Download and Install Grafana

        wget https://dl.grafana.com/oss/release/grafana-8.1.2-1.x86_64.rpm
          
        sudo yum install grafana-8.1.2-1.x86_64.rpm
    
  • Change Grafana config

        pushd /etc/grafana/
         
        sudo vim grafana.ini
          
        search for [live]
    
  • Add the following

        allowed_origins="*"
    
  • Reload changes and Start Grafana Service by running the following commands

        sudo systemctl daemon-reload
    
        sudo systemctl start grafana-server
    
        sudo systemctl status grafana-server
          # you should see the following Active: active (running)
    
        sudo systemctl enable grafana-server
    
  • To access Grafana, open a browser on your local machine/laptop and enter the following URL

        #Prometheus and Grafana are running on the same EC2 instance which is  
         named as __PrometheusInstance__ hence use PrometheusInstance EC2 instance public IP address
    
        <public-ip address of grafana server>:3000
    
  • You should get Grafana login page, use default username and password

        username = admin
        password = admin
    
  • On New Password screen, press Skip

  • Let's configure Grafana to fetch metrics from Prometheus

  • Add Prometheus datasource, Click cogwheel icon in left panel and click on data sources

  • Click Add data source and select Prometheus

  • In URL add

          http://<PRIVATE IP address of prometheus server>:9090 
    
  • Press Save and Test, you should see the following

          Data source is working
    
  • Create Kafka metrics Dashboard, download the following dashboard config json

          https://amazonmsk-labs.workshop.aws/en/openmonitoring/msk_grafana_dashboard.json
    
  • Click on the four-square (Dashboards) icon, click on Manage, and import msk_grafana_dashboard.json

  • You should see a Grafana dashboard with nice graphs and metric numbers at the top

  • Grafana and Prometheus Configuration is Working!! Metrics are being fetched from MSK Cluster.

  • Click on Refresh icon in the top right corner and select 5s. You should see Graphs moving!!

Configure your MSK cluster Cloudwatch Metrics on Grafana Dashboard

  • Create a CloudWatch data source: click the cogwheel icon in the left panel and click on data sources
  • Click Add data source and choose CloudWatch
    • Authentication Provider = AWS SDK Default (it will pick up the EC2 instance role)
    • Default Region = ap-southeast-2
    • Leave the rest of the fields blank
    • Click Save & Test. You should see the following

        Data source is working
      
  • Add a new panel (empty panel) to the Kafka dashboard that you created in the previous step
    • Select the following
      • Data source = CloudWatch
      • Query Mode = CloudWatch Metrics
      • Region = ap-southeast-2
      • Namespace = AWS/Kafka
      • Metric Name = MaxOffsetLag
      • Stats = Maximum
    • Add the following dimensions
      • Cluster Name = SELECT YOUR CLUSTER
      • Topic = SELECT TOPIC
      • Consumer Group = SELECT CONSUMER GROUP
    • OR, you can add the following in Expression

         SEARCH('{AWS/Kafka,"Cluster Name","Consumer Group",Partition,Topic} MSKCluster-MSK workshop-topic ml OffsetLag', 'Average', 300)
      

Configure Cruise Control

  • From Cloud9 terminal, ssh into Prometheus EC2 instance

      ./prometheus.sh
    
  • Check the Java 8 installation

      sudo alternatives --config java
        
      # if Java 8 is already installed, there is no need to execute the next steps (1-4), which install Java 8
      1) sudo yum -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel
      2) sudo alternatives --config java
      3) sudo update-alternatives --config javac      
      4) select Java 8 when prompted
    
  • Run the following to verify Java and Javac installations

      java -version
      javac -version
    
      both should be Java 8
    
  • Verify MSK kafka environment variables once again

      # check your environment variables once again
      echo $CLUSTER_ARN
      echo $BS
      echo $ZK
    
  • Download Cruise control

      cd ~
    
      wget https://github.com/linkedin/cruise-control/archive/2.5.22.tar.gz
    
      tar xvfz 2.5.22.tar.gz
      
      rm 2.5.22.tar.gz    
    
  • To build Cruise Control, you must first initialize it as a git repo

      cd cruise-control-2.5.22/
    
      git init && git add . && git commit -m "Init local repo." && git tag -a 2.0.130 -m "Init local version."
    
      #Build CruiseControl
      ./gradlew jar copyDependantLibs
    
  • Make changes in cruisecontrol.properties

    NOTE: 9090 is prometheus server port, 9091 is cruise control front end server port

      sed -i "s/localhost:9092/${BS}/g" config/cruisecontrol.properties
    
      sed -i "s/localhost:2181/${ZK}/g" config/cruisecontrol.properties
    
      sed -i "s/com.linkedin.kafka.cruisecontrol.monitor.sampling.CruiseControlMetricsReporterSampler/com.linkedin.kafka.cruisecontrol.monitor.sampling.prometheus.PrometheusMetricSampler/g" config/cruisecontrol.properties
    
      sed -i "s/webserver.http.port=9090/webserver.http.port=9091/g" config/cruisecontrol.properties
    
      sed -i "s/capacity.config.file=config\/capacityJBOD.json/capacity.config.file=config\/capacityCores.json/g" config/cruisecontrol.properties
    
      sed -i "s/two.step.verification.enabled=false/two.step.verification.enabled=true/g" config/cruisecontrol.properties
    
      echo "prometheus.server.endpoint=<PRIVATE IP ADDRESS OF PROMETHEUS EC2 INSTANCE>:9090" >> config/cruisecontrol.properties
      
      mkdir logs; touch logs/kafka-cruise-control.out  
    

    NOTE: Replace PRIVATE IP ADDRESS OF PROMETHEUS EC2 INSTANCE with Prometheus EC2 instance private IP address
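
    As an optional sanity check, you can confirm the substitutions above landed before starting Cruise Control:

      grep -E 'bootstrap.servers|zookeeper.connect|sampler|webserver.http.port|capacity.config.file|prometheus.server.endpoint' config/cruisecontrol.properties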

  • Modify config/capacityCores.json for m5 large

      vi config/capacityCores.json
       
      {
          "brokerCapacities":[
              {
                  "brokerId": "-1",
                  "capacity": {
                      "DISK": "10737412742445",
                      "CPU": {"num.cores": "2"},
                      "NW_IN": "1073741824",
                      "NW_OUT": "1073741824"
                  },      
                  "doc": "This is the default capacity. Capacity unit used for disk is in MB, cpu is in number of cores, network throughput is in KB."
              }
          ]
      }
    

Install CruiseControl UI

  • We are still in Prometheus EC2 instance

      cd ~
      wget https://github.com/linkedin/cruise-control-ui/releases/download/v0.3.4/cruise-control-ui-0.3.4.tar.gz
    
      tar -xvzf cruise-control-ui-0.3.4.tar.gz
      rm cruise-control-ui-0.3.4.tar.gz
      mv cruise-control-ui cruise-control-2.5.22/
    
  • run cruise control

      cd cruise-control-2.5.22
      ./kafka-cruise-control-start.sh -daemon config/cruisecontrol.properties
    
  • check logs

      tail -f logs/kafkacruisecontrol.log
    

NOTE: Wait for few mins to let CruiseControl gather the data

  • Check if Cruise Control has been able to create two topics in your MSK cluster

  • From another Cloud9 terminal, ssh into Producer EC2 instance

    ./producer.sh

  • Run the following to check list of topics in your MSK cluster

      kafka-topics.sh --bootstrap-server $BS --list
    
      You should see the following TWO CruiseControl topics
    
          __KafkaCruiseControlModelTrainingSamples
          __KafkaCruiseControlPartitionMetricSamples
    
  • Test the Cruise Control UI in your browser

      http://<PUBLIC IP ADDRESS of Prometheus EC2 Instance>:9091
    
  • Exit log tailing and run a few curl commands

    • on the Cloud9 terminal window where you sshed into the Prometheus instance and started Cruise Control

        CTRL + C
      
        # Curl commands (quote the URLs so the shell does not treat & as a background operator)
      
        curl -X OPTIONS -v "http://<PRIVATE IP of Prometheus EC2>:9091/kafkacruisecontrol/kafka_cluster_state?json=true"
        curl -v "http://<PRIVATE IP of Prometheus EC2>:9091/kafkacruisecontrol/kafka_cluster_state?json=true"
        curl -v "http://<PRIVATE IP of Prometheus EC2>:9091/kafkacruisecontrol/load?allow_capacity_estimation=true&json=true"
        curl -v "http://<PRIVATE IP of Prometheus EC2>:9091/kafkacruisecontrol/state?substates=EXECUTOR&verbose=true&json=true"
        curl -v "http://<PRIVATE IP of Prometheus EC2>:9091/kafkacruisecontrol/proposals?verbose=true&json=true"
        curl -v "http://<PRIVATE IP of Prometheus EC2>:9091/kafkacruisecontrol/user_tasks?json=true"
        curl -v "http://<PRIVATE IP of Prometheus EC2>:9091/kafkacruisecontrol/kafka_cluster_state?json=true"
        curl -v -X POST "http://<PRIVATE IP of Prometheus EC2>:9091/kafkacruisecontrol/rebalance?dryrun=true&goals=RackAwareGoal%2CReplicaCapacityGoal%2CCpuCapacityGoal%2CDiskCapacityGoal%2CNetworkInboundCapacityGoal%2CNetworkOutboundCapacityGoal&json=true"
      
  • Shutdown CruiseControl

    ./kafka-cruise-control-stop.sh


Security & Encryption

  • Currently, Amazon Managed Streaming for Apache Kafka (Amazon MSK) supports encryption in transit with TLS and TLS mutual authentication with TLS certificates for client authentication

  • Amazon MSK uses AWS Certificate Manager Private Certificate Authority (ACM PCA) for TLS mutual authentication.

  • In addition, for Amazon MSK to be able to use an ACM PCA, the PCA needs to be in the same AWS account as the Amazon MSK cluster.

  • However, the Apache Kafka clients, for example producers and consumers, schema registries, Kafka Connect, or other Apache Kafka tools that need the end-entity certificates, can be in a different AWS account from the one the ACM PCA is in.

  • In that scenario, to access the ACM PCA, they need to assume a role in the account the ACM PCA is in that has the required permissions, because ACM PCA supports only identity-based policies, not resource-based policies.

Key concept

  • In-Transit Encryption
    • If encryption in-transit is enabled for an Amazon MSK cluster, public TLS certificates from ACM are installed in the keystores of the Amazon MSK Apache Kafka brokers.
  • Mutual TLS
    • BROKERs Side
      • If TLS mutual authentication is enabled for the Amazon MSK cluster, you need to provide an ACM PCA Amazon Resource Number (ARN) that the Amazon MSK cluster can utilize. The CA certificate and the certificate chain of the specified PCA are retrieved and installed in the truststores of the Amazon MSK Apache Kafka brokers.
    • Kafka Clients (Producer/Consumer) Side
      • On the clients, you need to generate a Private Key and create a CSR (Certificate Signing Request) that are used to get end-entity certificates issued by the ACM PCA specified for an Amazon MSK cluster.
      • These certificates and their certificate chains are installed in the keystores on the client and are trusted by the Amazon MSK Apache Kafka brokers.
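
      To make the client-side flow concrete, here is an illustrative keytool sketch of the private key and CSR steps (the alias, distinguished name, and passwords are placeholders; later in this workshop the AuthMSK-1.0-SNAPSHOT.jar helper automates this):

        # Generate a private key inside a client keystore
        keytool -genkey -keystore kafka.client.keystore.jks -validity 300 \
          -storepass changeit -keypass changeit -dname "CN=workshop-client" \
          -alias msk -storetype pkcs12 -keyalg RSA

        # Create a CSR to be signed by the ACM PCA attached to the cluster
        keytool -keystore kafka.client.keystore.jks -certreq \
          -file client-cert-sign-request -alias msk \
          -storepass changeit -keypass changeit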

Let's Get Started

Prerequisites

  • Two AWS Accounts
    • Account A = AWS MSK Cluster, Kafka Consumer and Producer EC2 instances, Cloud9 Bastion host, Prometheus and Grafana EC2 instance
    • Account B = AWS Glue Schema Registry (optional)

Let's get going

  • Log in to Account A
    • Go to the EC2 console and create a keypair so you can SSH into the Kafka and Prometheus EC2 instances that will be created during the workshop

    • Go to the AWS MSK console and create an MSK Cluster Configuration with the following properties

        * auto.create.topics.enable=true
        * delete.topic.enable=true
        * log.retention.hours=8
        * default.replication.factor=3
        * min.insync.replicas=2
        * num.io.threads=8
        * num.network.threads=5
        * num.partitions=1
        * num.replica.fetchers=2
        * replica.lag.time.max.ms=30000
        * socket.receive.buffer.bytes=102400
        * socket.request.max.bytes=104857600
        * socket.send.buffer.bytes=102400
        * unclean.leader.election.enable=true
        * zookeeper.session.timeout.ms=18000
      
    • Copy the Cluster Configuration ARN to a notepad to use in subsequent steps

    • Go to CloudFormation console

      • Create a VPC stack = VPC Stack
        • VPC with 1 public subnet and 3 private subnets + NAT Gateways
      • Create a Kafka clients stack = Kafka Clients Stack
        • 3 EC2 instances for a Kafka Producer, Consumer and Prometheus & Grafana
        • Kafka Producer & Consumer instances in separate Private subnets
        • Prometheus & Grafana instance in a Public subnet
        • Cloud9 environment to be used as a Bastion Host
        • CloudFormation parameters
          • GlueSchemaRegistryAccountId = you can use the Account A ID if you don't have a second AWS account
          • KeyName = EC2 keypair from Account A
          • VPCStackNAme = name of the VPC stack that you created prior to this stack
          • YouIPAddress = your laptop's public IP address (you can use https://checkip.amazonaws.com/ to find it)

Setup AWS Certificate Manager (ACM) Private Certificate Authority (PCA)

  • Open Terminal in Cloud9 environment and ssh into Producer EC2 instance to run the following CLI command

    aws acm-pca create-certificate-authority \
    --certificate-authority-configuration '{"KeyAlgorithm":"RSA_2048","SigningAlgorithm":"SHA256WITHRSA","Subject":{"Country":"US","Organization":"Amazon","OrganizationalUnit":"AWS","State":"New York","CommonName":"MyMSKPCA","Locality":"New York City"}}' --revocation-configuration '{"CrlConfiguration":{"Enabled":false}}' --certificate-authority-type "ROOT" --idempotency-token 12345
    
  • Copy the ARN (Amazon Resource Name) of the PCA you just created to a notepad application.
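
    If you lose the ARN, you can retrieve it later from the CLI; a sketch:

      aws acm-pca list-certificate-authorities \
        --query 'CertificateAuthorities[].{Arn:Arn,Status:Status}' --output table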

  • Install a self-signed certificate in the ACM PCA just created. A certificate needs to be installed in the ACM PCA for the PCA to be able to issue and sign end-entity certificates.

    • Go to the AWS ACM Console.
    • Click on the Private CA that you created in the previous step; its status should be "Pending Certificate"
    • Select the PCA you just created, click the Actions dropdown, and select Install CA Certificate
      • Don't change defaults, click Next and Confirm and Install
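
    You can also verify the CA is ready from the CLI; a sketch using the PCA ARN you noted earlier:

      aws acm-pca describe-certificate-authority \
        --certificate-authority-arn <PCA ARN> \
        --query 'CertificateAuthority.Status' --output text
      # should print ACTIVE once the CA certificate is installed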

Create MSK Cluster

  • Create a MSK Cluster stack = AWS MSK Cluster Stack
    • CloudFormation parameters
      • ClusterConfigARN = AWS MSK Cluster Configuration ARN that you created in the previous step
      • ClusterConfigRevisionNumber = 1
      • KafkaClientStack = Kafka Clients CloudFormation stack name, the stack that you created prior to this
      • MSKKafkaVersion = 2.7
      • PCAARN = PCA ARN that you copied in the previous step
      • TLSMutualAuthentication = True
      • VPCStack = VPC CloudFormation Stack Name, the first stack that you created
  • Once the cluster is up and running proceed with the next steps

Cloud9 Bastion Host steps

  • Open a Cloud9 terminal and upload the SSH keypair file that you created in Account A into Cloud9 (upload local files)

  • Change keypair.pem file permissions

      chmod 0400 <keypair.pem>
    
  • Create ssh scripts to be able to ssh into Prometheus, Producer and Consumer instances

    • producer.sh

        ssh -i <ec2 keypair.pem> ec2-user@<private ip of Producer EC2 instance>
      
    • consumer.sh

        ssh -i <ec2 keypair.pem> ec2-user@<private ip of Consumer EC2 instance>
      
    • Make the shell scripts executable

        chmod +x producer.sh consumer.sh    
      

Producer EC2 instance Basic Setup

  • From Cloud9 terminal, ssh into the Producer EC2 instance

      ./producer.sh
    
  • Once SSHed into the Producer instance, configure the region to ap-southeast-2

      aws configure set region ap-southeast-2        
    
  • Set the environment variables by running the commands below, which append them to .bash_profile

    • CLUSTER_ARN = AWS MSK Cluster ARN

    • BS = AWS MSK Cluster Broker nodes endpoint

    • BS_TLS = AWS MSK Cluster Broker nodes secure (TLS) endpoint

    • ZK = AWS MSK Cluster Zookeeper nodes endpoint

    • ZK_TLS = AWS MSK Cluster Zookeeper nodes secure (TLS) endpoint

        vim .bash_profile
      
        #append ~/kafka/bin to PATH
        #PATH variable should look something like
        
        PATH=$PATH:$HOME/.local/bin:$HOME/bin:~/kafka/bin
        export PATH
      
        export CLUSTER_ARN=`aws kafka list-clusters|grep ClusterArn|cut -d ':' -f 2-|cut -d ',' -f 1 | sed -e 's/\"//g'`
        
        export BS=`aws kafka get-bootstrap-brokers --cluster-arn $CLUSTER_ARN|grep BootstrapBrokerString|grep 9092| cut -d ':' -f 2- | sed -e 's/\"//g' | sed -e 's/,$//'`
        
        export BS_TLS=`aws kafka get-bootstrap-brokers --cluster-arn $CLUSTER_ARN|grep BootstrapBrokerStringTls|grep 9094| cut -d ':' -f 2- | sed -e 's/\"//g' | sed -e 's/,$//'`
        
        export ZK=`aws kafka describe-cluster --cluster-arn $CLUSTER_ARN|grep ZookeeperConnectString|grep -v Tls|cut -d ':' -f 2-|sed 's/,$//g'|sed -e 's/\"//g'`
        
        export ZK_TLS=`aws kafka describe-cluster --cluster-arn $CLUSTER_ARN|grep ZookeeperConnectStringTls|grep Tls|cut -d ':' -f 2-|sed 's/,$//g'|sed -e 's/\"//g'`
        
        # save changes and exit .bash_profile
      
        # load environment variables in profile
        source .bash_profile
      
        # verify environment variables values
        
        echo $CLUSTER_ARN
      
        echo $BS
      
        echo $BS_TLS
      
        echo $ZK
        
        echo $ZK_TLS
      
  • Producer Truststore for encryption in-transit

    • As part of the set up, MSK Client CloudFormation stack has created a copy of java trust store for your producer to be able to communicate with your MSK cluster over TLS (9094 port)

    • Let's verify if trust store exists or not.

    • ssh into Producer EC2 instance from your Cloud9 terminal

        ./producer.sh
        
        pushd /tmp
        
        ll              # you should see a kafka.client.truststore.jks file in there
        ----------------------------------------------------------------------------------------------------------------------------------------        
        # kafka.client.truststore.jks is a copy of trust store that exists within the java virtual machine (jvm) installed on Producer EC2 instance
        # MSK Clients CloudFormation copied the cacerts file from JVM on Producer EC2 instance to /tmp directory in kafka.client.truststore.jks file
      
        # This is the command that we have in the CloudFormation MSK Client stack
        # Initialize the Kafka cert trust store
        su -c 'find /usr/lib/jvm/ -name "cacerts" -exec cp {} /tmp/kafka.client.truststore.jks \;' -s /bin/sh ec2-user
        
        NOTE: You don't need to rerun this, it is just for your knowledge. This command copies the "cacerts" file from the installed 
              Java to "kafka.client.truststore.jks" in the /tmp folder
        ----------------------------------------------------------------------------------------------------------------------------------------
        
        popd
        pwd      
        # you should be back in /home/ec2-user
      
    • Any truststore that trusts Amazon Trust Services also trusts the certificates of Amazon MSK brokers

    • Additional reading if you want to

    • The MSK Client CloudFormation template has also downloaded AuthMSK-1.0-SNAPSHOT.jar to make it easy for you to generate a key and a certificate for your producer and consumer clients for mutual TLS authentication. We will cover this later.

        cd ~
        ll
        # you should see AuthMSK-1.0-SNAPSHOT.jar
      
    • The sample code is available on GitHub.


Consumer EC2 instance Basic Setup

  • From Cloud9 terminal, ssh into the Consumer EC2 instance

      ./consumer.sh
    
  • Once SSHed into the Consumer instance, configure the region to ap-southeast-2

      aws configure set region ap-southeast-2        
    
  • Set the environment variables by running the commands below, which append them to .bash_profile

    • CLUSTER_ARN = AWS MSK Cluster ARN

    • BS = AWS MSK Cluster Broker nodes endpoint

    • BS_TLS = AWS MSK Cluster Broker nodes secure (TLS) endpoint

    • ZK = AWS MSK Cluster Zookeeper nodes endpoint

    • ZK_TLS = AWS MSK Cluster Zookeeper nodes secure (TLS) endpoint

        vim .bash_profile
        
        #append ~/kafka/bin to PATH
        #PATH variable should look something like
        
        PATH=$PATH:$HOME/.local/bin:$HOME/bin:~/kafka/bin
        export PATH
      
        export CLUSTER_ARN=`aws kafka list-clusters|grep ClusterArn|cut -d ':' -f 2-|cut -d ',' -f 1 | sed -e 's/\"//g'`
        
        export BS=`aws kafka get-bootstrap-brokers --cluster-arn $CLUSTER_ARN|grep BootstrapBrokerString|grep 9092| cut -d ':' -f 2- | sed -e 's/\"//g' | sed -e 's/,$//'`
        
        export BS_TLS=`aws kafka get-bootstrap-brokers --cluster-arn $CLUSTER_ARN|grep BootstrapBrokerStringTls|grep 9094| cut -d ':' -f 2- | sed -e 's/\"//g' | sed -e 's/,$//'`
        
        export ZK=`aws kafka describe-cluster --cluster-arn $CLUSTER_ARN|grep ZookeeperConnectString|grep -v Tls|cut -d ':' -f 2-|sed 's/,$//g'|sed -e 's/\"//g'`
        
        export ZK_TLS=`aws kafka describe-cluster --cluster-arn $CLUSTER_ARN|grep ZookeeperConnectStringTls|grep Tls|cut -d ':' -f 2-|sed 's/,$//g'|sed -e 's/\"//g'`
      
        # save changes and exit .bash_profile
      
        # load environment variables in profile
        source .bash_profile
      
        # verify environment variables values
         
        echo $CLUSTER_ARN
      
        echo $BS
      
        echo $BS_TLS
      
        echo $ZK
        
        echo $ZK_TLS
      

Communicate with MSK Brokers over TLS

  • From Cloud9 terminal, ssh into your Producer EC2 instance

      ./producer.sh
    
      # verify that the kafka cli commands are working: type the following command and press enter, it should print the usage options
      kafka-topics.sh

  • Communicate with your MSK cluster over PLAINTEXT and list existing topics:

      kafka-topics.sh --bootstrap-server $BS --list
      
      #you should see the following topics
      __amazon_msk_canary
      __amazon_msk_canary_state
      __consumer_offsets
    
  • Communicate with your MSK cluster over TLS and list existing topics:

      cd ~
    
      vim client.properties
    
      #Add the following content
    
      security.protocol=SSL
      ssl.truststore.location=/tmp/kafka.client.truststore.jks
      ssl.truststore.password=changeit
      
      kafka-topics.sh --bootstrap-server $BS --list --command-config client.properties
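
  • The same client.properties works for the console producer and consumer over TLS. An illustrative sketch (with auto.create.topics.enable=true in the cluster configuration, producing will create workshop-topic if it does not exist yet):

      # Produce over TLS (port 9094) using the truststore configured above
      kafka-console-producer.sh --bootstrap-server $BS_TLS \
        --topic workshop-topic --producer.config client.properties

      # Consume the same topic over TLS
      kafka-console-consumer.sh --bootstrap-server $BS_TLS \
        --topic workshop-topic --from-beginning \
        --consumer.config client.properties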
    

!! WORK IN PROGRESS

aws kafka list-clusters | jq ".ClusterInfoList[0].ZookeeperConnectString"
