mikelo / cp4waiops-deployer

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

CP4WatsonAIOps CP4WAIOPS v3.7.1

Demo Environment Installation ๐Ÿš€

K8s CNI

ยฉ2023 Niklaus Hirt / IBM

โ— This is provided as-is:

  • I'm sure there are errors
  • I'm sure it's not complete
  • It clearly can be improved

Please contact me if you have feedback or if you find glitches or problems.

โ—The installation has been tested for the CP4WAIOPS v3.7.1 release on OpenShift 4.12 on:

  • OpenShift Cluster (VMware on IBM Cloud) - IPI
  • OpenShift Cluster (VMware on IBM Cloud) - UPI
  • IBM RedHat Openshift Kubernetes Service (ROKS) with IBM Cloud Storage (ibmc-xxx)

But it should work on other Openshift Platforms as well

โ—Those are non-production installations and are suited only for demo and PoC environments. โ— Please refer to the official IBM Documentation for production ready installations.


๐Ÿš€ Quickstart


๐Ÿฅ Quick Install

  • Get an OpenShift Cluster
  • Get your entitlement key/pull token
  • Paste the install file into the OpenShift web UI and insert your entitlement key
  • Grab a coffe and come back after 2-3 hours depending on the modules you're installing

๐Ÿš€ Demo the Solution

๐Ÿ“ฑ Slack integration (optional)

๐Ÿค“ Demo Setup - Explained

โ„น๏ธ You can find a more detailed presentation about how the automation works here: PDF.

Here is a quick video that walks you through the installation process K8s CNI

โš ๏ธ This method creates an in-cluster installation

  • It's way faster
  • You don't have to install all the tooling locally
  • You donโ€™t need a connection to the cluster during the installation (fire and forget)

๐Ÿค“ So this could basically be done from an iPhone or iPad

๐Ÿš€ Already have a cluster? Dive right in


1. Preparation


โœ… Prerequisites

1.1 Prerequisites

1.1.1 OpenShift requirements

I installed the demo in a Techzone environment.

You'll need:

  • ROKS 4.12
  • 5x worker nodes with 32 CPU / 128 GB โ—

You might get away with less if you don't install some components (Event Manager, ELK, Turbonomic,...) but no guarantee.

1.1.2 Get an OpenShift Cluster (IBMers and IBM Partners only)

IBMers and Partners can get a temporary cluster from Techzone.

  1. Select OpenShift Cluster (VMware on IBM Cloud) - IPI - Public Openshift that you want to use above

  2. Create a cluster for Practice/Self Education or Test if you don't have an Opportunity Number

  3. Select your preferred Geograpy

K8s CNI

  1. Select the maximum end date that fits your needs (you can extend the duration once after creation)

    K8s CNI

  2. Select Openshift Storage

    • Storage OCS/ODF Size: 5TiB - This is important as with the 500MiB option your installation will run out of space rapidly.

    • OpenShift Version: 4.12

    K8s CNI

  3. Select the Cluster Size

    • Worker node count: 5
    • Flavour: 32 vCPU X 128 GB โ—

    โ— If you want to install CP4WAIOps, Trubonomic and Instana please select 32 vCPU X 128 GB

    K8s CNI

  4. Click Submit

  5. Once the cluster is provisioned, don't forget to extend it to 8 days if needed.

1.1.3 Get the CP4WAIOPS installation token (registry pull token)

You can get the installation (registry pull token) token from https://myibm.ibm.com/products-services/containerlibrary.

This allows the CP4WAIOPS images to be pulled from the IBM Container Registry.

โš ๏ธ Important remarks before you start

โš ๏ธโš ๏ธ 1.2 Important remarks before you start โš ๏ธโš ๏ธ

Those are remarks to feedback and problem reports I got from the field.

Those scripts have been tested thoroughly on different environments and have proven to be VERY reliable.

If you think that you hit a problem:

  • Make sure that you have provisioned a cluster with 5 worker nodes with 32 CPU and 128 GB each (b3c.16x64 - it's easy to select the wrong size). If you have Pods in 0/0 state verify the Events. If you get Not enough CPU then delete the cluster and provision the correct size.
  • If you want to install CP4WAIOps, Turbonomic and Instana please select 5 worker nodes with 32 CPU and 128 GB
  • The complete installation takes about 2.5 to 8 hours depending on your region where and how you deployed ROKS to (see above).
  • If you see Pods in CrashLoop or other error states, try to wait it out (this can be due to dependencies on other componenets that are not ready yet). Chances are that the deployment will eventually go through. If after 8h you are still stuck, ping me.
  • Select and use ONLY ONE of the scripts below, depending on which components you want to install.

โ— So simply put be patient and make sure you have the correct size of cluster provisioned!


2. Quick Install


You can use the scritps in the Quick_Install folder to rapidly spin up a demo environment. The names should be self explaining and the headers of the files explain the modules to be installed.

You can find some examples below.

โœ… Install CP4WAIOps with demo content, Turbonomic and Instana

2.1 Install CP4WAIOps with demo content, Turbonomic and Instana

โœ… This is probably the one that you want.

You get CP4WAIOps installed and pre-trained in one simple script. Ready to go. On top of that you get Turbonomic and Instana instances to play around a bit (you'll need a license key for each).

K8s CNI

  1. In the the OCP Web UI click on the + sign in the right upper corner
  2. Copy and paste the content from this file
  3. Replace <REGISTRY_TOKEN> at the end of the file with your pull token from step 1.1.3 (the Entitlement key from https://myibm.ibm.com)
  4. Replace <YOUR_SALES_KEY> and <YOUR_AGENT_KEY> at the end of the file with your Instana license if you have one
  5. Replace the TURBO_LICENSE NONE at the end of the file with your Turbonomic license if you have one
  6. Click Save

โ„น๏ธโ— If you get a ClusterRoleBinding already exists, just ignore it

๐Ÿš€ You can now go to Demo the Solution

This installation cocntains:

  • CP4WAIOps
    • IBM Operator
    • CP4WAIOps Instance
  • CP4WAIOps Demo Content
    • OpenLDAP & Register with CP4WAIOps

    • AWX (Open Source Ansible Tower) with preloaded Playbooks

    • AI Models - Load and Train

      • Create Training Definitions (TG, LAD, CR, SI. Turn off RSA)
      • Create Training Data (LAD, SNOW)
      • Train Models (TG, LAD, CR, SI)
    • Topology

      • RobotShop Demo App
      • Create K8s Observer
      • Create ASM merge rules
      • Load Overlay Topology
      • Create CP4WAIOps Application
    • Misc

      • Creates valid certificate for Ingress (Slack)
      • External Routes (Flink, Topology, ...)
      • Disables ASM Service match rule
      • Create Policy Creation for Stories and Runbooks
      • Demo Service Account
  • Turbonomic
  • Turbonomic Demo Content - Demo User - RobotShop Demo App with synthetic metric - Instana target (if Instana is installed - you have to enter the API Token Manually) - Groups for vCenter and RobotShop - Groups for licensing - Resource Hogs
  • Instana
โ˜‘๏ธ Install CP4WAIOps, Event Manager with demo content

2.2 Install CP4WAIOps, Event Manager with demo content

You get all the CP4WAIOPS components installed and pre-trained in one simple script. Ready to go.

K8s CNI

  1. In the the OCP Web UI click on the + sign in the right upper corner
  2. Copy and paste the content from this file
  3. Replace <REGISTRY_TOKEN> at the end of the file with your pull token from step 1.1.3 (the Entitlement key from https://myibm.ibm.com)
  4. Click Save

โ„น๏ธโ— If you get a ClusterRoleBinding already exists, just ignore it

๐Ÿš€ You can now go to Demo the Solution

This installation cocntains:

  • CP4WAIOps
    • IBM Operator
    • CP4WAIOps Instance
  • CP4WAIOps Demo Content
    • OpenLDAP & Register with CP4WAIOps

    • AWX (Open Source Ansible Tower) with preloaded Playbooks

    • AI Models - Load and Train

      • Create Training Definitions (TG, LAD, CR, SI. Turn off RSA)
      • Create Training Data (LAD, SNOW)
      • Train Models (TG, LAD, CR, SI)
    • Topology

      • RobotShop Demo App
      • Create K8s Observer
      • Create ASM merge rules
      • Load Overlay Topology
      • Create CP4WAIOps Application
    • Misc

      • Creates valid certificate for Ingress (Slack)
      • External Routes (Flink, Topology, ...)
      • Disables ASM Service match rule
      • Create Policy Creation for Stories and Runbooks
      • Demo Service Account
  • Event Manager
    • Event Manager
  • Event Manager Demo Content
    • Topology
      • Create ASM merge rules
      • Load ASM merge Topology
      • Create CP4WAIOps Application
โ˜‘๏ธ Custom Install

2.3 Custom Install

  1. Open the 00_INSTALL_CUSTOM.yaml file

  2. Adap the installation configuration to your needs. Select the modules to install and their configuration.

    - name: cp4waiops
      kind: CP4WAIOps						<-- The feature to be configured
      install: true							<-- Install yes/no
    
      # current_cp4waiops_feature			<-- Configuration of the feature
      # CP4WAIOPS Size of the install
      waiops_size: small
      ...
      # Version of the catalog subscription
      subscription_channel: v3.7
    
  3. Replace <REGISTRY_TOKEN> at the end of the file with your pull token from step 1.1.3 (the Entitlement key from https://myibm.ibm.com)

  4. Copy the content

  5. In the the OCP Web UI click on the + sign in the right upper corner

  6. Paste the content

  7. Click Save

โ„น๏ธโ— If you get a ClusterRoleBinding already exists, just ignore it

๐Ÿš€ You can now go to Demo the Solution


3. Demo the Solution


๐Ÿ“น Please use the Demo Script to prepare for the demo.

๐Ÿ“น I have also added a short Demo Walkthrough video that you can watch to get an idea on how to do the demo. This is based on 3.2 and the Click Through PPT, but should work more or less with your own instance.

๐ŸŒ Access the Environment

3.1 Access the Environment

To access the demo environment:

  • Click on the Application Menu in your Openshift Web Console.

  • Select CP4WAIOps Demo UI

  • Login with the password P4ssw0rd!

    demo

๐Ÿ” Login to CP4WAIOps as demo User

3.2 Login to CP4WAIOps as demo User

  • Click on the blue CP4WAIOps button
  • Login as User demo with the Password P4ssw0rd!

demo

๐Ÿš€ Demo the Solution

3.3 ๐Ÿš€ Demo the Solution

Please use the Script to prepare for the demo.

Then start the demo with the Demo Script.


4. Slack integration


For the system to work you need to follow those steps:

  1. Create Slack Workspace
  2. Create Slack App
  3. Create Slack Channels
  4. Create Slack Integration
  5. Get the Integration URL
  6. Create Slack App Communications
  7. Slack Reset
๐Ÿ“ฅ Detailed Instructions

4.1 Create your Slack Workspace

  1. Create a Slack workspace by going to https://slack.com/get-started#/createnew and logging in with an email which is not your IBM email. Your IBM email is part of the IBM Slack enterprise account and you will not be able to create an independent Slack workspace outside if the IBM slack service.

slack1

  1. After authentication, you will see the following screen:

slack2

  1. Click Create a Workspace ->

  2. Name your Slack workspace

slack3

Give your workspace a unique name such as aiops-<yourname>.

  1. Describe the workspace current purpose

slack4

This is free text, you may simply write โ€œdemo for Watson AIOpsโ€ or whatever you like.

slack5

You may add team members to your new Slack workspace or skip this step.

At this point you have created your own Slack workspace where you are the administrator and can perform all the necessary steps to integrate with CP4WAOps.

slack6

Note : This Slack workspace is outside the control of IBM and must be treated as a completely public environment. Do not place any confidential material in this Slack workspace.

4.2 Create Your Slack App

  1. Create a Slack app, by going to https://api.slack.com/apps and clicking Create New App.

    slack7

  2. Select From an app manifest

slack7

  1. Select the appropriate workspace that you have created before and click Next

  2. Copy and paste the content of this file ./doc/slack/slack-app-manifest.yaml.

    Don't bother with the URLs just yet, we will adapt them as needed.

  3. Click Next

  4. Click Create

  5. Scroll down to Display Information and name your CP4WAIOPS app.

  6. You can add an icon to the app (there are some sample icons in the ./tools/4_integrations/slack/icons folder.

  7. Click save changes

  8. In the Basic Information menu click on Install to Workspace then click Allow

4.3 Create Your Slack Channels

  1. In Slack add a two new channels:

    • aiops-demo-reactive
    • aiops-demo-proactive

    slack7

  2. Right click on each channel and select Copy Link

    This should get you something like this https://xxxx.slack.com/archives/C021QOY16BW The last part of the URL is the channel ID (i.e. C021QOY16BW) Jot them down for both channels

  3. Under Apps click Browse Apps

    slack7

  4. Select the App you just have created

  5. Invite the Application to each of the two channels by typing

    @<MyAppname>
  6. Select Add to channel

    You shoud get a message from saying was added to #<your-channel> by ...

4.4 Integrate Your Slack App

In the Slack App:

  1. In the Basic Information menu get the Signing Secret (not the Client Secret!) and jot it down

    K8s CNI

  2. In the OAuth & Permissions get the Bot User OAuth Token (not the User OAuth Token!) and jot it down

    K8s CNI

In the CP4WAIOps (CP4WAIOPS)

  1. In the CP4WAIOps "Hamburger" Menu select Define/Data and tool integrations

  2. Click Add connection

    K8s CNI

  3. Under Slack, click on Add Connection K8s CNI

  4. Name it "Slack"

  5. Paste the Signing Secret from above

  6. Paste the Bot User OAuth Token from above

    K8s CNI

  7. Paste the channel IDs from the channel creation step in the respective fields

    K8s CNI

    K8s CNI

  8. Test the connection and click save

4.5 Create the Integration URL

In the CP4WAIOps (CP4WAIOPS)

  1. Go to Data and tool integrations

  2. Under Slack click on 1 integration

  3. Copy out the URL

    secure_gw_search

This is the URL you will be using for step 6.

4.6 Create Slack App Communications

Return to the browser tab for the Slack app.

4.6.1 Event Subscriptions

  1. Select Event Subscriptions.

  2. In the Enable Events section, click the slider to enable events.

  3. For the Request URL field use the Request URL from step 5.

    e.g: https://<my-url>/aiops/aimanager/instances/xxxxx/api/slack/events

  4. After pasting the value in the field, a Verified message should display.

    slacki3

    If you get an error please check 5.7

  5. Verify that on the Subscribe to bot events section you got:

    • app_mention and
    • member_joined_channel events.

    slacki4

  6. Click Save Changes button.

4.6.2 Interactivity & Shortcuts

  1. Select Interactivity & Shortcuts.

  2. In the Interactivity section, click the slider to enable interactivity. For the Request URL field, use use the URL from above.

There is no automatic verification for this form

slacki5

  1. Click Save Changes button.

4.6.3 Slash Commands

Now, configure the welcome slash command. With this command, you can trigger the welcome message again if you closed it.

  1. Select Slash Commands

  2. Click Create New Command to create a new slash command.

    Use the following values:

    Field Value
    Command /welcome
    Request URL the URL from above
    Short Description Welcome to Watson AIOps
  3. Click Save.

4.6.4 Reinstall App

The Slack app must be reinstalled, as several permissions have changed.

  1. Select Install App
  2. Click Reinstall to Workspace

Once the workspace request is approved, the Slack integration is complete.

If you run into problems validating the Event Subscription in the Slack Application, see 5.2

4.7 Slack Reset

4.7.1 Get the User OAUTH Token

This is needed for the reset scripts in order to empty/reset the Slack channels.

This is based on Slack Cleaner2. You might have to install this:

pip3 install slack-cleaner2

Reset reactive channel

In your Slack app

  1. In the OAuth & Permissions get the User OAuth Token (not the Bot User OAuth Token this time!) and jot it down

In file ./tools/98_maintenance/scripts/13_reset-slack.sh

  1. Replace not_configured for the SLACK_TOKEN parameter with the token
  2. Adapt the channel name for the SLACK_REACTIVE parameter

Reset proactive channel

In your Slack app

  1. In the OAuth & Permissions get the User OAuth Token (not the Bot User OAuth Token this time!) and jot it down (same token as above)

In file ./tools/98_maintenance/scripts/14_reset-slack-changerisk.sh

  1. Replace not_configured for the SLACK_TOKEN parameter with the token
  2. Adapt the channel name for the SLACK_PROACTIVE parameter

4.7.2 Perform Slack Reset

Call either of the scripts above to reset the channel:

./tools/98_maintenance/scripts/13_reset-slack.sh

or

./tools/98_maintenance/scripts/14_reset-slack-changerisk.sh

5. Demo Setup - Explained


demo

๐Ÿ“ฅ Basic Architecture

5.1 Basic Architecture

The environement (Kubernetes, Applications, ...) create logs that are being fed into a Log Management Tool (ELK in this case).

demo

  1. External Systems generate Alerts and send them into the CP4WAIOps for Event Grouping.
  2. At the same time CP4WAIOps ingests the raw logs coming from the Log Management Tool (ELK) and looks for anomalies in the stream based on the trained model.
  3. It also ingests Metric Data and looks for anomalies
  4. If it finds an anomaly (logs and/or metrics) it forwards it to the Event Grouping as well.
  5. Out of this, CP4WAIOps creates a Story that is being enriched with Topology (Localization and Blast Radius) and with Similar Incidents that might help correct the problem.
  6. The Story is then sent to Slack.
  7. A Runbook is available to correct the problem but not launched automatically.
๐Ÿ“ฅ Optimized Demo Architecture

5.2 Optimized Demo Architecture

The idea of this repo is to provide a optimised, complete, pre-trained demo environment that is self-contained (e.g. can be deployed in only one cluster)

It contains the following components (which can be installed independently):

  • CP4WAIOps
    • IBM Operator
    • CP4WAIOps Instance
  • CP4WAIOps Demo Content (optional)
    • OpenLDAP & Register with CP4WAIOps
    • AWX (Open Source Ansible Tower) with preloaded Playbooks
    • AI Models - Load and Train
      • Create Training Definitions (TG, LAD, CR, SI. Turn off RSA)
      • Create Training Data (LAD, SNOW)
      • Train Models (TG, LAD, CR, SI)
    • Topology
      • RobotShop Demo App
      • Create K8s Observer
      • Create ASM merge rules
      • Load Overlay Topology
      • Create CP4WAIOps Application
    • Misc - Creates valid certificate for Ingress (Slack) - External Routes (Flink, Topology, ...) - Disables ASM Service match rule - Create Policy Creation for Stories and Runbooks - Demo Service Account
  • Event Manager (optional)
    • Event Manager
  • Event Manager Demo Content (optional)
    • Topology
      • Create ASM merge rules
      • Load ASM merge Topology
      • Create CP4WAIOps Application
  • Turbonomic (optional)
  • Turbonomic Demo Content (optional)
    • Demo User
    • RobotShop Demo App with synthetic metric
    • Instana target (if Instana is installed - you have to enter the API Token Manually)
    • Groups for vCenter and RobotShop
    • Groups for licensing
    • Resource Hogs
  • Instana (optional)

demo

For the this specific Demo environment:

  • ELK is not needed as I am using pre-canned logs for training and for the anomaly detection (inception)
  • Same goes for Metrics, I am using pre-canned metric data for training and for the anomaly detection (inception)
  • The Events are also created from pre-canned content that is injected into CP4WAIOps
  • There are also pre-canned ServiceNow Incidents if you donโ€™t want to do the live integration with SNOW
  • The Webpages that are reachable from the Events are static and hosted on my GitHub
  • The same goes for ServiceNow Incident pages if you donโ€™t integrate with live SNOW

This allows us to:

  • Install the whole Demo Environment in a self-contained OCP Cluster
  • Trigger the Anomalies reliably
  • Get Events from sources that would normally not be available (Instana, Turbonomic, Log Aggregator, Metric Provider, ...)
  • Show some examples of SNOW integration without a live system
๐Ÿ“ฅ Training

5.3 Training

5.3.1 Loading training data

demo

Loading Training data is done at the lowest possible level (for efficiency and speed):

  • Logs: Loading Elastic Search indexes directly into ES - two days of logs for March 3rd and 4th 2022
  • SNOW: Loading Elastic Search indexes directly into ES - synthetic data with 15k change requests and 5k incidents
  • Metrics: Loading Cassandra dumps of metric data - 3 months of synthetic data for 13 KPIs

5.3.2 Training the models

The models can be trained directly on the data that has been loaded as described above.

๐Ÿ“ฅ Incident creation

5.4 Incident creation (inception)

demo

Incidents are being created by using the high level APIs in order to simulate a real-world scenario.

  • Events: Pre-canned events are being injected through the corresponding REST API
  • Logs: Pre-canned anomalous logs for a 30 min timerange are injected through Kafka
  • Metrics: Anomalous metric data are generated on the fly and injected via the corresponding REST API

About


Languages

Language:Shell 37.9%Language:CSS 33.9%Language:Jinja 16.3%Language:Python 6.2%Language:HTML 4.1%Language:JavaScript 1.1%Language:Dockerfile 0.3%Language:EJS 0.3%