gelleson / easeprobe

A simple, standalone, and lightweight tool that can do health/status checking, written in Go.

EaseProbe

EaseProbe is a simple, standalone, and lightweight tool that can do health/status checking, written in Go.

1. Overview

EaseProbe does three kinds of work: Probe, Notify, and Report.

1.1 Probe

EaseProbe supports the following probing methods: HTTP, TCP, Shell Command, SSH Command, Host Resource Usage, and Native Client.

  • HTTP. Checks the HTTP status code; supports mTLS and HTTP Basic Auth, and can set the request header and body. (see HTTP Probe Configuration)

    http:
      # Some software can be checked through its HTTP API
      - name: ElasticSearch
        url: http://elasticsearch.server:9200
      - name: Prometheus
        url: http://prometheus:9090/graph
  • TCP. Checks whether a TCP connection can be established. (see TCP Probe Configuration)

    tcp:
      - name: Kafka
        host: kafka.server:9093
  • Shell. Runs a shell command and checks the result. (see Shell Command Probe Configuration)

    shell:
      # run redis-cli ping and check the "PONG"
      - name: Redis (Local)
        cmd: "redis-cli"
        args:
          - "-h"
          - "127.0.0.1"
          - "ping"
        env:
          # set the `REDISCLI_AUTH` environment variable for redis password
          - "REDISCLI_AUTH=abc123"
        # check the command output; if it does not contain "PONG", mark the status down
        contain: "PONG"
  • SSH. Runs a remote command via SSH and checks the result. Supports bastion/jump servers. (see SSH Command Probe Configuration)

    ssh:
      servers:
        - name: ServerX
          host: ubuntu@172.10.1.1:22
          password: xxxxxxx
          key: /Users/user/.ssh/id_rsa
          cmd: "ps auxwe | grep easeprobe | grep -v grep"
          contain: easeprobe
  • Host. Runs an SSH command on a remote host and checks its CPU, memory, and disk usage. (see Host Resource Usage Probe Configuration)

    host:
      servers:
        - name: server
          host: ubuntu@172.20.2.202:22
          key: /path/to/server.pem
          threshold:
            cpu: 0.80  # cpu usage  80%
            mem: 0.70  # memory usage 70%
            disk: 0.90  # disk usage 90%
  • Client. Currently supports the following native clients, all with mTLS support. (see Native Client Probe)

    • MySQL. Connects to the MySQL server and runs the SHOW STATUS SQL.
    • Redis. Connects to the Redis server and runs the PING command.
    • MongoDB. Connects to the MongoDB server and pings it.
    • Kafka. Connects to the Kafka server and lists all topics.
    • PostgreSQL. Connects to the PostgreSQL server and runs the SELECT 1 SQL.
    • Zookeeper. Connects to the Zookeeper server and runs the get / command.
    client:
      - name: Kafka Native Client (local)
        driver: "kafka"
        host: "localhost:9093"
        # mTLS
        ca: /path/to/file.ca
        cert: /path/to/file.crt
        key: /path/to/file.key

1.2 Notification

EaseProbe supports the following notification channels:

  • Slack. Uses a webhook for notification.
  • Discord. Uses a webhook for notification.
  • Telegram. Uses a Telegram bot for notification.
  • Email. Supports multiple email addresses.
  • AWS SNS. Supports the AWS Simple Notification Service.
  • WeChat Work. Supports Enterprise WeChat Work notification.
  • DingTalk. Supports DingTalk notification.
  • Lark. Supports Lark (Feishu) notification.
  • Log File. Writes the notification to a log file.

Note:

  • Notifications are edge-triggered: a notification is sent only when a probe's status changes.
# Notification Configuration
notify:
  slack:
    - name: "MegaEase#Alert"
      webhook: "https://hooks.slack.com/services/........../....../....../"
  discord:
    - name: "MegaEase#Alert"
      webhook: "https://discord.com/api/webhooks/...../....../"
  telegram:
    - name: "MegaEase Alert Group"
      token: 1234567890:ABCDEFGHIJKLMNOPQRSTUVWXYZ # Bot Token
      chat_id: -123456789 # Channel / Group ID
  email:
    - name: "DevOps Mailing List"
      server: smtp.email.example.com:465
      username: user@example.com
      password: "********"
      to: "user1@example.com;user2@example.com"
  aws_sns:
    - name: AWS SNS
      region: us-west-2
      arn: arn:aws:sns:us-west-2:298305261856:xxxxx
      endpoint: https://sns.us-west-2.amazonaws.com
      credential:
        id: AWSXXXXXXXID
        key: XXXXXXXX/YYYYYYY
  wecom:
    - name: "wecom alert service"
      webhook: "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=589f9674-a2aa-xxxxxxxx-16bb6c43034a" # wecom robot webhook
  dingtalk:
    - name: "dingtalk alert service"
      webhook: "https://oapi.dingtalk.com/robot/send?access_token=xxxx"
  lark:
    - name: "lark alert service"
      webhook: "https://open.feishu.cn/open-apis/bot/v2/hook/d5366199-xxxx-xxxx-bd81-a57d1dd95de4"

See the Notification Configuration section for details on each notifier.
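The edge-triggered rule can be illustrated with a small sketch. It assumes a hypothetical EdgeNotifier type that remembers each probe's last status and signals a notification only on a transition. EaseProbe's real notification pipeline is not shown here, and treating a probe's very first result as "no change" is an assumption of this sketch.

```go
package main

import "fmt"

// Status is a simplified probe status (hypothetical type for this sketch).
type Status string

const (
	StatusUp   Status = "up"
	StatusDown Status = "down"
)

// EdgeNotifier remembers the last status of each probe and signals only
// transitions. It is an illustration, not EaseProbe's implementation.
type EdgeNotifier struct {
	last map[string]Status
}

func NewEdgeNotifier() *EdgeNotifier {
	return &EdgeNotifier{last: make(map[string]Status)}
}

// Observe records a probe result and returns true only when the status
// differs from the previous result for the same probe. The very first
// result is treated as "no change" (an assumption of this sketch).
func (n *EdgeNotifier) Observe(probe string, s Status) bool {
	prev, seen := n.last[probe]
	n.last[probe] = s
	return seen && prev != s
}

func main() {
	n := NewEdgeNotifier()
	results := []Status{StatusUp, StatusUp, StatusDown, StatusDown, StatusUp}
	for i, s := range results {
		if n.Observe("ElasticSearch", s) {
			fmt.Printf("round %d: status changed to %s, notify\n", i, s)
		}
	}
}
```

With the sample results, only the up-to-down and down-to-up rounds produce a notification; repeated identical results stay silent.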

1.3 Report

  • SLA Report Notify. EaseProbe sends a daily, weekly, or monthly SLA report.

    settings:
      # SLA Report schedule
      sla:
        #  daily, weekly (Sunday), monthly (Last Day), none
        schedule: "weekly"
        # UTC time, the format is 'hour:min:sec'
        time: "23:59"
  • SLA Live Report. You can query the live SLA report at any time over HTTP.

By default, EaseProbe listens on 0.0.0.0:8181, and the live SLA report is available at the following URLs:

  • HTML: http://localhost:8181/
  • JSON: http://localhost:8181/api/v1/sla/

For more information, see the Global Setting Configuration section.

2. Getting Started

2.1 Build

Compiler: Go 1.18+ (required for generics support).

Run make to build the binary; the output is placed in the build/bin directory.

$ make

2.2 Run

Run the following command for a local test:

$ build/bin/easeprobe -f config.yaml

3. Configuration

The following sections show example configurations.

Note: all probes support the following options:

  • timeout - the maximum time to wait for the probe to complete. Default: 30s.
  • interval - how often the probe runs. Default: 1m.

3.1 HTTP Probe Configuration

# HTTP Probe Configuration

http:
  # A Website
  - name: MegaEase Website (Global)
    url: https://megaease.com

  # Some software can be checked through its HTTP API
  - name: ElasticSearch
    url: http://elasticsearch.server:9200
  - name: Eureka
    url: http://eureka.server:8761
  - name: Prometheus
    url: http://prometheus:9090/graph

  # Spring Boot Application with Actuator Health API
  - name: EaseService-Governance
    url: http://easeservice-mgmt-governance:38012/actuator/health
  - name: EaseService-Control
    url: http://easeservice-mgmt-control:38013/actuator/health
  - name: EaseService-Mesh
    url: http://easeservice-mgmt-mesh:38013/actuator/health

  # A complete HTTP Probe configuration
  - name: Special Website
    url: https://megaease.cn
    # Request Method
    method: GET
    # Request Header
    headers:
      X-head-one: xxxxxx
      X-head-two: yyyyyy
      X-head-THREE: zzzzzzX-
    content_encoding: text/json
    # Request Body
    body: '{ "FirstName": "Mega", "LastName" : "Ease", "UserName" : "megaease", "Email" : "user@example.com"}'
    # HTTP Basic Auth
    username: username
    password: password
    # mTLS
    ca: /path/to/file.ca
    cert: /path/to/file.crt
    key: /path/to/file.key
    # configuration
    timeout: 10s # default is 30 seconds
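A minimal Go sketch of an HTTP probe built from the fields above, using only net/http. The HTTPProbe type is hypothetical, mTLS is omitted, and treating any 2xx or 3xx status as healthy is an assumption rather than EaseProbe's documented rule.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

// HTTPProbe mirrors a few fields of the YAML above (hypothetical Go type).
type HTTPProbe struct {
	URL      string
	Method   string
	Headers  map[string]string
	Body     string
	Username string
	Password string
	Timeout  time.Duration
}

// newRequest builds the probe request: method, headers, body, Basic Auth.
func (p HTTPProbe) newRequest() (*http.Request, error) {
	method := p.Method
	if method == "" {
		method = http.MethodGet
	}
	req, err := http.NewRequest(method, p.URL, strings.NewReader(p.Body))
	if err != nil {
		return nil, err
	}
	for k, v := range p.Headers {
		req.Header.Set(k, v)
	}
	if p.Username != "" {
		req.SetBasicAuth(p.Username, p.Password)
	}
	return req, nil
}

// statusUp treats 2xx/3xx as healthy (an assumption of this sketch).
func statusUp(code int) bool { return code >= 200 && code < 400 }

// run sends the request within the timeout and checks the status code.
func (p HTTPProbe) run() error {
	req, err := p.newRequest()
	if err != nil {
		return err
	}
	client := &http.Client{Timeout: p.Timeout}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if !statusUp(resp.StatusCode) {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}

func main() {
	p := HTTPProbe{URL: "https://megaease.com", Timeout: 5 * time.Second}
	if err := p.run(); err != nil {
		fmt.Println("down:", err)
		return
	}
	fmt.Println("up")
}
```

Note that http.Client follows redirects by default, so a 3xx response is usually replaced by the status of the redirect target.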

3.2 TCP Probe Configuration

# TCP Probe Configuration
tcp:
  - name: SSH Service
    host: example.com:22
    timeout: 10s # default is 30 seconds
    interval: 2m # default is 60 seconds

  - name: Kafka
    host: kafka.server:9093

3.3 Shell Command Probe Configuration

The shell command probe is used to execute a shell command and check the output.

The following example shows how to configure the shell command probe.

# Shell Probe Configuration
shell:
  # A proxy curl shell script
  - name: Google Service
    cmd: "./resources/probe/scripts/proxy.curl.sh"
    args:
      - "socks5://127.0.0.1:1085"
      - "www.google.com"

  # run redis-cli ping and check the "PONG"
  - name: Redis (Local)
    cmd: "redis-cli"
    args:
      - "-h"
      - "127.0.0.1"
      - "ping"
    env:
      # set the `REDISCLI_AUTH` environment variable for redis password
      - "REDISCLI_AUTH=abc123"
    # check the command output; if it does not contain "PONG", mark the status down
    contain: "PONG"

  # Run Zookeeper command `stat` to check the zookeeper status
  - name: Zookeeper (Local)
    cmd: "/bin/sh"
    args:
      - "-c"
      - "echo stat | nc 127.0.0.1 2181"
    contain: "Mode:"

3.4 SSH Command Probe Configuration

The SSH probe is similar to the Shell probe.

  • Supports password and private-key authentication.
  • Supports tunneling through a bastion host.

The host field accepts the following formats:

  • example.com
  • example.com:22
  • user@example.com:22
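These three forms can be normalized with a small parser. This is an illustrative sketch: parseSSHHost is hypothetical, port 22 is assumed when none is given, and a user embedded in the host string is assumed to take precedence over the separate username field.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseSSHHost splits "[user@]host[:port]" into its parts. The port defaults
// to 22 and the user to defaultUser (for example the `username` field).
// IPv6 addresses must be bracketed with a port, e.g. "[::1]:22".
func parseSSHHost(s, defaultUser string) (user, host, port string, err error) {
	user = defaultUser
	if i := strings.LastIndex(s, "@"); i >= 0 {
		user, s = s[:i], s[i+1:]
	}
	if !strings.Contains(s, ":") {
		return user, s, "22", nil
	}
	host, port, err = net.SplitHostPort(s)
	return user, host, port, err
}

func main() {
	for _, h := range []string{"example.com", "example.com:22", "user@example.com:22"} {
		user, host, port, err := parseSSHHost(h, "ubuntu")
		fmt.Println(user, host, port, err)
	}
}
```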

The following is an example SSH probe configuration.

# SSH Probe Configuration
ssh:
  # SSH bastion host configuration
  bastion:
    aws: # bastion host ID      ◄──────────────────────────────┐
      host: aws.bastion.com:22 # bastion host                  │
      username: ubuntu # login user                            │
      key: /path/to/aws/bastion/key.pem # private key file     │
    gcp: # bastion host ID                                     │
      host: ubuntu@gcp.bastion.com:22 # bastion host           │
      key: /path/to/gcp/bastion/key.pem # private key file     │
  # SSH Probe configuration                                    │
  servers:   #                                                 │
    # run redis-cli ping and check the "PONG"                  │
    - name: Redis (AWS) # Name                                 │
      bastion: aws  # bastion host id ------------------------─┘
      host: 172.20.2.202:22
      username: ubuntu  # SSH Login username
      password: xxxxx   # SSH Login password
      key: /path/to/private.key # SSH login private file
      cmd: "redis-cli"
      args:
        - "-h"
        - "127.0.0.1"
        - "ping"
      env:
        # set the `REDISCLI_AUTH` environment variable for redis password
        - "REDISCLI_AUTH=abc123"
      # check the command output; if it does not contain "PONG", mark the status down
      contain: "PONG"
    
    # Check the process status of `Kafka`
    - name:  Kafka (GCP)
      bastion: gcp         #  ◄------ bastion host id
      host: 172.10.1.100:22
      username: ubuntu
      key: /path/to/private.key
      cmd: "ps -ef | grep kafka | grep -v grep"

3.5 Host Resource Usage Probe Configuration

The host probe checks resource usage on a remote host over SSH. An example configuration is shown below.

It checks CPU, memory, and disk usage; if any of them exceeds its threshold, the host is marked as down.

Note:

  • The thresholds are OR conditions: exceeding any single threshold marks the host as down.
  • The remote server must provide the following commands: top, df, free, awk, grep, tr, and hostname (see the source code for how they are used).
  • Only the root disk is checked for disk usage.
host:
  bastion: # bastion server configuration
    aws: # bastion host ID      ◄──────────────────┐
      host: ubuntu@example.com # bastion host      │
      key: /path/to/bastion.pem # private key file │
  # Servers List                                   │
  servers: #                                       │
    - name: aws server   #                         │
      bastion: aws #  <-- bastion server id ------─┘
      host: ubuntu@172.20.2.202:22
      key: /path/to/server.pem
      threshold:
        cpu: 0.80  # cpu usage  80%
        mem: 0.70  # memory usage 70%
        disk: 0.90  # disk usage 90%

    # Using the default thresholds:
    # cpu 80%, mem 80%, and disk 95%
    - name: My VPS
      host: user@example.com:22
      key: /Users/user/.ssh/id_rsa
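The OR rule and the default thresholds can be expressed as follows. This sketch assumes usage is reported as a ratio between 0 and 1 and that "exceeds" means strictly greater than; the Threshold type and hostDown helper are hypothetical.

```go
package main

import "fmt"

// Threshold holds usage ratios between 0 and 1; zero means "not set".
type Threshold struct {
	CPU, Mem, Disk float64
}

// Defaults taken from the comment in the example: cpu 80%, mem 80%, disk 95%.
var defaultThreshold = Threshold{CPU: 0.80, Mem: 0.80, Disk: 0.95}

// withDefaults fills every unset field from defaultThreshold.
func (t Threshold) withDefaults() Threshold {
	if t.CPU == 0 {
		t.CPU = defaultThreshold.CPU
	}
	if t.Mem == 0 {
		t.Mem = defaultThreshold.Mem
	}
	if t.Disk == 0 {
		t.Disk = defaultThreshold.Disk
	}
	return t
}

// hostDown applies the OR rule: the host is down if any usage value is
// strictly greater than its threshold. It also returns the reasons.
func hostDown(usage, limit Threshold) (bool, []string) {
	limit = limit.withDefaults()
	var reasons []string
	check := func(name string, used, limitVal float64) {
		if used > limitVal {
			reasons = append(reasons, fmt.Sprintf("%s %.0f%% > %.0f%%", name, used*100, limitVal*100))
		}
	}
	check("cpu", usage.CPU, limit.CPU)
	check("mem", usage.Mem, limit.Mem)
	check("disk", usage.Disk, limit.Disk)
	return len(reasons) > 0, reasons
}

func main() {
	usage := Threshold{CPU: 0.35, Mem: 0.75, Disk: 0.50}
	// "aws server" thresholds: mem 75% is above 70%, so the host is down.
	fmt.Println(hostDown(usage, Threshold{CPU: 0.80, Mem: 0.70, Disk: 0.90}))
	// "My VPS" uses the defaults: mem 75% is within 80%, so the host is up.
	fmt.Println(hostDown(usage, Threshold{}))
}
```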

3.6 Native Client Probe

# Native Client Probe
client:
  - name: Redis Native Client (local)
    driver: "redis"  # driver is redis
    host: "localhost:6379"  # server and port
    password: "abc123" # password
    # mTLS
    ca: /path/to/file.ca
    cert: /path/to/file.crt
    key: /path/to/file.key

  - name: MySQL Native Client (local)
    driver: "mysql"
    host: "localhost:3306"
    username: "root"
    password: "pass"

  - name: MongoDB Native Client (local)
    driver: "mongo"
    host: "localhost:27017"
    username: "admin"
    password: "abc123"
    timeout: 5s

  - name: Kafka Native Client (local)
    driver: "kafka"
    host: "localhost:9093"
    # mTLS
    ca: /path/to/file.ca
    cert: /path/to/file.crt
    key: /path/to/file.key

  - name: PostgreSQL Native Client (local)
    driver: "postgres"
    host: "localhost:5432"
    username: "postgres"
    password: "pass"

  - name: Zookeeper Native Client (local)
    driver: "zookeeper"
    host: "localhost:2181"
    timeout: 5s
    # mTLS
    ca: /path/to/file.ca
    cert: /path/to/file.crt
    key: /path/to/file.key

3.7 Notification Configuration

# Notification Configuration
notify:
  # Notify to Slack Channel
  slack:
    - name: "Organization #Alert"
      webhook: "https://hooks.slack.com/services/........../....../....../"
      # dry: true   # dry notification, print the Slack JSON in log(STDOUT)
  telegram:
    - name: "Group Name"
      token: 1234567890:ABCDEFGHIJKLMNOPQRSTUVWXYZ # Bot Token
      chat_id: -123456789 # Group ID
    - name: "Channel Name"
      token: 1234567890:ABCDEFGHIJKLMNOPQRSTUVWXYZ # Bot Token
      chat_id: -1001234567890 # Channel ID
  # Notify to Discord Text Channel
  discord:
    - name: "Server #Alert"
      webhook: "https://discord.com/api/webhooks/...../....../"
      # the avatar and thumbnail setting for notify block
      avatar: "https://img.icons8.com/ios/72/appointment-reminders--v1.png"
      thumbnail: "https://freeiconshop.com/wp-content/uploads/edd/notification-flat.png"
      # dry: true # dry notification, print the Discord JSON in log(STDOUT)
      retry: # retry when the network is unreliable
        times: 3
        interval: 10s
  # Notify to email addresses
  email:
    - name: "XXX Mail List"
      server: smtp.email.example.com:465
      username: user@example.com
      password: "********"
      to: "user1@example.com;user2@example.com"
      # dry: true # dry notification, print the Email HTML in log(STDOUT)
  # Notify to AWS Simple Notification Service
  aws_sns:
    - name: AWS SNS
      region: us-west-2 # AWS Region
      arn: arn:aws:sns:us-west-2:298305261856:xxxxx # SNS ARN
      endpoint: https://sns.us-west-2.amazonaws.com # SNS Endpoint
      credential: # AWS Access Credential
        id: AWSXXXXXXXID  # AWS Access Key ID
        key: XXXXXXXX/YYYYYYY # AWS Access Key Secret
  # Notify to WeCom (WeChat Work) robot
  wecom:
    - name: "wecom alert service"
      webhook: "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=589f9674-a2aa-xxxxxxxx-16bb6c43034a" # wecom robot webhook
  # Notify to Dingtalk
  dingtalk:
    - name: "dingtalk alert service"
      webhook: "https://oapi.dingtalk.com/robot/send?access_token=xxxx"
  # Notify to Lark
  lark:
    - name: "lark alert service"
      webhook: "https://open.feishu.cn/open-apis/bot/v2/hook/d5366199-xxxx-xxxx-bd81-a57d1dd95de4"
  # Notify to a local log file
  log:
    - name: "Local Log"
      file: "/tmp/easeprobe.log"
      dry: true

Note: all notifiers accept the following optional settings.

  dry: true # dry notification, print the notification message in log(STDOUT)
  timeout: 20s # the timeout for sending the notification, default: 30s
  retry: # retry when the network is unreliable
    times: 3 # default: 3
    interval: 10s # default: 5s

3.8 Global Setting Configuration

# Global settings for all probes and notifiers.
settings:

  # HTTP server configuration
  http:
    ip: 127.0.0.1 # the IP address of the server. default:"0.0.0.0"
    port: 8181 # the port of the server. default: 8181
    refresh: 5s # the auto-refresh interval of the server. default: the minimum value of the probes' interval.

  # SLA Report schedule
  sla:
    #  daily, weekly (Sunday), monthly (Last Day), none
    schedule : "daily"
    # UTC time, the format is 'hour:min:sec'
    time: "23:59"
    # debug mode
    # - true: send the SLA report every minute
    # - false: send the SLA report in schedule
    debug: false

  notify:
    # dry: true # Global settings for dry run
    retry: # Global settings for retry
      times: 5
      interval: 10s

  probe:
    timeout: 30s # the timeout for all probes
    interval: 1m # probe every minute for all probes
  # log file for the EaseProbe process itself
  logfile: "test.log"

  # Log Level Configuration
  # can be: panic, fatal, error, warn, info, debug.
  loglevel: "debug"

  # Date format
  # Date
  #  - January 2, 2006
  #  - 01/02/06
  #  - Jan-02-06
  #
  # Time
  #   - 15:04:05
  #   - 3:04:05 PM
  #
  # Date Time
  #   - Jan _2 15:04:05                   (Timestamp)
  #   - Jan _2 15:04:05.000000            (with microseconds)
  #   - 2006-01-02T15:04:05Z07:00         (RFC 3339)
  #   - 2006-01-02 15:04:05
  #   - 02 Jan 06 15:04 MST               (RFC 822)
  #   - 02 Jan 06 15:04 -0700             (with numeric zone)
  #   - Mon, 02 Jan 2006 15:04:05 MST     (RFC 1123)
  #   - Mon, 02 Jan 2006 15:04:05 -0700   (with numeric zone)
  timeformat: "2006-01-02 15:04:05 UTC"

4. Community

5. License

EaseProbe is under the Apache 2.0 license. See the LICENSE file for details.
