EaseProbe is a simple, standalone, and lightweight tool for health/status checking, written in Go.
EaseProbe does three kinds of work: Probe, Notify, and Report.
EaseProbe supports the following probing methods: HTTP, TCP, Shell Command, SSH Command, Host Resource Usage, and Native Client.
- HTTP. Checks the HTTP status code. Supports mTLS and HTTP Basic Auth, and can set the request header and body. (HTTP Probe Configuration)

  http:
    # Some software supports HTTP queries
    - name: ElasticSearch
      url: http://elasticsearch.server:9200
    - name: Prometheus
      url: http://prometheus:9090/graph
- TCP. Checks whether a TCP connection can be established. (TCP Probe Configuration)

  tcp:
    - name: Kafka
      host: kafka.server:9093
- Shell. Runs a shell command and checks the result. (Shell Command Probe Configuration)

  shell:
    # run redis-cli ping and check the "PONG"
    - name: Redis (Local)
      cmd: "redis-cli"
      args:
        - "-h"
        - "127.0.0.1"
        - "ping"
      env:
        # set the `REDISCLI_AUTH` environment variable for redis password
        - "REDISCLI_AUTH=abc123"
      # check the command output; if it does not contain PONG, mark the status down
      contain: "PONG"
- SSH. Runs a remote command via SSH and checks the result. Supports a bastion/jump server. (SSH Command Probe Configuration)

  ssh:
    servers:
      - name: ServerX
        host: ubuntu@172.10.1.1:22
        password: xxxxxxx
        key: /Users/user/.ssh/id_rsa
        cmd: "ps auxwe | grep easeprobe | grep -v grep"
        contain: easeprobe
- Host. Runs an SSH command on a remote host and checks the CPU, memory, and disk usage. (Host Load Probe)

  host:
    servers:
      - name: server
        host: ubuntu@172.20.2.202:22
        key: /path/to/server.pem
        threshold:
          cpu: 0.80  # cpu usage 80%
          mem: 0.70  # memory usage 70%
          disk: 0.90 # disk usage 90%
- Client. Currently supports the following native clients. Supports mTLS. (Native Client Probe)
  - MySQL. Connects to the MySQL server and runs the `SHOW STATUS` SQL.
  - Redis. Connects to the Redis server and runs the `PING` command.
  - MongoDB. Connects to the MongoDB server and pings it.
  - Kafka. Connects to the Kafka server and lists all topics.
  - PostgreSQL. Connects to the PostgreSQL server and runs the `SELECT 1` SQL.
  - Zookeeper. Connects to the Zookeeper server and runs the `get /` command.

  client:
    - name: Kafka Native Client (local)
      driver: "kafka"
      host: "localhost:9093"
      # mTLS
      ca: /path/to/file.ca
      cert: /path/to/file.crt
      key: /path/to/file.key
EaseProbe supports the following notifications:
- Slack. Uses a Webhook for notification.
- Discord. Uses a Webhook for notification.
- Telegram. Uses a Telegram Bot for notification.
- Email. Supports multiple email addresses.
- AWS SNS. Supports AWS Simple Notification Service.
- WeChat Work. Supports Enterprise WeChat Work notification.
- DingTalk. Supports DingTalk notification.
- Lark. Supports Lark (Feishu) notification.
- Log File. Writes the notification into a log file.
Note:
- Notification is edge-triggered: a notification is sent only when the status changes.
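The edge-triggered behaviour in the note above can be illustrated with a small Go sketch. It is not EaseProbe's actual implementation: the `Status` type, the `EdgeNotifier` struct, and the rule that a first observation notifies only when the probe is down are all assumptions made for illustration.

```go
package main

import "fmt"

// Status is a hypothetical probe status used only for this illustration.
type Status string

const (
	StatusUp   Status = "up"
	StatusDown Status = "down"
)

// EdgeNotifier remembers the last status seen for each probe name.
type EdgeNotifier struct {
	last map[string]Status
}

// NewEdgeNotifier returns a notifier with no recorded history.
func NewEdgeNotifier() *EdgeNotifier {
	return &EdgeNotifier{last: make(map[string]Status)}
}

// ShouldNotify reports whether the new status differs from the previous one.
// Assumption: the very first result notifies only if the probe is down.
func (n *EdgeNotifier) ShouldNotify(name string, s Status) bool {
	prev, seen := n.last[name]
	n.last[name] = s
	if !seen {
		return s == StatusDown
	}
	return prev != s
}

func main() {
	n := NewEdgeNotifier()
	results := []Status{StatusUp, StatusUp, StatusDown, StatusDown, StatusUp}
	for _, s := range results {
		// Only the up->down and down->up transitions report true.
		fmt.Println(s, n.ShouldNotify("Kafka", s))
	}
}
```

A level-triggered design would instead notify on every failed probe, which tends to flood a channel during a long outage.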
# Notification Configuration
notify:
  slack:
    - name: "MegaEase#Alert"
      webhook: "https://hooks.slack.com/services/........../....../....../"
  discord:
    - name: "MegaEase#Alert"
      webhook: "https://discord.com/api/webhooks/...../....../"
  telegram:
    - name: "MegaEase Alert Group"
      token: 1234567890:ABCDEFGHIJKLMNOPQRSTUVWXYZ # Bot Token
      chat_id: -123456789 # Channel / Group ID
  email:
    - name: "DevOps Mailing List"
      server: smtp.email.example.com:465
      username: user@example.com
      password: ********
      to: "user1@example.com;user2@example.com"
  aws_sns:
    - name: AWS SNS
      region: us-west-2
      arn: arn:aws:sns:us-west-2:298305261856:xxxxx
      endpoint: https://sns.us-west-2.amazonaws.com
      credential:
        id: AWSXXXXXXXID
        key: XXXXXXXX/YYYYYYY
  wecom:
    - name: "wecom alert service"
      webhook: "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=589f9674-a2aa-xxxxxxxx-16bb6c43034a" # wecom robot webhook
  dingtalk:
    - name: "dingtalk alert service"
      webhook: "https://oapi.dingtalk.com/robot/send?access_token=xxxx"
  lark:
    - name: "lark alert service"
      webhook: "https://open.feishu.cn/open-apis/bot/v2/hook/d5366199-xxxx-xxxx-bd81-a57d1dd95de4"
Check the Notification Configuration to see how to configure it.
- SLA Report Notify. EaseProbe sends a daily, weekly, or monthly SLA report.

  settings:
    # SLA Report schedule
    sla:
      # daily, weekly (Sunday), monthly (Last Day), none
      schedule: "weekly"
      # UTC time, the format is 'hour:min:sec'
      time: "23:59"
- SLA Live Report. You can query the live SLA report.
  EaseProbe listens on `0.0.0.0:8181` by default, and you can access the live SLA report at the following URLs:
  - HTML: http://localhost:8181/
  - JSON: http://localhost:8181/api/v1/sla/
For more information, please check the Global Setting Configuration.
Compiler: Go 1.18+ (generics programming support).
Use `make` to build the binary file. The target is under the `build/bin` directory.
$ make
Run the following command for a local test:
$ build/bin/easeprobe -f config.yaml
The following configuration is an example.
Notes: All probes have the following options:
- `timeout` - the maximum time to wait for the probe to complete. Default: `30s`.
- `interval` - the interval at which the probe runs. Default: `1m`.
# HTTP Probe Configuration
http:
  # A Website
  - name: MegaEase Website (Global)
    url: https://megaease.com
  # Some software supports HTTP queries
  - name: ElasticSearch
    url: http://elasticsearch.server:9200
  - name: Eureka
    url: http://eureka.server:8761
  - name: Prometheus
    url: http://prometheus:9090/graph
  # Spring Boot Application with Actuator Health API
  - name: EaseService-Governance
    url: http://easeservice-mgmt-governance:38012/actuator/health
  - name: EaseService-Control
    url: http://easeservice-mgmt-control:38013/actuator/health
  - name: EaseService-Mesh
    url: http://easeservice-mgmt-mesh:38013/actuator/health
  # A complete HTTP Probe configuration
  - name: Special Website
    url: https://megaease.cn
    # Request Method
    method: GET
    # Request Header
    headers:
      X-head-one: xxxxxx
      X-head-two: yyyyyy
      X-head-THREE: zzzzzzX-
    content_encoding: text/json
    # Request Body
    body: '{ "FirstName": "Mega", "LastName" : "Ease", "UserName" : "megaease", "Email" : "user@example.com"}'
    # HTTP Basic Auth
    username: username
    password: password
    # mTLS
    ca: /path/to/file.ca
    cert: /path/to/file.crt
    key: /path/to/file.key
    # configuration
    timeout: 10s # default is 30 seconds
# TCP Probe Configuration
tcp:
  - name: SSH Service
    host: example.com:22
    timeout: 10s # default is 30 seconds
    interval: 2m # default is 60 seconds
  - name: Kafka
    host: kafka.server:9093
The shell command probe is used to execute a shell command and check the output.
The following example shows how to configure the shell command probe.
# Shell Probe Configuration
shell:
  # A proxy curl shell script
  - name: Google Service
    cmd: "./resources/probe/scripts/proxy.curl.sh"
    args:
      - "socks5://127.0.0.1:1085"
      - "www.google.com"
  # run redis-cli ping and check the "PONG"
  - name: Redis (Local)
    cmd: "redis-cli"
    args:
      - "-h"
      - "127.0.0.1"
      - "ping"
    env:
      # set the `REDISCLI_AUTH` environment variable for redis password
      - "REDISCLI_AUTH=abc123"
    # check the command output; if it does not contain PONG, mark the status down
    contain: "PONG"
  # Run Zookeeper command `stat` to check the zookeeper status
  - name: Zookeeper (Local)
    cmd: "/bin/sh"
    args:
      - "-c"
      - "echo stat | nc 127.0.0.1 2181"
    contain: "Mode:"
The SSH probe is similar to the Shell probe.
- Supports password and private key authentication.
- Supports a bastion host tunnel.
The `host` field supports the following formats:
- example.com
- example.com:22
- user@example.com:22
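The three `host` forms listed above can be handled by a small parser. The Go sketch below is one possible approach, not the project's own parsing code: the function name `parseHost` is hypothetical, falling back to port 22 when no port is given is an assumption based on the standard SSH port, and bare IPv6 addresses are not handled.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseHost splits "user@host:port" into its parts.
// The user may be empty; a missing port falls back to "22" (an assumption).
func parseHost(s string) (user, host, port string, err error) {
	if i := strings.LastIndex(s, "@"); i >= 0 {
		user, s = s[:i], s[i+1:]
	}
	if s == "" {
		return "", "", "", fmt.Errorf("empty host")
	}
	if !strings.Contains(s, ":") {
		return user, s, "22", nil
	}
	host, port, err = net.SplitHostPort(s)
	return user, host, port, err
}

func main() {
	for _, h := range []string{"example.com", "example.com:22", "user@example.com:22"} {
		user, host, port, err := parseHost(h)
		fmt.Printf("%q -> user=%q host=%q port=%q err=%v\n", h, user, host, port, err)
	}
}
```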
The following is an example of the SSH probe configuration.
# SSH Probe Configuration
ssh:
  # SSH bastion host configuration
  bastion:
    aws:                                # bastion host ID ◄──────────┐
      host: aws.bastion.com:22          #                            │
      username: ubuntu                  # login user                 │
      key: /path/to/aws/bastion/key.pem # private key file           │
    gcp:                                # bastion host ID            │
      host: ubuntu@gcp.bastion.com:22   # bastion host               │
      key: /path/to/gcp/bastion/key.pem # private key file           │
  # SSH Probe configuration                                          │
  servers:                              #                            │
    # run redis-cli ping and check the "PONG"                        │
    - name: Redis (AWS)                 # Name                       │
      bastion: aws                      # bastion host id ───────────┘
      host: 172.20.2.202:22
      username: ubuntu # SSH login username
      password: xxxxx # SSH login password
      key: /path/to/private.key # SSH login private key file
      cmd: "redis-cli"
      args:
        - "-h"
        - "127.0.0.1"
        - "ping"
      env:
        # set the `REDISCLI_AUTH` environment variable for redis password
        - "REDISCLI_AUTH=abc123"
      # check the command output; if it does not contain PONG, mark the status down
      contain: "PONG"
    # Check the process status of `Kafka`
    - name: Kafka (GCP)
      bastion: gcp # ◄------ bastion host id
      host: 172.10.1.100:22
      username: ubuntu
      key: /path/to/private.key
      cmd: "ps -ef | grep kafka"
EaseProbe supports a host probe; a configuration example is shown below.
The probe checks CPU, memory, and disk usage. If any one of them exceeds its threshold, the host is marked as down.
Note:
- The thresholds are combined with OR: if any one of them exceeds its threshold, the host is marked as down.
- The host probe requires the remote server to have the following commands: `top`, `df`, `free`, `awk`, `grep`, `tr`, and `hostname` (check the source code to see how it works).
- The disk usage check covers only the root disk.
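The OR condition on the thresholds can be expressed in a few lines. The Go sketch below is illustrative only: the `Usage` type and `exceeded` function are hypothetical names, usage values are fractions between 0 and 1 as in the configuration, and "exceeds" is read as strictly greater than the threshold.

```go
package main

import "fmt"

// Usage holds CPU, memory, and disk usage as fractions (0.80 means 80%).
// The same type is reused for thresholds in this sketch.
type Usage struct {
	CPU, Mem, Disk float64
}

// exceeded returns the names of all metrics above their threshold.
// The host is considered down if the result is non-empty (OR condition).
func exceeded(u, threshold Usage) []string {
	var over []string
	if u.CPU > threshold.CPU {
		over = append(over, "cpu")
	}
	if u.Mem > threshold.Mem {
		over = append(over, "mem")
	}
	if u.Disk > threshold.Disk {
		over = append(over, "disk")
	}
	return over
}

func main() {
	threshold := Usage{CPU: 0.80, Mem: 0.70, Disk: 0.90}
	current := Usage{CPU: 0.35, Mem: 0.75, Disk: 0.40}

	over := exceeded(current, threshold)
	// Only memory is above its threshold, which is enough to mark the host down.
	fmt.Println("down:", len(over) > 0, over)
}
```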
host:
  bastion:                      # bastion server configuration
    aws:                        # bastion host ID ◄──────────────────┐
      host: ubuntu@example.com  # bastion host                       │
      key: /path/to/bastion.pem # private key file                   │
  # Servers List                                                     │
  servers:                      #                                    │
    - name: aws server          #                                    │
      bastion: aws              # <-- bastion server id ─────────────┘
      host: ubuntu@172.20.2.202:22
      key: /path/to/server.pem
      threshold:
        cpu: 0.80  # cpu usage 80%
        mem: 0.70  # memory usage 70%
        disk: 0.90 # disk usage 90%
    # Using the default threshold
    # cpu 80%, mem 80% and disk 95%
    - name: My VPS
      host: user@example.com:22
      key: /Users/user/.ssh/id_rsa
# Native Client Probe
client:
  - name: Redis Native Client (local)
    driver: "redis" # driver is redis
    host: "localhost:6379" # server and port
    password: "abc123" # password
    # mTLS
    ca: /path/to/file.ca
    cert: /path/to/file.crt
    key: /path/to/file.key
  - name: MySQL Native Client (local)
    driver: "mysql"
    host: "localhost:3306"
    username: "root"
    password: "pass"
  - name: MongoDB Native Client (local)
    driver: "mongo"
    host: "localhost:27017"
    username: "admin"
    password: "abc123"
    timeout: 5s
  - name: Kafka Native Client (local)
    driver: "kafka"
    host: "localhost:9093"
    # mTLS
    ca: /path/to/file.ca
    cert: /path/to/file.crt
    key: /path/to/file.key
  - name: PostgreSQL Native Client (local)
    driver: "postgres"
    host: "localhost:5432"
    username: "postgres"
    password: "pass"
  - name: Zookeeper Native Client (local)
    driver: "zookeeper"
    host: "localhost:2181"
    timeout: 5s
    # mTLS
    ca: /path/to/file.ca
    cert: /path/to/file.crt
    key: /path/to/file.key
# Notification Configuration
notify:
  # Notify to Slack Channel
  slack:
    - name: "Organization #Alert"
      webhook: "https://hooks.slack.com/services/........../....../....../"
      # dry: true # dry notification, print the Slack JSON in log(STDOUT)
  telegram:
    - name: "Group Name"
      token: 1234567890:ABCDEFGHIJKLMNOPQRSTUVWXYZ # Bot Token
      chat_id: -123456789 # Group ID
    - name: "Channel Name"
      token: 1234567890:ABCDEFGHIJKLMNOPQRSTUVWXYZ # Bot Token
      chat_id: -1001234567890 # Channel ID
  # Notify to Discord Text Channel
  discord:
    - name: "Server #Alert"
      webhook: "https://discord.com/api/webhooks/...../....../"
      # the avatar and thumbnail setting for notify block
      avatar: "https://img.icons8.com/ios/72/appointment-reminders--v1.png"
      thumbnail: "https://freeiconshop.com/wp-content/uploads/edd/notification-flat.png"
      # dry: true # dry notification, print the Discord JSON in log(STDOUT)
      retry: # sometimes the network is unreliable and a retry is needed
        times: 3
        interval: 10s
  # Notify to email addresses
  email:
    - name: "XXX Mail List"
      server: smtp.email.example.com:465
      username: user@example.com
      password: ********
      to: "user1@example.com;user2@example.com"
      # dry: true # dry notification, print the Email HTML in log(STDOUT)
  # Notify to AWS Simple Notification Service
  aws_sns:
    - name: AWS SNS
      region: us-west-2 # AWS Region
      arn: arn:aws:sns:us-west-2:298305261856:xxxxx # SNS ARN
      endpoint: https://sns.us-west-2.amazonaws.com # SNS Endpoint
      credential: # AWS Access Credential
        id: AWSXXXXXXXID # AWS Access Key ID
        key: XXXXXXXX/YYYYYYY # AWS Access Key Secret
  # Notify to WeCom (WeChat Work) robot
  wecom:
    - name: "wecom alert service"
      webhook: "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=589f9674-a2aa-xxxxxxxx-16bb6c43034a" # wecom robot webhook
  # Notify to DingTalk
  dingtalk:
    - name: "dingtalk alert service"
      webhook: "https://oapi.dingtalk.com/robot/send?access_token=xxxx"
  # Notify to Lark
  lark:
    - name: "lark alert service"
      webhook: "https://open.feishu.cn/open-apis/bot/v2/hook/d5366199-xxxx-xxxx-bd81-a57d1dd95de4"
  # Notify to a local log file
  log:
    - name: "Local Log"
      file: "/tmp/easeprobe.log"
      dry: true
Notes: All of the notifications can have the following optional configuration.
dry: true # dry notification, print the notification in log(STDOUT)
timeout: 20s # the timeout for sending out the notification, default: 30s
retry: # retry when the network is unreliable
  times: 3 # default: 3
  interval: 10s # default: 5s
# Global settings for all probes and notifiers.
settings:
  # An HTTP Server configuration
  http:
    ip: 127.0.0.1 # the IP address of the server. default: "0.0.0.0"
    port: 8181 # the port of the server. default: 8181
    refresh: 5s # the auto-refresh interval of the server. default: the minimum value of the probes' interval.
  # SLA Report schedule
  sla:
    # daily, weekly (Sunday), monthly (Last Day), none
    schedule: "daily"
    # UTC time, the format is 'hour:min:sec'
    time: "23:59"
    # debug mode
    # - true: send the SLA report every minute
    # - false: send the SLA report in schedule
    debug: false
  notify:
    # dry: true # Global settings for dry run
    retry: # Global settings for retry
      times: 5
      interval: 10s
  probe:
    timeout: 30s # the timeout for all probes
    interval: 1m # probe every minute for all probes
  # easeprobe program running log file.
  logfile: "test.log"
  # Log Level Configuration
  # can be: panic, fatal, error, warn, info, debug.
  loglevel: "debug"
  # Date format
  # Date
  #  - January 2, 2006
  #  - 01/02/06
  #  - Jan-02-06
  #
  # Time
  #  - 15:04:05
  #  - 3:04:05 PM
  #
  # Date Time
  #  - Jan _2 15:04:05                   (Timestamp)
  #  - Jan _2 15:04:05.000000            (with microseconds)
  #  - 2006-01-02T15:04:05-0700          (ISO 8601 (RFC 3339))
  #  - 2006-01-02 15:04:05
  #  - 02 Jan 06 15:04 MST               (RFC 822)
  #  - 02 Jan 06 15:04 -0700             (with numeric zone)
  #  - Mon, 02 Jan 2006 15:04:05 MST     (RFC 1123)
  #  - Mon, 02 Jan 2006 15:04:05 -0700   (with numeric zone)
  timeformat: "2006-01-02 15:04:05 UTC"
- Join the Slack Workspace for requirements, issues, and development.
- MegaEase on Twitter
EaseProbe is under the Apache 2.0 license. See the LICENSE file for details.