StackStorm / st2

StackStorm (aka "IFTTT for Ops") is event-driven automation for auto-remediation, incident responses, troubleshooting, deployments, and more for DevOps and SREs. Includes rules engine, workflow, 160 integration packs with 6000+ actions (see https://exchange.stackstorm.org) and ChatOps. Installer at https://docs.stackstorm.com/install/index.html

Home Page: https://stackstorm.com/

Can't disable st2timersengine

DesireWithin opened this issue

SUMMARY

I followed the documentation (https://docs.stackstorm.com/reference/ha.html#blueprint-box) to install a highly available st2 deployment, but st2timersengine is still running after I add:

[timer]
enable = False
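
One possible explanation, offered here as an assumption rather than a confirmed diagnosis: `[timer]` may be the older, deprecated name for a `[timersengine]` section, and if the engine combines both `enable` options with a logical OR, setting only `[timer] enable = False` leaves `[timersengine] enable` at a default of `True`. The sketch below models that hypothetical OR logic with Python's `configparser`; the `timer_enabled` helper and the `True` default are assumptions, not st2's actual code.

```python
import configparser

# Hypothetical model of how the timer engine might decide whether to run.
# Assumptions: [timersengine] enable defaults to True, and the deprecated
# [timer] enable option is OR-ed with it.
TIMERSENGINE_DEFAULT = True


def _read_bool(cfg: configparser.ConfigParser, section: str, option: str, default):
    """Return the boolean value of section/option, or `default` if it is unset."""
    if cfg.has_option(section, option):  # False if the section itself is missing
        return cfg.getboolean(section, option)
    return default


def timer_enabled(conf_text: str) -> bool:
    cfg = configparser.ConfigParser()
    cfg.read_string(conf_text)
    legacy = _read_bool(cfg, "timer", "enable", None)  # deprecated option; unset -> None
    current = _read_bool(cfg, "timersengine", "enable", TIMERSENGINE_DEFAULT)
    # `False or True` evaluates to True, so disabling only [timer] has no effect.
    return bool(legacy or current)


only_legacy = "[timer]\nenable = False\n"
both = "[timer]\nenable = False\n\n[timersengine]\nenable = False\n"

print(timer_enabled(only_legacy))  # True
print(timer_enabled(both))         # False
```

If this assumption holds for your st2 version, also setting `[timersengine] enable = False` on the nodes that should not run timers would be the thing to try; check the st2.conf reference for your release before relying on it.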

STACKSTORM VERSION

st2 3.8.0, on Python 3.6.9

OS, environment, install method

Ubuntu 18.04.6, installed via apt.

Steps to reproduce the problem

Add the configuration, then restart st2:

root@prod-stackstorm-03:/etc/apt/sources.list.d# tail -n 10 /etc/st2/st2.conf
...
db_name = st2
username = stackstorm
password = XXXX
compressors = zstd

[coordination]
url = redis://:Redis_XXXXX@10.XX.XX.XXX:6379

[timer]
enable = False

root@prod-stackstorm-03:/etc/st2# st2ctl restart
Failed to stop st2chatops.service: Unit st2chatops.service not loaded.
Failed to start st2chatops.service: Unit st2chatops.service not found.
##### st2 components status #####
st2actionrunner PID: 102513
st2actionrunner PID: 102515
st2actionrunner PID: 102517
st2actionrunner PID: 102519
st2actionrunner PID: 102521
st2actionrunner PID: 102523
st2actionrunner PID: 102525
st2actionrunner PID: 102527
st2actionrunner PID: 102529
st2actionrunner PID: 102531
st2api PID: 102539
st2stream PID: 102549
st2auth PID: 102559
st2garbagecollector PID: 102562
st2notifier PID: 102565
st2rulesengine PID: 102569
st2sensorcontainer PID: 102572
st2chatops is not running.
st2timersengine PID: 102577
st2workflowengine PID: 102580
st2scheduler PID: 102583

Expected Results

I expect st2timersengine not to be running.

Actual Results

st2timersengine is still running (see the status output above), and I now get duplicate rule evaluations.

What coordination backend are you using? I've observed this behaviour in an HA setup that uses a Redis cluster as the coordination backend. Until a fix is released, I'm working around it with a simple lock in the workflow, built on the st2 key-value store. (This could be adapted into an action that any workflow can call.)

version: 1.0

vars:
  - check_lock_delay: 2

tasks:
  write_execution_id:
    action: st2.kv.set
    input:
      key: <% ctx(st2).action %>_exec_id
      value: <% ctx(st2).action_execution_id %>
    next:
      - when: <% succeeded() %>
        do: wait_to_check_lock

  # Delay so that every node can finish writing to the kv store (increase it if heavily loaded nodes take longer than the delay).
  wait_to_check_lock:
    action: core.local
    input:
      cmd: sleep <% ctx(check_lock_delay) %>
    next: 
      - when: <% succeeded() %>
        do: read_execution_id

  read_execution_id:
    action: st2.kv.get
    input:
      key: <% ctx(st2).action %>_exec_id
    next: 
      - when: <% succeeded() and result().result = ctx().st2.action_execution_id %>
        do: proceed

  proceed:
    action: core.local
    input: 
      cmd: echo "ONLY A SINGLE WORKFLOW SHOULD REACH HERE"
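
The lock above relies on "last writer wins": every duplicate execution writes its own ID under the same key, waits, and only the execution whose ID is still stored proceeds. The sketch below simulates that with threads and an in-memory dict; `kv_set`, `kv_get`, `run_workflow` and the action name `mypack.myworkflow` are hypothetical stand-ins for `st2.kv.set`, `st2.kv.get`, the workflow tasks and `ctx(st2).action`, not StackStorm APIs.

```python
import threading
import time
from typing import Dict, List, Optional

# Hypothetical in-memory stand-in for the st2 datastore used by st2.kv.set / st2.kv.get.
_kv_store: Dict[str, str] = {}
_kv_mutex = threading.Lock()  # each set/get is atomic, like one datastore API call


def kv_set(key: str, value: str) -> None:
    with _kv_mutex:
        _kv_store[key] = value


def kv_get(key: str) -> Optional[str]:
    with _kv_mutex:
        return _kv_store.get(key)


def run_workflow(action: str, execution_id: str, delay: float,
                 start: threading.Barrier, winners: List[str]) -> None:
    start.wait()                       # duplicate triggers fire at (almost) the same moment
    key = f"{action}_exec_id"
    kv_set(key, execution_id)          # write_execution_id: last writer wins
    time.sleep(delay)                  # wait_to_check_lock: let every node finish its write
    if kv_get(key) == execution_id:    # read_execution_id: did our write survive?
        winners.append(execution_id)   # proceed: only one execution should get here


def simulate(n_nodes: int = 3, delay: float = 0.5) -> List[str]:
    winners: List[str] = []
    start = threading.Barrier(n_nodes)
    threads = [
        # "mypack.myworkflow" is a hypothetical action name.
        threading.Thread(target=run_workflow,
                         args=("mypack.myworkflow", f"exec-{i}", delay, start, winners))
        for i in range(n_nodes)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners


if __name__ == "__main__":
    print(simulate())  # a single execution id; which one wins varies between runs
```

This is a best-effort lock: if one node's write lands after another node's delay has expired, both executions can pass the check, which is why the delay has to be tuned for loaded nodes.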

Yes, I'm using Redis as the coordination backend. I'm looking for a solution that uses HAProxy to monitor the st2timersengine process.

I use keepalived to ensure that only one st2timersengine instance is running at a time.

MASTER config:

global_defs {
    # notification_email {
    #     your_email@example.com
    # }
    # notification_email_from keepalived@your_server.com
    # smtp_server localhost
    # smtp_connect_timeout 30
    router_id LVS_DEVEL
}

vrrp_script chk_program {
    script "/etc/keepalived/check_program.sh"
    interval 2
    weight -2
    fall 2
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface ens4
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass Sts_platform
    }
    track_script {
        chk_program
    }
    notify_master "/etc/keepalived/start_program.sh"
    notify_backup "/etc/keepalived/stop_program.sh"
}

BACKUP config:

global_defs {
    # notification_email {
    #     your_email@example.com
    # }
    # notification_email_from keepalived@your_server.com
    # smtp_server localhost
    # smtp_connect_timeout 30
    router_id LVS_DEVEL
}

vrrp_script chk_program {
    script "/etc/keepalived/check_program.sh"
    interval 2
    weight -2
    fall 2
    rise 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens4
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass Sts_platform
    }
    track_script {
        chk_program
    }
    notify_master "/etc/keepalived/start_program.sh"
    notify_backup "/etc/keepalived/stop_program.sh"
}
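
The sketch below models the priority arithmetic for the two configs above, assuming keepalived's documented `weight` semantics (a negative weight is subtracted while the check is failing) and ignoring `fall`/`rise` hysteresis, preemption settings and tie-breaking. Because `check_program.sh` reports failure on whichever node has st2timersengine stopped, the BACKUP node advertises 100 - 2 = 98 in steady state, so under these assumptions a stopped service on the MASTER (101 - 2 = 99) would not by itself hand the MASTER role over; failover would occur only if the MASTER stops sending VRRP advertisements. The helper names are hypothetical.

```python
from typing import Dict

# Values copied from the MASTER / BACKUP configs above.
BASE_PRIORITY = {"master": 101, "backup": 100}
CHECK_WEIGHT = -2  # vrrp_script chk_program: weight -2


def effective_priority(base: int, check_ok: bool, weight: int = CHECK_WEIGHT) -> int:
    """Advertised priority, assuming keepalived's weight rules: a negative weight
    is applied while the check fails, a positive one while the check succeeds."""
    if weight < 0:
        return base if check_ok else base + weight
    return base + weight if check_ok else base


def elected(check_ok: Dict[str, bool]) -> str:
    """Node with the highest effective priority (ties are not modelled)."""
    prios = {node: effective_priority(BASE_PRIORITY[node], ok) for node, ok in check_ok.items()}
    return max(prios, key=prios.get)


# Steady state: service running on MASTER (101), stopped on BACKUP (98).
print(elected({"master": True, "backup": False}))   # master
# Service dies on MASTER: 99 vs 98, so the MASTER keeps its role.
print(elected({"master": False, "backup": False}))  # master
```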

scripts:
check_program.sh

#!/bin/bash

# Exit 0 if st2timersengine is active; keepalived treats a non-zero exit as a failed check.
if systemctl is-active --quiet st2timersengine.service; then
  echo "st2timersengine.service is running normally."
  exit 0
else
  echo "Error: st2timersengine.service is not running normally."
  exit 1
fi

start_program.sh

#!/bin/bash
systemctl restart st2timersengine.service

stop_program.sh

#!/bin/bash
systemctl stop st2timersengine.service