chaostoolkit / chaostoolkit-addons

Chaos Toolkit addons (tolerances, controls) that can benefit everyone

Home Page: https://chaostoolkit.org/

Race condition on safeguards

je-al opened this issue

Hey, it seems the safeguard control gets stuck in after_experiment_control when exit_gracefully is called before execution reaches the call that waits on the now_all_done Barrier. I can reliably get the following experiment to hang forever unless I configure a pause for the probe (the commented-out lines below), but I'm guessing there is a more elegant solution:

---
title: safeguard test
description: safeguard test

controls:
- name: safeguard
  provider:
    type: python
    module: chaosaddons.controls.safeguards
    arguments:
      probes:
        - name: safeguard
          type: probe
          provider:
            type: process
            path: date
          tolerance: 1
          # pauses:
          #   after: 1

[2022-06-22 17:54:48 DEBUG] [cli:113] Running command 'run'
[2022-06-22 17:54:48 DEBUG] [cli:117] Using settings file '/Users/jeal/.chaostoolkit/settings.yaml'
[2022-06-22 17:54:49 DEBUG] [__init__:399] No controls to apply on 'loader'
[2022-06-22 17:54:49 DEBUG] [__init__:399] No controls to apply on 'loader'
[2022-06-22 17:54:49 DEBUG] [caching:24] Building activity cache...
[2022-06-22 17:54:49 DEBUG] [caching:35] Cached 3 activities
[2022-06-22 17:54:49 INFO] [experiment:58] Validating the experiment's syntax
[2022-06-22 17:54:49 DEBUG] [configuration:63] Loading configuration...
[2022-06-22 17:54:49 DEBUG] [secret:78] Loading secrets...
[2022-06-22 17:54:49 DEBUG] [secret:104] Done loading secrets
[2022-06-22 17:54:49 DEBUG] [python:196] Control 'validate_control' loaded from '/Users/jeal/.pyenv/versions/3.9.9/envs/chaostoolkit/lib/python3.9/site-packages/chaosaddons/controls/safeguards.py'
[2022-06-22 17:54:49 DEBUG] [python:192] Control module '/Users/jeal/.pyenv/versions/3.9.9/envs/chaostoolkit/lib/python3.9/site-packages/chaostoolkit_rappi-0.1.0-py3.9.egg/chaos_rappi/controls/state_sharing.py' does not declare 'validate_control'
[2022-06-22 17:54:49 INFO] [experiment:109] Experiment looks valid
[2022-06-22 17:54:49 DEBUG] [caching:42] Clearing activities cache
[2022-06-22 17:54:49 DEBUG] [caching:24] Building activity cache...
[2022-06-22 17:54:49 DEBUG] [caching:35] Cached 3 activities
[2022-06-22 17:54:49 DEBUG] [configuration:63] Loading configuration...
[2022-06-22 17:54:49 DEBUG] [secret:78] Loading secrets...
[2022-06-22 17:54:49 DEBUG] [secret:104] Done loading secrets
[2022-06-22 17:54:49 DEBUG] [configuration:155] Loading dynamic configuration...
[2022-06-22 17:54:49 INFO] [run:320] Running experiment: journal output test
[2022-06-22 17:54:49 DEBUG] [__init__:52] Initializing controls
[2022-06-22 17:54:49 DEBUG] [__init__:61] Initializing control 'safeguard'
[2022-06-22 17:54:49 DEBUG] [python:196] Control 'configure_control' loaded from '/Users/jeal/.pyenv/versions/3.9.9/envs/chaostoolkit/lib/python3.9/site-packages/chaosaddons/controls/safeguards.py'
[2022-06-22 17:54:49 DEBUG] [__init__:61] Initializing control 'get metadata'
[2022-06-22 17:54:49 DEBUG] [python:192] Control module '/Users/jeal/.pyenv/versions/3.9.9/envs/chaostoolkit/lib/python3.9/site-packages/chaostoolkit_rappi-0.1.0-py3.9.egg/chaos_rappi/controls/state_sharing.py' does not declare 'configure_control'
[2022-06-22 17:54:49 INFO] [run:344] Steady-state strategy: default
[2022-06-22 17:54:49 INFO] [run:348] Rollbacks strategy: default
[2022-06-22 17:54:49 INFO] [run:353] No steady state hypothesis defined. That's ok, just exploring.
[2022-06-22 17:54:49 DEBUG] [__init__:409] Applying before-control 'safeguard' on 'experiment'
[2022-06-22 17:54:49 DEBUG] [python:196] Control 'before_experiment_control' loaded from '/Users/jeal/.pyenv/versions/3.9.9/envs/chaostoolkit/lib/python3.9/site-packages/chaosaddons/controls/safeguards.py'
[2022-06-22 17:54:49 DEBUG] [__init__:409] Applying before-control 'safeguard' on 'activity'
[2022-06-22 17:54:49 DEBUG] [python:192] Control module '/Users/jeal/.pyenv/versions/3.9.9/envs/chaostoolkit/lib/python3.9/site-packages/chaosaddons/controls/safeguards.py' does not declare 'before_activity_control'
[2022-06-22 17:54:49 DEBUG] [process:52] Running: ['/bin/date']
[2022-06-22 17:54:49 DEBUG] [__init__:409] Applying after-control 'safeguard' on 'activity'
[2022-06-22 17:54:49 DEBUG] [python:192] Control module '/Users/jeal/.pyenv/versions/3.9.9/envs/chaostoolkit/lib/python3.9/site-packages/chaosaddons/controls/safeguards.py' does not declare 'after_activity_control'
[2022-06-22 17:54:49 CRITICAL] [safeguards:290] Safeguard 'safeguard' triggered the end of the experiment
[2022-06-22 17:54:49 INFO] [run:607] Playing your experiment's method now...
[2022-06-22 17:54:49 DEBUG] [safeguards:198] Safeguard 'safeguard' finished normally
[2022-06-22 17:54:49 WARNING] [run:420] Received the exit signal: 20
[2022-06-22 17:54:49 INFO] [run:458] Experiment ended with status: interrupted
[2022-06-22 17:54:49 DEBUG] [__init__:409] Applying after-control 'safeguard' on 'experiment'
[2022-06-22 17:54:49 DEBUG] [python:196] Control 'after_experiment_control' loaded from '/Users/jeal/.pyenv/versions/3.9.9/envs/chaostoolkit/lib/python3.9/site-packages/chaosaddons/controls/safeguards.py'
^C[2022-06-22 17:54:53 DEBUG] [__init__:91] Cleaning up controls
[2022-06-22 17:54:53 DEBUG] [__init__:100] Cleaning up control 'safeguard'
[2022-06-22 17:54:53 DEBUG] [python:192] Control module '/Users/jeal/.pyenv/versions/3.9.9/envs/chaostoolkit/lib/python3.9/site-packages/chaosaddons/controls/safeguards.py' does not declare 'cleanup_control'
[2022-06-22 17:54:53 DEBUG] [__init__:100] Cleaning up control 'get metadata'
[2022-06-22 17:54:53 DEBUG] [python:192] Control module '/Users/jeal/.pyenv/versions/3.9.9/envs/chaostoolkit/lib/python3.9/site-packages/chaostoolkit_rappi-0.1.0-py3.9.egg/chaos_rappi/controls/state_sharing.py' does not declare 'cleanup_control'
[2022-06-22 17:54:53 DEBUG] [caching:42] Clearing activities cache

Aborted!

P.S.: I'm running Python 3.9.9 (from Homebrew) on macOS 12.4, though I don't think that has anything to do with it.
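
For illustration, the kind of hang described above can be modelled outside Chaos Toolkit with a plain threading.Barrier. This is only a sketch under assumptions, not the actual chaosaddons code: probe_worker, wait_for_probes and demo are hypothetical names, and the early return merely stands in for the interruption firing before the barrier is reached. It shows that when one party never arrives, a wait without a timeout would block forever, whereas a wait with a timeout fails with BrokenBarrierError and lets the caller move on.

```python
import threading


def probe_worker(barrier: threading.Barrier, exits_early: bool) -> None:
    # Hypothetical stand-in for a safeguard probe thread.
    if exits_early:
        # Models the interruption happening before the barrier is reached:
        # this party never calls barrier.wait().
        return
    barrier.wait()


def wait_for_probes(barrier: threading.Barrier, timeout: float) -> str:
    # Hypothetical stand-in for the wait performed at the end of the experiment.
    try:
        barrier.wait(timeout=timeout)
        return "all done"
    except threading.BrokenBarrierError:
        # Raised when the timeout expires; without a timeout this call
        # would block forever because the other party never arrives.
        return "timed out"


def demo(exits_early: bool) -> str:
    # Two parties: the probe thread and the caller.
    barrier = threading.Barrier(parties=2)
    worker = threading.Thread(target=probe_worker, args=(barrier, exits_early))
    worker.start()
    outcome = wait_for_probes(barrier, timeout=1.0)
    worker.join()
    return outcome


if __name__ == "__main__":
    print(demo(exits_early=False))
    print(demo(exits_early=True))
```

In this model, the first call reports "all done" because both parties reach the barrier, and the second reports "timed out" after one second instead of hanging. Whether a timeout is the right fix for the safeguards control itself is an open question; making sure the interrupted side always reaches the barrier would be another option.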