chaostoolkit / chaostoolkit

Chaos Engineering Toolkit & Orchestration for Developers

Home Page: https://chaostoolkit.org


InterruptExecution in control configure_control

michael-gehtman-wix opened this issue · comments

commented

As explained in the docs,
InterruptExecution is the exception type to stop the experiment from any control.
That statement holds for all the controls except configure_control.
The only way we found to stop the experiment from configure_control is to raise BaseException.

Is there a different way to behave with configure_control,
or is this the only control point you should not stop from?
Yet we have crucial checks in this control; if they do not pass, we must terminate the experiment.
And secrets are needed to execute those checks, but we don't get them in the other controls. You can see this related issue #187.
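To make the use case concrete, here is a minimal sketch of the kind of control module being discussed. The `configure_control` hook name follows the chaostoolkit control API; the `api_token` pre-check is hypothetical, and a local stand-in for `chaoslib.exceptions.InterruptExecution` is defined so the snippet runs even without chaostoolkit installed.

```python
# Sketch of a control module whose configure_control hook performs a
# crucial pre-check and tries to abort the experiment when it fails.
try:
    # Real exception type when chaostoolkit-lib is installed.
    from chaoslib.exceptions import InterruptExecution
except ImportError:
    class InterruptExecution(Exception):
        """Local stand-in for chaoslib.exceptions.InterruptExecution."""


def configure_control(configuration=None, secrets=None, **kwargs):
    # Hypothetical pre-check: refuse to run without required credentials.
    # As discussed in this issue, this raise is currently swallowed by
    # the toolkit during control initialization.
    if not secrets or "api_token" not in secrets:
        raise InterruptExecution("missing credentials, aborting experiment")
```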

Good catch.

The code traps any exception to keep controls silent when they fail to initialize properly. The idea was that controls should never impact the experiment that way.

BaseException works because we only catch Exception, which sits lower than BaseException in the hierarchy :p

https://docs.python.org/3/library/exceptions.html#exception-hierarchy
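A tiny stdlib-only illustration of that hierarchy point (this is not chaostoolkit code): a handler that traps `Exception` swallows ordinary failures, while a `BaseException` subclass sails right past it.

```python
# Why raising a BaseException subclass escapes `except Exception`.

class Interrupt(BaseException):
    """Stand-in for an interrupt-style exception outside Exception."""


def run_control(control):
    try:
        control()
    except Exception:
        # Swallow ordinary failures so a broken control
        # cannot take the run down.
        return "swallowed"
    return "ok"


def bad_control():
    raise ValueError("control bug")


def interrupting_control():
    raise Interrupt("stop the experiment")


print(run_control(bad_control))  # prints "swallowed"
try:
    run_control(interrupting_control)
except Interrupt:
    # The interrupt propagated past `except Exception`.
    print("interrupted")
```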

I could change this so that InterruptExecution acts as it does at the other control points, indeed.

commented

Having the option to enforce termination is great; I think it should be consistent across all the control points, including configure_control.

Regarding

"The idea was that controls should never impact the experiment"

isn't a control part of the experiment?
For example, before or after each activity I want to know whether there is a DDoS attack on my system.
If there is, I want to terminate the entire experiment before it reaches the next activity.

Or is the proper way to do those kinds of checks inside the activity itself? That kind of solution seems rather boilerplate to me.

What I'm trying to say is:
in the core design, should a control never terminate the experiment because doing so would be a bad experiment implementation?

I think the way I put it is as follows:

  • experiment is authored by someone who wants to surface a particular behavior of a system
  • controls are operational concerns from people executing said experiment

That is why I'm saying controls do not impact the experiment. They are an operational concern, not an evidence-seeking concern. They happen to live in the same file but are managed by different roles for distinct purposes.

I should be able to take your experiment and run it somewhere else even if I have different operational requirements than yours.

commented

Basically, if you run the experiment without those checks it will work fine,
but it is better not to run this particular experiment while certain other events, like a DDoS, are occurring.
Therefore I don't want it to be a direct part of the experiment like a probe, because in other cases I might want to run the same experiment without caring about DDoS.
If this DDoS check were a probe, I would need to run it before and after each action,
making my experiment JSON file huge and filling the log file with unneeded INFO entries from the DDoS probe,
and at some point we would probably forget to add the DDoS probe before an action.
Probably not the best example, but I hope you get the point.
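The scenario above can be sketched as a control instead of a repeated probe, so the check runs automatically around every activity without appearing in the experiment file each time. The `before_activity_control` hook name follows the chaostoolkit control API; `is_under_ddos` is a hypothetical helper standing in for a real monitoring query, and a local stand-in for `chaoslib.exceptions.InterruptExecution` keeps the snippet self-contained.

```python
# Sketch: a DDoS guard as a control, run before every activity.
try:
    from chaoslib.exceptions import InterruptExecution
except ImportError:
    class InterruptExecution(Exception):
        """Local stand-in for chaoslib.exceptions.InterruptExecution."""


def is_under_ddos():
    # Hypothetical helper: query your monitoring stack for an
    # ongoing attack. Hardcoded to False for this sketch.
    return False


def before_activity_control(context, **kwargs):
    # Called before each activity; abort the whole run if an
    # attack is detected, instead of declaring a probe per action.
    if is_under_ddos():
        raise InterruptExecution("DDoS in progress, stopping the experiment")
```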

I think I almost get your point.
From your point of view, can you please give me an example of when and why to use controls?

I might have been unclear. Controls are the right place for your checks IMO. You can declare them at the experiment level or from the settings (so the experiment isn't even aware of them). So I would create a Python library of common checks for your organization, distribute it alongside chaostoolkit, and ensure the checks are properly declared in the settings file of the chaos command.
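For reference, declaring a control at the experiment level looks roughly like the fragment below, following the shape of the chaostoolkit experiment schema; the control name and module path are hypothetical.

```json
{
    "controls": [
        {
            "name": "org-prechecks",
            "provider": {
                "type": "python",
                "module": "acme.chaos.controls"
            }
        }
    ]
}
```

Declaring the same entry in the settings file of the chaos command instead keeps the experiment file itself unaware of the checks.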

I'm hesitant to create a new first-class citizen check mechanism if controls are offering the same operational feature.

Now, if you need to run different checks for different experiments, the settings file may not be the right place. Or, perhaps you need a way to signal which check to run based on the experiment's runtime conditions?

Also, should we have this discussion in #190 instead?

commented

I would like to summarize this issue before we continue to #190, in order to understand where we stand on the InterruptExecution in configure_control topic.

  • InterruptExecution doesn't stop the experiment from configure_control, but you will change it to behave like the rest of the controls, right?

Regarding your last question, I do not need runtime settings (at least not for now).
I need it in the experiment definition file, to formally and explicitly declare which pre-checks are needed for the current experiment.
We can continue this in #190,
although it seems already settled to split common checks into different control files to declare pre-checks.

It's an open question. Right now, the existing working solution is to have multiple controls indeed.

We can certainly explore a new element that would be more specific to the use case you describe, for clarity. I'm not averse to that, but it won't be implemented straight away.

InterruptExecution doesn't stop the experiment from configure_control, but you will change it to behave like the rest of the controls, right?

Yes.

chaostoolkit/chaostoolkit-lib#183

commented

Amazing, thank you very much for your patience and time!

Np. Thank you for the detailed explanation and catching that limitation.

I'll try to implement/release that ASAP.

This Issue has not been active in 365 days. To re-activate this Issue, remove the Stale label or comment on it. If not re-activated, this Issue will be closed in 7 days.

This Issue was closed because it was not reactivated after 7 days of being marked Stale.