chaostoolkit / chaostoolkit

Chaos Engineering Toolkit & Orchestration for Developers

Home Page: https://chaostoolkit.org


InterruptExecution in control configure_control

michael-gehtman-wix opened this issue · comments

commented

As explained in the docs,
InterruptExecution is the exception type to stop the experiment from any control.
That statement holds for all the controls except configure_control.
The only way we found to stop the experiment from configure_control is to raise BaseException.

Is there a different way to behave with configure_control,
or is this the only control point you should not stop from?
Yet we have crucial checks in this control; if they do not pass, we must terminate the experiment.
And secrets are needed to execute those checks, but we don't get them in the other controls. You can see this related issue #187.
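To make the use case concrete, here is a minimal sketch of the kind of control module being discussed. The `configure_control` hook name follows the chaostoolkit control API; the `api_token` pre-check is hypothetical, and a local stand-in for `chaoslib.exceptions.InterruptExecution` is defined so the snippet runs even without chaostoolkit installed.

```python
# Sketch of a control module whose configure_control hook performs a
# crucial pre-check and tries to abort the experiment when it fails.
try:
    # Real exception type when chaostoolkit-lib is installed.
    from chaoslib.exceptions import InterruptExecution
except ImportError:
    class InterruptExecution(Exception):
        """Local stand-in for chaoslib.exceptions.InterruptExecution."""


def configure_control(configuration=None, secrets=None, **kwargs):
    # Hypothetical pre-check: refuse to run without required credentials.
    # As discussed in this issue, this raise is currently swallowed by
    # the toolkit during control initialization.
    if not secrets or "api_token" not in secrets:
        raise InterruptExecution("missing credentials, aborting experiment")
```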

Good catch.

The code traps any exception to keep controls silent when they fail to initialize properly. The idea was that controls should never impact the experiment that way.

BaseException works because we only catch Exception, which sits lower than BaseException in the hierarchy :p

https://docs.python.org/3/library/exceptions.html#exception-hierarchy
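A tiny stdlib-only illustration of that hierarchy point (this is not chaostoolkit code): a handler that traps `Exception` swallows ordinary failures, while a `BaseException` subclass sails right past it.

```python
# Why raising a BaseException subclass escapes `except Exception`.

class Interrupt(BaseException):
    """Stand-in for an interrupt-style exception outside Exception."""


def run_control(control):
    try:
        control()
    except Exception:
        # Swallow ordinary failures so a broken control
        # cannot take the run down.
        return "swallowed"
    return "ok"


def bad_control():
    raise ValueError("control bug")


def interrupting_control():
    raise Interrupt("stop the experiment")


print(run_control(bad_control))  # prints "swallowed"
try:
    run_control(interrupting_control)
except Interrupt:
    # The interrupt propagated past `except Exception`.
    print("interrupted")
```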

I could change this so that InterruptExecution acts as it does at the other control points, indeed.

commented

Having the option to enforce termination is great; I think it should be consistent across all the control points, including configure_control.

Regarding

"The idea was that controls should never impact the experiment"

isn't a control part of the experiment?
For example, before or after each activity I want to know whether there is a DDoS attack on my system.
If there is, I want to terminate the entire experiment before it reaches the next activity.

Or is the proper way to do those kinds of checks inside the activity itself? That kind of solution seems rather boilerplate to me.

What I'm trying to say is:
in the core design, should a control never terminate the experiment because doing so would be a bad experiment implementation?

I think the way I put it is as follows:

  • experiment is authored by someone who wants to surface a particular behavior of a system
  • controls are operational concerns from people executing said experiment

That is why I'm saying controls do not impact the experiment. They are an operational concern, not an evidence-seeking concern. They happen to live in the same file but are managed by different roles for distinct purposes.

I should be able to take your experiment and run it somewhere else even if I have different operational requirements than yours.

commented

Basically, if you run the experiment without those checks it will work fine,
but it is better not to run this particular experiment while certain other events, like a DDoS, are occurring.
Therefore I don't want it to be a direct part of the experiment like a probe, because in other cases I might want to run the same experiment without caring about DDoS.
If this DDoS check were a probe, I would need to run it before and after each action,
making my experiment JSON file huge and filling the log file with unneeded INFO entries from the DDoS probe,
and at some point we would probably forget to add the DDoS probe before an action.
Probably not the best example, but I hope you get the point.
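The scenario above can be sketched as a control instead of a repeated probe, so the check runs automatically around every activity without appearing in the experiment file each time. The `before_activity_control` hook name follows the chaostoolkit control API; `is_under_ddos` is a hypothetical helper standing in for a real monitoring query, and a local stand-in for `chaoslib.exceptions.InterruptExecution` keeps the snippet self-contained.

```python
# Sketch: a DDoS guard as a control, run before every activity.
try:
    from chaoslib.exceptions import InterruptExecution
except ImportError:
    class InterruptExecution(Exception):
        """Local stand-in for chaoslib.exceptions.InterruptExecution."""


def is_under_ddos():
    # Hypothetical helper: query your monitoring stack for an
    # ongoing attack. Hardcoded to False for this sketch.
    return False


def before_activity_control(context, **kwargs):
    # Called before each activity; abort the whole run if an
    # attack is detected, instead of declaring a probe per action.
    if is_under_ddos():
        raise InterruptExecution("DDoS in progress, stopping the experiment")
```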

I think I almost get your point.
From your point of view, can you please give me an example of when and why to use controls?

I might have been unclear. Controls are the right place for your checks IMO. You can declare them at the experiment level or from the settings (so the experiment isn't even aware of them). So I would create a Python library of common checks for your organization, distribute it alongside chaostoolkit, and ensure the checks are properly declared in the settings file of the chaos command.
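For reference, declaring a control at the experiment level looks roughly like the fragment below, following the shape of the chaostoolkit experiment schema; the control name and module path are hypothetical.

```json
{
    "controls": [
        {
            "name": "org-prechecks",
            "provider": {
                "type": "python",
                "module": "acme.chaos.controls"
            }
        }
    ]
}
```

Declaring the same entry in the settings file of the chaos command instead keeps the experiment file itself unaware of the checks.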

I'm hesitant to create a new first-class citizen check mechanism if controls are offering the same operational feature.

Now, if you need to run different checks for different experiments, the settings file may not be the right place. Or, perhaps you need a way to signal which check to run based on the experiment's runtime conditions?

Also, should we have this discussion in #190 instead?

commented

I would like to summarize this issue before we continue to #190, in order to understand where we stand on the InterruptExecution in configure_control topic.

  • InterruptExecution doesn't stop the experiment from configure_control, but you will change it to behave like the rest of the controls, right?

Regarding your last question, I do not need runtime settings (at least not for now).
I need it in the experiment definition file, to formally and explicitly declare which pre-checks are needed for the current experiment.
We can continue this in #190,
although it seems already settled to split common checks into different control files to declare pre-checks.

It's an open question. Right now, the existing working solution is to have multiple controls indeed.

We can certainly explore a new element that would be more specific to the use case you describe, for clarity. I'm not averse to that, but it won't be implemented straight away.

InterruptExecution doesn't stop the experiment from configure_control, but you will change it to behave like the rest of the controls, right?

Yes.

chaostoolkit/chaostoolkit-lib#183

commented

Amazing, thank you very much for your patience and time!

Np. Thank you for the detailed explanation and catching that limitation.

I'll try to implement/release that ASAP.

This Issue has not been active in 365 days. To re-activate this Issue, remove the Stale label or comment on it. If not re-activated, this Issue will be closed in 7 days.

This Issue was closed because it was not reactivated after 7 days of being marked Stale.