chaos_experimentation: scheduling of experiments
kathan24 opened this issue · comments
Kathan commented
Description
A way to schedule recurring experiments will allow us to run experiments 24x7. Currently, the status of the experiment is determined from the start and end date. If you set the start date to a future date, the experiment's status will change on that date and time. However, this does not allow us to run recurring experiments.
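To make the limitation concrete, here is a minimal sketch of a status computed purely from the start and end dates. The function name `derive_status` and the returned status strings are assumptions for illustration, not the project's actual code.

```python
from datetime import datetime
from typing import Optional


def derive_status(start_time: datetime, end_time: Optional[datetime], now: datetime) -> str:
    """Derive a status from the experiment's time window alone (hypothetical helper)."""
    if now < start_time:
        # The start date is still in the future.
        return "SCHEDULED"
    if end_time is not None and now >= end_time:
        # The window has fully elapsed.
        return "COMPLETED"
    # Otherwise the current time falls inside the window.
    return "RUNNING"
```

Because a single start and end pair maps to exactly one running window, a model like this has no way to express a repeat of the same experiment.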
One of the high-level approaches to make this feature available is below.
Prerequisite: the asynchronous task support feature
Plan
- Start persisting the status of the experiments in the Postgres database.
- Add a new status called `STATUS_SCHEDULED`.
- Create a new task using the asynchronous task support. This task will check for all experiments whose `status == STATUS_SCHEDULED` and `start_time < current_time`, and update their status to `STATUS_RUNNING`. The same task, or possibly a new one, will terminate the experiment by setting the status to `STATUS_COMPLETED`.
- There will be knobs on the UI to select the recurring occurrences of the experiment.
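
The status transitions in this plan could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Experiment` dataclass, the `Status` enum wrapper, and `run_status_task` are hypothetical names, and an in-memory list stands in for the Postgres table. Only the three status names come from this issue.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import List


class Status(Enum):
    STATUS_SCHEDULED = "STATUS_SCHEDULED"
    STATUS_RUNNING = "STATUS_RUNNING"
    STATUS_COMPLETED = "STATUS_COMPLETED"


@dataclass
class Experiment:
    # Hypothetical record shape; the real schema is not defined in this issue.
    id: int
    start_time: datetime
    end_time: datetime
    status: Status = Status.STATUS_SCHEDULED


def run_status_task(experiments: List[Experiment], current_time: datetime) -> None:
    """One pass of the periodic task; applies at most one transition per experiment."""
    for exp in experiments:
        if exp.status is Status.STATUS_SCHEDULED and exp.start_time < current_time:
            exp.status = Status.STATUS_RUNNING
        # Assumption: a polling task can overshoot end_time, so this compares
        # with <= instead of requiring an exact match.
        elif exp.status is Status.STATUS_RUNNING and exp.end_time <= current_time:
            exp.status = Status.STATUS_COMPLETED
```

In a real implementation each pass would presumably be a pair of `UPDATE` statements against the experiments table, with the task invoked on a fixed interval by the asynchronous task support.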
Flow
- When someone creates an experiment from the UI, the status will be set to `STATUS_SCHEDULED`.
- Once the start date of the experiment becomes current, change the status to `STATUS_RUNNING`.
- xDS server will pick up the experiments whose status is `STATUS_RUNNING` and inject faults.
- The async task will then terminate the experiment when `end_time == current_time` and will schedule the next experiment.
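
The last step of the flow, terminating a run and scheduling the next one, might look like the sketch below. The `recurrence` field (a fixed gap between the starts of consecutive runs) and the function `terminate_and_schedule_next` are assumptions made for illustration; how recurrence is actually modeled is still open in this issue.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional, Tuple


@dataclass(frozen=True)
class Experiment:
    start_time: datetime
    end_time: datetime
    status: str
    # Hypothetical field: gap between the starts of consecutive runs; None means one-off.
    recurrence: Optional[timedelta] = None


def terminate_and_schedule_next(
    exp: Experiment, current_time: datetime
) -> Tuple[Experiment, Optional[Experiment]]:
    """Return (experiment, next_run); next_run is None unless a recurring run just ended."""
    if exp.status != "STATUS_RUNNING" or current_time < exp.end_time:
        # Not running, or still inside its window: nothing to do.
        return exp, None
    finished = replace(exp, status="STATUS_COMPLETED")
    if exp.recurrence is None:
        return finished, None
    # Shift the whole window forward by one recurrence interval.
    next_run = replace(
        exp,
        start_time=exp.start_time + exp.recurrence,
        end_time=exp.end_time + exp.recurrence,
        status="STATUS_SCHEDULED",
    )
    return finished, next_run
```

Creating the next run as a separate `STATUS_SCHEDULED` record keeps the history of completed runs intact. This sketch does not handle an interval shorter than the experiment's duration or a next start time that is already in the past.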
These details need to be fleshed out and
Complexity [S/M/L]: L