gym-anm-exp

This repository contains the code used to obtain the experimental results presented in the paper introducing gym-anm:

@misc{henry2021gymanm,
      title={Gym-ANM: Reinforcement Learning Environments for Active Network Management Tasks in Electricity Distribution Systems}, 
      author={Robin Henry and Damien Ernst},
      year={2021},
      eprint={2103.07932},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

which can be accessed here.

Code Overview

The code is divided into two folders:

rl_agents/ contains the code used to train RL agents,
mpc_policies/ contains the code used to run the Model Predictive Control-based policies (for more information, see the gym-anm documentation).

Reinforcement Learning Agents

The folder rl_agents/ mainly contains helper functions, callbacks, and wrappers that were used alongside the Stable Baselines3 RL library:

callbacks.py contains callback functions useful for:
- Displaying a progress bar (ProgressBarCallback), and
- Evaluating the agent's current policy (EvalCallback). This is a slightly modified version of the original callback that calls evaluation.py instead of the original evaluation function.
continue_training.py can be used to continue training an agent using a saved model.
evaluation.py evaluates the agent's current policy. This is a modified version of the original function which also computes the discounted returns.
hyperparameters.py contains all hyperparameters.
train.py is the main training script.
utils.py contains utility functions to initialize environments, etc.
view_progress.py can be used to display some statistics about the current state of training.
wrappers.py contains wrappers for the environment:
- NormalizeActionWrapper normalizes the action space to lie in [-1, 1].
- TimeLimitWrapper sets a maximum number of steps that can be taken in an environment before needing a reset.

Model Predictive Control (MPC) Policies

The script run_mpc.py can be used to run either the constant forecast policy or the perfect forecast policy for different planning steps N (optimization horizon) and safety margin hyperparameters β.

Running The Code

Installation & Requirements

Running the code in this repository requires Python>=3.8 and the packages listed in requirements.txt, which can be installed as follows:

$ pip install -r requirements.txt

RL Agents

Before starting training the agents, you may want to modify certain hyperparameters in rl_agents/hyperparameters.py. In particular, you should specify the folder in which you want the results to be stored BASE_DIR.

Training

You can then start training your agent with:

$ python -m rl_agents.train <ALGO> -s <SEED>

where <ALGO> can be either SAC or PPO and <SEED> is an optional random seed.

Results will be saved in a new directory <BASE_DIR>/<ENV_ID>/run_i/ where i is replaced by an integer so as to create a new directory.

Inspecting training status

You can get an overview of the training status of your agents by running:

$ python -m rl_agents.view_progress

which will print some statistics about the results saved in subfolders of <BASE_DIR>/<ENV_ID>/.

Visualizing the performance of a trained agent

You can watch a trained agent interact with the environment by running:

$ python -m analyze_results.visualize <ALGO> -p <PATH> -s <SLEEP> -T <TIMESTEPS>

where <PATH> is the path to the run folder (in the form <BASE_DIR>/<ENV_ID>/run_<i>/), <SLEEP> is the amount of seconds between updates of the rendering (default is 0.5), and <TIMESTEPS> is the number of timesteps to run.

Recording a video of your trained agent

You can record videos of your trained agent by running:

$ python -m analyze_results.record_screen <PATH> -l <LENGTH> --fps <FPS>

where <PATH> is the path to where you want to save the recording, <LENGTH> is the duration of the recording (seconds) and <FPS> is the number of frames/seconds to make.

NOTE: The above code will simply record your screen. So you need to have the agent already running (see previous section). This feature has not been extensively tested, and similar outcomes can easily be achieved using tools like QuickTime Player for Mac.

MPC Policies

Either MPC-based policies can be run with the following code:

$ python -m mpc_policies.run_mpc <ENV_ID> <POLICY> -T <T> -s <SEED> -o <OUTPUT_FILE>

where <POLICY> can be either constant or perfect.

The above code will run the policy in the environment <ENV_ID> for <T> timesteps, repeat for different safety margins and optimization horizons, and save the final return of each run in the file <OUTPUT_FILE>.

Note that, as specified in the documentation, the policy perfect will only work in the environment ANM6Easy-v0.

Questions

If you have any questions regarding this implementation, please feel free to contact me at robin@robinxhenry.com.

jianzuo / gym-anm-exp