SuerpX / Embedded-Self-Predictions

We investigate a deep reinforcement learning (RL) architecture that supports explaining why a learned agent prefers one action over another.

Paper published at ICLR 2021 (Oral): https://openreview.net/forum?id=Ud3DSz72nYR

ESP Agent Training, Evaluation, and Explanation Generation

Installation

Install dependencies

    sudo apt install libopenmpi-dev ffmpeg

Install pip packages (Anaconda recommended)

    pip install -r requirements.txt

Prerequisites for the Tug-of-War environment

  1. Download StarCraft 2 from https://github.com/Blizzard/s2client-proto#downloads (4.7.1 recommended, Linux only).

  2. StarCraft 2 needs to be installed under the ~/ path.

  3. Clone the Tug-of-War environment: https://github.com/osu-xai/sc2env

  4. Run the following commands:

    pip install -r requirements_ToW.txt
    export PYTHONPATH=path/to/dir/sc2env:path/to/dir/sc2env/sc2env/xai_replay/ui/viz/py_backend/proto
    cd path/to/dir/sc2env
    git checkout new-sensors

    # (back in the ESP_code directory)
    cd Tug-of-War/
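
The `PYTHONPATH` value in step 4 joins two directories inside the sc2env checkout. As a minimal sketch, the same value can be assembled in Python with `os.pathsep`, which is `:` on Linux. The helper name `build_pythonpath` and the example checkout location `~/sc2env` are hypothetical and not part of this repository.

```python
import os

# Sub-directory of the sc2env checkout that the README adds to PYTHONPATH.
PROTO_SUBDIR = os.path.join(
    "sc2env", "xai_replay", "ui", "viz", "py_backend", "proto"
)


def build_pythonpath(sc2env_root):
    """Return a PYTHONPATH value covering the checkout root and its proto directory."""
    root = os.path.abspath(os.path.expanduser(sc2env_root))
    # os.pathsep is the platform's path-list separator (":" on Linux).
    return os.pathsep.join([root, os.path.join(root, PROTO_SUBDIR)])


if __name__ == "__main__":
    # "~/sc2env" is a placeholder for wherever the repository was cloned.
    print("export PYTHONPATH=" + build_pythonpath("~/sc2env"))
```

The printed line can be pasted into the shell before running any of the Tug-of-War commands.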

Training a new agent

a. Cart Pole

    python3 CP_ESP.py train

b. Lunar Lander

    python3 LL_ESP.py train

c. Tug-of-War 17f

    python3 -m sc2env.play_tug_of_war -f tasks/tug_of_war/gqf/v10_sepcific_new/ -tk task_gqf_2p_2l_grid

d. Tug-of-War 131f

    python3 -m sc2env.play_tug_of_war -f tasks/tug_of_war/gqf/GVFs_all_1_sepcific_new/ -tk task_gqf_2p_2l_grid

Evaluating a trained model

a. Cart Pole

    python3 CP_ESP.py eval

b. Lunar Lander

    python3 LL_ESP.py eval

c. Tug-of-War 17f

    python3 -m sc2env.play_tug_of_war -f tasks/tug_of_war/gqf/v10_sepcific_eval/ -tk task_gqf_2p_2l_grid

d. Tug-of-War 131f

    python3 -m sc2env.play_tug_of_war -f tasks/tug_of_war/gqf/GVFs_all_1_sepcific_eval/ -tk task_gqf_2p_2l_grid

Producing explanations with a trained model

a. Cart Pole

    python3 CP_ESP.py exp

b. Lunar Lander

    python3 LL_ESP.py exp

c. Tug-of-War 17f

    python3 -m sc2env.play_tug_of_war -f tasks/tug_of_war/gqf/v10_sepcific_exp/ -tk task_gqf_2p_2l_grid

d. Tug-of-War 131f

    python3 -m sc2env.play_tug_of_war -f tasks/tug_of_war/gqf/GVFs_all_1_sepcific_exp/ -tk task_gqf_2p_2l_grid
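
The training, evaluation, and explanation commands follow one pattern: the Cart Pole and Lunar Lander scripts take the mode as an argument, while the Tug-of-War task folder ends in `_new`, `_eval`, or `_exp`. The sketch below rebuilds those command lines from an environment label and a mode. It is only an illustration: `build_command` and the labels `cartpole`, `lunarlander`, `tow17`, and `tow131` are hypothetical names, not part of this repository.

```python
# Script per environment, as listed in this README.
SCRIPTS = {"cartpole": "CP_ESP.py", "lunarlander": "LL_ESP.py"}

# Task-folder prefix per Tug-of-War variant (spelling follows the repository).
TOW_PREFIXES = {"tow17": "v10_sepcific", "tow131": "GVFs_all_1_sepcific"}

# Folder suffix per mode; the training commands use the "_new" folders.
TOW_SUFFIXES = {"train": "new", "eval": "eval", "exp": "exp"}


def build_command(env, mode):
    """Return the argument list for one (environment, mode) pair."""
    if mode not in TOW_SUFFIXES:
        raise ValueError(f"unknown mode: {mode}")
    if env in SCRIPTS:
        return ["python3", SCRIPTS[env], mode]
    if env in TOW_PREFIXES:
        folder = f"tasks/tug_of_war/gqf/{TOW_PREFIXES[env]}_{TOW_SUFFIXES[mode]}/"
        return ["python3", "-m", "sc2env.play_tug_of_war",
                "-f", folder, "-tk", "task_gqf_2p_2l_grid"]
    raise ValueError(f"unknown environment: {env}")


if __name__ == "__main__":
    # Print every command so it can be compared against the README.
    for env in list(SCRIPTS) + list(TOW_PREFIXES):
        for mode in TOW_SUFFIXES:
            print(" ".join(build_command(env, mode)))
```

Each returned list can be passed to `subprocess.run` as is, provided the working directory matches the one the corresponding command expects.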

Directories of generated explanations

a. Cart Pole

    CartPole_ESP/CartPole_ESP_exp

b. Lunar Lander

    LunarLander_ESP/LunarLander_ESP_exp

c. Tug-of-War 17f

    Tug-of-War/explanations/tug_of_war/gqf/v10_sepcific

d. Tug-of-War 131f

    Tug-of-War/explanations/tug_of_war/gqf/GVFs_all_1_sepcific
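
To check which of these directories have been populated after a run, a small script along the following lines may help. It assumes the paths above are relative to the repository root, which this README does not state explicitly. The function name `find_explanations` is hypothetical.

```python
from pathlib import Path

# Output directories listed above; assumed to be relative to the repository root.
EXPLANATION_DIRS = {
    "Cart Pole": "CartPole_ESP/CartPole_ESP_exp",
    "Lunar Lander": "LunarLander_ESP/LunarLander_ESP_exp",
    "Tug-of-War 17f": "Tug-of-War/explanations/tug_of_war/gqf/v10_sepcific",
    "Tug-of-War 131f": "Tug-of-War/explanations/tug_of_war/gqf/GVFs_all_1_sepcific",
}


def find_explanations(repo_root="."):
    """Map each environment to the number of entries in its output directory.

    A directory that does not exist is reported as None.
    """
    root = Path(repo_root)
    counts = {}
    for name, rel in EXPLANATION_DIRS.items():
        path = root / rel
        counts[name] = sum(1 for _ in path.iterdir()) if path.is_dir() else None
    return counts


if __name__ == "__main__":
    for name, count in find_explanations().items():
        status = "not found" if count is None else f"{count} entries"
        print(f"{name}: {status}")
```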
