
hierarchical-skill-acquisition

Implementation of "Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement Learning" by Tianmin Shu, Caiming Xiong, and Richard Socher.

Paper: https://openreview.net/forum?id=SJJQVZW0b

Paper Overview

The paper proposes a new method for solving multi-task environments [1]. The authors introduce a hierarchical approach and compare it to baselines such as a "flat" policy and H-DRLN.

Key ideas:

  • Hierarchical Design
  • Interpretable Policies
  • Curriculum Learning

Architecture

[Figure: the proposed hierarchical policy architecture]

The figure above shows the proposed architecture. It can be summarized in one sentence: at every time step t, the agent decides whether to invoke one of the already-trained policies for a chosen sub-task or to act on its own with low-level actions.

The current state is first encoded, all the way down to the LSTM. The model then has to decide on several things (see the sketch after the next paragraph):

  • Which sub-task policy should it use? (Instruction Policy; this is where the interpretability property comes from)
  • Should it use the chosen sub-task policy at all? (Switch Policy)
  • If it does not use the sub-task policy, which low-level action should it take? (Augmented Policy)

If the agent decides to switch to the sub-task policy, it invokes the Base Policy module. The base policy has the same architecture described above, so the hierarchy can be nested deeper and deeper: to infinity and beyond.
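To make the decision flow concrete, here is a minimal PyTorch-style sketch. It is an illustration under assumptions, not the authors' implementation: the class and head names are made up, the encoder is a single LSTM, all spaces are discrete, and the instruction conditioning of the base policy is omitted for brevity.

```python
import torch
import torch.nn as nn


class HierarchicalPolicy(nn.Module):
    """One level of the hierarchy; `base_policy` is the frozen level below."""

    def __init__(self, state_dim, hidden_dim, n_actions, n_instructions,
                 base_policy=None):
        super().__init__()
        self.encoder = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        # Instruction policy: which sub-task should the base policy execute?
        self.instruction_head = nn.Linear(hidden_dim, n_instructions)
        # Switch policy: defer to the base policy (1) or act directly (0)?
        self.switch_head = nn.Linear(hidden_dim, 2)
        # Augmented policy: distribution over low-level actions.
        self.action_head = nn.Linear(hidden_dim, n_actions)
        # Critic for the A2C update.
        self.value_head = nn.Linear(hidden_dim, 1)
        self.base_policy = base_policy  # same class, trained at the previous level

    def act(self, state_seq):
        # state_seq: (batch, time, state_dim); batch size 1 is assumed
        # so that the .item() check below is valid.
        encoded, _ = self.encoder(state_seq)
        h = encoded[:, -1]  # last LSTM output summarizes the state history
        instruction = torch.distributions.Categorical(
            logits=self.instruction_head(h)).sample()
        use_base = torch.distributions.Categorical(
            logits=self.switch_head(h)).sample()
        if self.base_policy is not None and use_base.item() == 1:
            # Delegate to the level below; in the paper the chosen
            # human-readable instruction conditions it (the interpretable
            # part), which this sketch leaves out.
            return self.base_policy.act(state_seq)
        action = torch.distributions.Categorical(
            logits=self.action_head(h)).sample()
        return action, self.value_head(h)
```

Note how a trained level is plugged in as `base_policy` of the next one; the curriculum described below relies on exactly this nesting.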

The policy is optimized with Advantage Actor-Critic (A2C). Why not A3C? The authors leave that as possible future work.
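For reference, a generic single-worker A2C loss looks roughly like the sketch below. It follows the standard definition; the discount factor, the value-loss coefficient, and the absence of an entropy bonus are assumptions here, not the paper's exact objective.

```python
import torch


def a2c_loss(log_probs, values, rewards, gamma=0.99, value_coef=0.5):
    """log_probs, values: 1-D tensors collected over one rollout;
    rewards: list of per-step scalar rewards."""
    returns, R = [], 0.0
    for r in reversed(rewards):        # discounted returns, computed backwards
        R = r + gamma * R
        returns.append(R)
    returns = torch.tensor(returns[::-1])
    # Advantage A(s, a) = R - V(s); detach so the critic is trained only
    # by the value loss, not through the actor term.
    advantages = returns - values.detach()
    policy_loss = -(log_probs * advantages).mean()
    value_loss = (returns - values).pow(2).mean()
    return policy_loss + value_coef * value_loss
```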

Training Process

To make this architecture work, we need to manually specify the order of the tasks and pre-train the zero-level policy. In particular, the authors use this curriculum: "Find object" -> "Get object" -> "Put object" -> "Stack object".

"Find object" is the zero-level policy hence it must be pre-trained before moving to the next level task ("Get object").

Information Reference:

  1. Multi-task environment - an environment where the agent's goal is to find a trajectory that solves a problem composed of smaller sub-problems, e.g. to solve the instruction "Get object", the agent must first be able to solve "Find object".
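The nesting can be pictured as a simple dependency map; the exact structure below is an assumption based on the curriculum order, not something stated in the paper.

```python
# Assumed sub-task dependencies implied by the curriculum order.
TASK_DEPENDENCIES = {
    "Find object": [],                # level zero: pure low-level control
    "Get object": ["Find object"],
    "Put object": ["Get object"],
    "Stack object": ["Put object"],
}
```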

Milestones:

1. Set up the environment

  • Define the training environment
  • Define the testing environment
  • Implement blocks/agent random placement for the training environment
  • Implement blocks/agent random placement for the testing environment
  • Define the curriculum

2. Build RL models

  • Implement "flat" model
  • Implement hierarchical model
    • "Flat" part
    • Augmented policy
    • Switch policy
    • Instruction policy
    • Value functions
    • Use of base policy
    • A2C optimization
    • Stochastic Temporal Grammar

3. Train the agent

  • Flat policy
    • Task #1 - Find x
    • Task #2 - Get x
    • Task #3 - Put x
    • Task #4 - Stack x
  • Hierarchical policy
    • Task #1 - Find x
    • Task #2 - Get x
    • Task #3 - Put x
    • Task #4 - Stack x
