tmralmeida / wasp-AI-ML-m1

Vacuum Cleaning Agent Programming Exercise.


wasp-AI-ML-m1

Vacuum Cleaning Agent Programming Exercise. Training an agent with the REINFORCE reinforcement learning algorithm to perform the cleaning task in varied grid environments.
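For reference, REINFORCE improves the policy by gradient ascent on the expected return. The sketch below is illustrative only (the function and variable names are not the repo's), assuming a PyTorch policy whose action log-probabilities were collected during a rollout:

import torch

def reinforce_step(optimizer, log_probs, returns):
    # log_probs: list of log pi(a_t | s_t) tensors collected during one rollout
    # returns:   list of discounted returns G_t for the same time steps
    returns = torch.tensor(returns, dtype=torch.float32)
    # Normalizing the returns acts as a simple baseline and reduces gradient variance
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    # Policy-gradient objective: minimize -E[ log pi(a_t | s_t) * G_t ]
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()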

Installation

conda env create --name <env_name> --file requirements.yml
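Then activate the environment before running any of the scripts below (the name is whatever you chose above):

conda activate <env_name>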

Current status

  • Simulator
    • agent in the environment
    • obstacles in the environment
    • dirt in the environment
  • Training the smart agent
  • Testing the smart agent according to the user's inputs

Training Instructions

Constraints/assumptions:

  • there is only one robot
  • the number of obstacles is fixed and equal to num_cells
  • the number of dirty cells ranges from 1 to the number of cells covered from the robot's location

First, we need to train the agent to become smart. To that end, we train the agent on a vast number of different scenarios, with the following rewards:

  • hitting an obstacle = -0.1
  • cleaning dirty cells = +2
  • moving one cell = -0.1

This means that every time the agent moves, it gets a penalty of 0.1. Whenever it hits an obstacle, it gets an additional penalty of the same amount. Finally, for each cleaned cell of the environment, it gets a reward of +2.
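As a rough illustration of how such a per-step reward could be computed (the actual logic lives in the repo's environment code; the function below is hypothetical, with parameter names matching the cfg table that follows):

def step_reward(hit_obstacle, cleaned_cells,
                obs_reward=-0.1, dirt_reward=2.0, ene_reward=-0.1):
    # Every move costs 0.1
    reward = ene_reward
    # Hitting an obstacle costs an extra 0.1
    if hit_obstacle:
        reward += obs_reward
    # Each cleaned cell is worth +2
    reward += dirt_reward * cleaned_cells
    return reward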

To do so, there is a cfg file that the user can modify to set different hyperparameters, such as the following:

Parameter Name    Type   Default   Additional Info
epochs            int    1200      Number of epochs to train
steps_per_epoch   int    4000      Maximum number of (s, a) pairs per epoch
max_ep_len        int    1000      Maximum length of a trajectory/episode/rollout
v_train_iters     int    80        Number of updates of the value function
obs_reward        float  -0.1      Reward for hitting an obstacle
dirt_reward       float  2.0       Reward for cleaning a dirty cell
ene_reward        float  -0.1      Reward for moving one cell
gamma             float  0.99      Discount factor (adv. function)
lambda            float  0.97      Adv. function hyperparameter
pi_lr             float  3e-4      Learning rate for the policy net
v_lr              float  1e-3      Learning rate for the value function
hidden_sizes      List   [64,32]   Sizes of the hidden FC layers
window_size       int    3         Robot's perception window
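For illustration, the defaults above can be represented as a dictionary and merged with a user-provided cfg file. The key names and file layout below are assumptions, not the repo's guaranteed schema:

import yaml

DEFAULT_CFG = {
    "epochs": 1200,
    "steps_per_epoch": 4000,
    "max_ep_len": 1000,
    "v_train_iters": 80,
    "obs_reward": -0.1,
    "dirt_reward": 2.0,
    "ene_reward": -0.1,
    "gamma": 0.99,
    "lambda": 0.97,
    "pi_lr": 3e-4,
    "v_lr": 1e-3,
    "hidden_sizes": [64, 32],
    "window_size": 3,
}

def load_cfg(path):
    # Values from the user's cfg file override the defaults above
    with open(path) as f:
        user_cfg = yaml.safe_load(f) or {}
    return {**DEFAULT_CFG, **user_cfg}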

For training, run the train.py script as follows:

python train.py --num_cells NUM_CELLS --cfg PATH_TO_CFG

The number of cells per side can vary within the range [5, 11]. After training, check the plots and trained models in the outputs directory.
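For example, to train on a 7x7 grid (the cfg path below is illustrative, not a path guaranteed to exist in the repo):

python train.py --num_cells 7 --cfg cfg/default.yml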

Testing Instructions

For testing, first change the output model path (save_dir) in the cfg file. Then run the run_agent.py script as follows:

python run_agent.py --num_cells NUM_CELLS --cfg PATH_TO_CFG
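For instance, to run an agent on the 7x7 environment shown in the demo below (again, the cfg path is illustrative, and save_dir must point to the model trained above):

python run_agent.py --num_cells 7 --cfg cfg/default.yml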

After launching the simulator, the user interacts with the environment using the mouse:

  • Use the left mouse button to place/remove the robot in/from the environment (black cell)
  • Use the right mouse button to place/remove obstacles (red cells) and dirt (green cells) in/from the environment
  • Any placed object can be removed by simply clicking again on the respective occupied cell with the corresponding button
  • After designing the environment, press p to run the trained smart agent.

The user can keep track of changes to the environment through the logs in the terminal.

Demo video: 7_7_simple.mp4

Black cell - agent. Red cells - obstacles. Green cell - dirt.

About

Vacuum Cleaning Agent Programming Exercise.

License: MIT License


Languages

Language: Python 100.0%