About

Training a 4 legged agent to walk in Crawler Unity Environment using Proximal Policy Optimization(PPO)algorithm

Crawler Unity Environment

Set-up: A creature with 4 arms and 4 forearms.
Goal: The agents must move its body toward the goal direction without falling.
- CrawlerStaticTarget - Goal direction is always forward.
- CrawlerDynamicTarget- Goal direction is randomized.
Agents: The environment contains 3 agent linked to a single Brain.
Agent Reward Function (independent):
- +0.03 times body velocity in the goal direction.
- +0.01 times body direction alignment with goal direction.
Brains: One Brain with the following observation/action space.
- Vector Observation space: 117 variables corresponding to position, rotation, velocity, and angular velocities of each limb plus the acceleration and angular acceleration of the body.
- Vector Action space: (Continuous) Size of 20, corresponding to target rotations for joints.
- Visual Observations: None.
Reset Parameters: None
Benchmark Mean Reward: 2000

Train a 4 legged crawling agent to walk using Proximal Policy Optimization(PPO)

MIT License

Language:Jupyter Notebook 98.4%Language:Python 1.6%