
Marathon Environments

A set of high-dimensional continuous control environments for use with Unity ML-Agents Toolkit.

MarathonEnvs

MarathonEnvs enables the reproduction of established continuous control benchmarks within Unity ML-Agents using Unity's native physics simulator, PhysX. MarathonEnvs may be useful for:

  • Video game researchers interested in applying bleeding-edge robotics research to locomotion and AI for video games.
  • Traditional academic researchers looking to leverage the strengths of Unity and ML-Agents along with the body of existing research and benchmarks provided by projects such as the DeepMind Control Suite or the OpenAI Gym MuJoCo environments.

Note: This project is the result of a contribution from Joe Booth (@Sohojoe), a member of the Unity community who currently maintains the repository. As such, the contents of this repository are not officially supported by Unity Technologies.


Getting Started

* * * New Tutorial: Getting Started With MarathonEnvs * * *

The tutorial covers:

  • How to set up your development environment (Unity, MarathonEnvs + ML-Agents + TensorFlowSharp)
  • How to run each agent with its pre-trained models.
  • How to retrain the hopper agent and follow training in TensorBoard.
  • How to modify the hopper reward function and train it to jump.
  • See tutorial here

Requirements

  • Unity 2018.2 (Download here).
  • ML-Agents Toolkit v0.5 (Learn more here).

Installation

  • Clone ml-agents repository.
  • Install ML-Agents Toolkit.
  • Add the MarathonEnvs sub-folder from this repository to MLAgentsSDK\Assets\ in the cloned ml-agents repository.
  • Add config\marathon_envs_config.yaml from this repository to config\ in the cloned ml-agents repository.

Publications & Usage

An early version of this work was presented on March 19th, 2018 at the AI Summit, Game Developers Conference 2018 - http://schedule.gdconf.com/session/beyond-bots-making-machine-learning-accessible-and-useful/856147

Active Research using ML-Agents + MarathonEnvs


Support and Contributing

Support: Post an issue if you are having problems or need help getting an xml file working.

Contributing: ML-Agents 0.5 now supports the Gym interface. It would be of value to the community to reproduce more benchmarks and create a set of sample code for various algorithms. This would be a great way for someone looking to gain some experience with Reinforcement Learning. I would gladly support and / or partner. Please post an issue if you are interested. Here are some ideas:


Included Environments

Humanoid

DeepMindHumanoid
  • Set-up: Complex (DeepMind) Humanoid agent.
  • Goal: The agent must move its body toward the goal as quickly as possible without falling.
  • Agents: The environment contains 16 independent agents linked to a single brain.
  • Agent Reward Function (see the sketch after this list):
    • Reference OpenAI.Roboschool and / or DeepMind
      • -joints at limit penalty
      • -effort penalty (ignores hip_y and knee)
      • +velocity
      • -height penalty if below 1.2m
    • Inspired by Deliberate Practice (currently covers only the legs)
      • +facing upright bonus for shoulders, waist, pelvis
      • +facing target bonus for shoulders, waist, pelvis
      • -non-straight thigh penalty
      • +leg phase bonus (for height of knees)
      • +0.01 times body direction alignment with goal direction.
      • -0.01 times head velocity difference from body velocity.
  • Agent Terminate Function:
    • TerminateOnNonFootHitTerrain - the agent terminates when a body part other than a foot collides with the terrain.
  • Brains: One brain with the following observation/action space.
    • Vector Observation space: (Continuous) 88 variables
    • Vector Action space: (Continuous) Size of 21 corresponding to target rotations applicable to the joints.
    • Visual Observations: None.
  • Reset Parameters: None.
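
For orientation, here is a minimal sketch of how per-step reward terms like those above could be combined. The helper inputs and the weights are illustrative assumptions, not the tuned values in MarathonAgent.cs.

    // Illustrative only: combines the reward terms listed above.
    // velocity, effort, jointsAtLimit, and pelvisHeight stand in for values
    // MarathonAgent derives from the physics state; the weights are assumptions.
    public static class HumanoidRewardSketch
    {
        public static float StepReward(float velocity, float effort,
                                       int jointsAtLimit, float pelvisHeight)
        {
            float reward = velocity;            // +velocity toward the goal
            reward -= 0.1f * effort;            // -effort penalty
            reward -= 0.2f * jointsAtLimit;     // -joints-at-limit penalty
            if (pelvisHeight < 1.2f)
                reward -= 1f;                   // -height penalty below 1.2m
            return reward;
        }
    }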

Hopper

DeepMindHopper
  • Set-up: DeepMind Hopper agents.
  • Goal: The agent must move its body toward the goal as quickly as possible without falling.
  • Agents: The environment contains 16 independent agents linked to a single brain.
  • Agent Reward Function:
    • Reference OpenAI.Roboschool and / or DeepMind
      • -effort penalty
      • +velocity
      • +uprightBonus
      • -height penalty if below 0.65m (OpenAI) or 1.1m (DeepMind)
  • Agent Terminate Function (see the sketch after this list):
    • DeepMindHopper: TerminateOnNonFootHitTerrain - the agent terminates when a body part other than a foot collides with the terrain.
    • OpenAIHopper
      • TerminateOnNonFootHitTerrain
      • Terminate if height < 0.3m
      • Terminate if head tilt > 0.4
  • Brains: One brain with the following observation/action space.
    • Vector Observation space: (Continuous) 31 variables
    • Vector Action space: (Continuous) 4 corresponding to target rotations applicable to the joints.
    • Visual Observations: None.
  • Reset Parameters: None.
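
A minimal sketch of the OpenAIHopper terminate conditions above; the boolean and float inputs stand in for state the agent reads from its body parts.

    // Illustrative only: the three OpenAIHopper terminate checks listed above.
    public static class HopperTerminateSketch
    {
        public static bool ShouldTerminate(bool nonFootTouchedTerrain,
                                           float height, float headTilt)
        {
            if (nonFootTouchedTerrain) return true; // TerminateOnNonFootHitTerrain
            if (height < 0.3f) return true;         // height dropped below 0.3m
            if (headTilt > 0.4f) return true;       // head tilted past 0.4
            return false;
        }
    }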

Walker

DeepMindWalker
  • Set-up: DeepMind Walker agent.
  • Goal: The agent must move its body toward the goal as quickly as possible without falling.
  • Agents: The environment contains 16 independent agents linked to a single brain.
  • Agent Reward Function:
    • Reference OpenAI.Roboschool and / or DeepMind
      • -effort penalty
      • +velocity
      • +uprightBonus
      • -height penalty if below 0.65m (OpenAI) or 1.1m (DeepMind)
  • Agent Terminate Function:
    • TerminateOnNonFootHitTerrain - the agent terminates when a body part other than a foot collides with the terrain.
  • Brains: One brain with the following observation/action space.
    • Vector Observation space: (Continuous) 41 variables (see the sketch after this list)
    • Vector Action space: (Continuous) Size of 6 corresponding to target rotations applicable to the joints.
    • Visual Observations: None.
  • Reset Parameters: None.
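
For a sense of how a continuous observation vector is built in ML-Agents v0.5, here is a sketch using the CollectObservations / AddVectorObs pattern. The body parts and quantities are illustrative, not the walker's exact 41-variable layout, and the MLAgents namespace is an assumption about the v0.5 API.

    using UnityEngine;
    using MLAgents; // ML-Agents v0.5 (assumed namespace)

    // Illustrative only: shows the AddVectorObs pattern, not the walker's
    // exact 41-variable observation layout.
    public class WalkerObservationSketch : Agent
    {
        public Rigidbody[] bodyParts; // assigned in the Inspector

        public override void CollectObservations()
        {
            foreach (var part in bodyParts)
            {
                AddVectorObs(part.velocity);                  // 3 floats
                AddVectorObs(part.angularVelocity);           // 3 floats
                AddVectorObs(part.transform.localPosition.y); // height, 1 float
            }
        }
    }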

Ant

OpenAIAnt
  • Set-up: OpenAI Ant agent.
  • Goal: The agent must move its body toward the goal as quickly as possible without falling.
  • Agents: The environment contains 16 independent agents linked to a single brain.
  • Agent Reward Function:
    • Reference OpenAI.Roboschool and / or DeepMind
      • -joints at limit penalty
      • -effort penalty
      • +velocity
  • Agent Terminate Function:
    • Terminate if head body > 0.2
  • Brains: One brain with the following observation/action space.
    • Vector Observation space: (Continuous) 53 variables
    • Vector Action space: (Continuous) Size of 8 corresponding to target rotations applicable to the joints (see the sketch after this list).
    • Visual Observations: None.
  • Reset Parameters: None.
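
A sketch of how the 8 continuous actions could drive joint target rotations in ML-Agents v0.5. The AgentAction signature matches v0.5; the joint-drive mapping, the gain, and the use of ConfigurableJoint are assumptions, not the repository's exact code.

    using UnityEngine;
    using MLAgents; // ML-Agents v0.5 (assumed namespace)

    // Illustrative only: maps each continuous action in [-1, 1] onto one joint.
    public class AntActionSketch : Agent
    {
        public ConfigurableJoint[] joints; // the 8 actuated joints

        public override void AgentAction(float[] vectorAction, string textAction)
        {
            for (int i = 0; i < joints.Length; i++)
            {
                float a = Mathf.Clamp(vectorAction[i], -1f, 1f);
                // The gain (10f) is an illustrative magic number.
                joints[i].targetAngularVelocity = new Vector3(a * 10f, 0f, 0f);
            }
        }
    }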

Details

Key Files / Folders

  • MarathonEnvs - parent folder
    • Scripts/MarathonAgent.cs - Base Agent class for Marathon implementations
    • Scripts/MarathonSpawner.cs - Class for creating a Unity game object from an xml file
    • Scripts/MarathonJoint.cs - Model for mapping MuJoCo joints to Unity
    • Scripts/MarathonSensor.cs - Model for mapping MuJoCo sensors to Unity
    • Scripts/MarathonHelper.cs - Helper functions for MarathonSpawner.cs
    • Scripts/HandleOverlap.cs - helper script for detecting overlapping Marathon elements.
    • Scripts/ProceduralCapsule.cs - Creates a Unity capsule which matches a MuJoCo capsule
    • Scripts/SendOnCollisionTrigger.cs - class for sending collisions to MarathonAgent.cs (see the sketch after this list)
    • Scripts/SensorBehavior.cs - behavior class for sensors
    • Scripts/SmoothFollow.cs - camera script
    • Enviroments - sample environments
      • DeepMindReferenceXml - xml model files used in DeepMind research source
      • DeepMindHopper - Folder for reproducing DeepMindHopper
      • OpenAIAnt - Folder for reproducing OpenAIAnt
      • etc
  • config
    • marathon_envs_config.yaml - trainer-config file containing the hyperparameters used when training from Python.
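
The collision-forwarding idea behind SendOnCollisionTrigger.cs can be sketched as follows; the handler name and wiring are assumptions, not the repository's actual API.

    using UnityEngine;

    // Illustrative only: a component on each body part that relays terrain
    // contacts up to its agent, which can then run its terminate function.
    public class CollisionForwarderSketch : MonoBehaviour
    {
        public GameObject agentObject; // the object holding the MarathonAgent

        void OnCollisionEnter(Collision collision)
        {
            // "OnTerrainCollision" is a hypothetical handler name.
            agentObject.SendMessage("OnTerrainCollision", gameObject,
                                    SendMessageOptions.DontRequireReceiver);
        }
    }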

Tuning params / Magic numbers

  • xxNamexx\Prefab\xxNamexx -> MarathonSpawner.Force2D = set to True when implementing a 2d model (hopper, walker)

  • xxNamexx\Prefab\xxNamexx -> MarathonSpawner.DefaultDesity:

    • 1000 = default (same as MuJoCo)
    • Note: may be overridden within a .xml script
  • xxNamexx\Prefab\xxNamexx -> MarathonSpawner.MotorScale = magic number for tuning (a scaling factor applied to all motors)

    • 1 = default
    • 1.5 used by DeepMindHopper, DeepMindWalker
  • xxNamexx\Prefab\xxNamexx -> xxAgentScript.MaxStep / DecisionFrequency:

    • 5000,5: OpenAIAnt, DeepMindHumanoid
    • 4000,4: DeepMindHopper, DeepMindWalker
    • Note: all params taken from OpenAI.Gym
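
One way to read these pairs: both give the agent the same decision budget per episode, assuming one decision every DecisionFrequency steps.

    // Illustrative arithmetic for the MaxStep / DecisionFrequency pairs above.
    public static class EpisodeBudgetSketch
    {
        public static int DecisionsPerEpisode(int maxStep, int decisionFrequency)
            => maxStep / decisionFrequency;

        // DecisionsPerEpisode(5000, 5) == 1000 (OpenAIAnt, DeepMindHumanoid)
        // DecisionsPerEpisode(4000, 4) == 1000 (DeepMindHopper, DeepMindWalker)
    }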

Important:

  • This is not a complete implementation of MuJoCo; it focuses on doing just enough to get the locomotion environments working in Unity. See Scripts/MarathonSpawner.cs for which MuJoCo commands are ignored or partially implemented.
  • PhysX makes many tradeoffs in terms of accuracy when compared with MuJoCo. It may not be the best choice for your research project.
  • Marathon environments run at 300-500 physics simulations per second, significantly higher than Unity's default setting of 50 physics simulations per second (see the sketch after this list).
  • Currently, Marathon does not properly simulate how MuJoCo handles joint observations; as such, it may be difficult to do transfer learning (from simulation to real-world robots).
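
For reference, a minimal sketch of raising Unity's physics rate from the 50 Hz default toward the 300-500 Hz range noted above; the actual environments configure this in their project settings.

    using UnityEngine;

    // Illustrative only: 300 physics steps per second instead of Unity's
    // default 50 (fixedDeltaTime = 0.02).
    public class PhysicsRateSketch : MonoBehaviour
    {
        void Awake()
        {
            Time.fixedDeltaTime = 1f / 300f;
        }
    }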

References:


License: Apache License 2.0

