
GenReL-World


GenReL-World is a work in progress. If you have constructive thoughts, please open an issue.


Abstract

GenReL-World is a general Reinforcement Learning framework to utilize various world models as environments for robot manipulation.

The goal is to construct a framework that uses general reinforcement learning algorithms to control a robotic arm. A major problem in robotics is that robots adapt poorly to new environments or to tasks that fall outside the training task distribution. A general framework is worthwhile if it can be integrated with and trained on different world models.

With that, only the world model and the encoding to a lower-dimensional latent representation have to be switched. Implementing the different algorithms and finding connections between them is an ongoing research area that can play a crucial part in robotics.
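To make the idea of swappable world models concrete, here is a minimal sketch (not the project's actual API; all names are hypothetical) of the kind of interface a world model could expose to the RL loop, together with a toy 1-D implementation:

```python
from abc import ABC, abstractmethod

class WorldModelEnv(ABC):
    """Minimal interface a world model must expose to the RL loop."""

    @abstractmethod
    def encode(self, observation):
        """Map a raw observation to a lower-dimensional latent state."""

    @abstractmethod
    def reset(self):
        """Return the initial (encoded) state."""

    @abstractmethod
    def step(self, action):
        """Return (next_latent, reward, done)."""

class ToyWorldModel(WorldModelEnv):
    """Toy world model: a 1-D point mass chasing a goal position."""

    def __init__(self, goal=5.0):
        self.goal = goal
        self.state = 0.0

    def encode(self, observation):
        return observation  # no compression in this toy example

    def reset(self):
        self.state = 0.0
        return self.encode(self.state)

    def step(self, action):
        self.state += action
        reward = -abs(self.goal - self.state)   # closer to the goal is better
        done = abs(self.goal - self.state) < 0.1
        return self.encode(self.state), reward, done

env = ToyWorldModel()
latent = env.reset()
latent, reward, done = env.step(1.0)
```

Switching world models then means providing another subclass with its own `encode`, while the RL algorithm on top stays unchanged.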

The framework involves reinforcement learning concepts from (0) and meta-reinforcement learning and multi-task learning using the MetaWorld open-source simulated benchmark (1).

The project utilizes Google DeepMind's MuJoCo (Multi-Joint dynamics with Contact) as a general-purpose physics engine that aims to facilitate research and development in robotics (2).

The project also includes a 7-degree-of-freedom robotic arm simulated with MuJoCo Menagerie's xArm7 (or another model), part of the collection of high-quality models for the MuJoCo physics engine curated by Google DeepMind (3) (4).

We also share Yann LeCun's vision of how to construct autonomous intelligent agents (5).

Project Work

The project is conducted by Mark Czimber and Josh Kang.

GenReL-World began as project work for the 2024 Deep Learning class at the Aquincum Institute of Technology. The first milestones include simple implementations of reinforcement learning algorithms, along with meta-reinforcement learning.

These will be used for simpler tasks, such as moving objects and controlling a robotic arm in a virtual environment. This can later be scaled up by mixing different algorithms and implementing new approaches.
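As an illustration of the kind of simple RL implementation these first milestones refer to, here is a self-contained tabular Q-learning sketch on a toy 1-D corridor (a hypothetical example, not project code):

```python
import random

def q_learning(n_states=6, goal=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a 1-D corridor: move left/right to reach `goal`."""
    q = [[0.0, 0.0] for _ in range(n_states)]   # Q[state][action], 0=left, 1=right
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy action selection
            a = random.randrange(2) if random.random() < eps else max((0, 1), key=lambda x: q[s][x])
            s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s2 == goal else -0.1     # small step cost, reward at the goal
            # standard Q-learning temporal-difference update
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

random.seed(0)
q = q_learning()
# After training, moving right should dominate in every non-goal state.
policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(5)]
```

The same update rule scales (conceptually) to the robotic-arm setting once the tabular Q-function is replaced by a function approximator.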

Several ongoing research papers are taken into account during the development of the project (5) (6) (7).

Setup

GenReL-World is based on Python 3, which is widely used for reinforcement learning. The project also requires several Python library dependencies, which can be installed from PyPI with the pip install <required-library> terminal command.

Installation

To install everything, run:

git clone https://github.com/CIMBIBOY/GenReL-World.git 

Create a virtual environment inside the GenReL-World folder by running either of:

Python: 
python -m venv myenv

Anaconda or Miniconda:
conda create --name myenv 

After creation, dependencies are installed with Poetry. Run:

pip install poetry
# install the Poetry package manager, which makes handling pip packages easier

poetry init
# initialize the project (configuration lives in the pyproject.toml file)

poetry shell
# activate the virtual environment

poetry install
# download the dependencies

poetry add <package_name>
# to add your own packages

poetry update
# to configure freshly added packages

Frameworks and Testing

Frameworks and repositories used in GenReL-World research:

  • MetaWorld is an open-source simulated benchmark for meta-reinforcement learning and multi-task learning.
  • MuJoCo is a general purpose physics engine that aims to facilitate research and development in robotics, biomechanics, graphics and animation, machine learning, and other areas.
  • MuJoCo Menagerie is a collection of high-quality models for the MuJoCo physics engine including models that work well right out of the gate.
  • garage is a toolkit for developing and evaluating reinforcement learning algorithms, and an accompanying library of state-of-the-art implementations built using that toolkit.

MetaWorld

To install MetaWorld, follow the installation steps. The README of MetaWorld is a worthwhile read and can be found here.

MetaWorld can be visualized in MuJoCo's 3D environment by running testMetaW.py with the mjpython test_env/testMetaW.py command in the bash environment created by poetry shell.

MuJoCo

To install MuJoCo, follow the installation steps on the MuJoCo GitHub page. For Python users this is a simple installation from PyPI: pip install mujoco. MuJoCo can also be downloaded from the official site.

MuJoCo Menagerie

To install MuJoCo Menagerie, follow the installation steps on the MuJoCo Menagerie GitHub page. To visualize the xArm7, run testxArm7.py with the mjpython test_env/testxArm7.py command in the bash environment created by poetry shell.

The xarm7.xml file can also be dragged into the MuJoCo app downloaded from the official site.

garage

To check out garage's reinforcement learning algorithms, clone the garage GitHub repository and follow its installation and trial steps.

Train

Training is currently in its early trials.

Training with our own implementation of Proximal Policy Optimization struggles to solve the pick-place-v2 task. Training can be visualized and monitored by running:

mjpython train/trainMetaW.py
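For reference, the core of any PPO implementation is the clipped surrogate objective from Schulman et al. (2017). Below is a minimal, framework-free sketch of that loss (illustrative only; it is not the code in trainMetaW.py):

```python
import math

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective, averaged over samples.

    Positive advantages stop being rewarded once the probability ratio
    exceeds 1 + clip_eps, which keeps each policy update small.
    """
    losses = []
    for lp_new, lp_old, adv in zip(log_probs_new, log_probs_old, advantages):
        ratio = math.exp(lp_new - lp_old)                     # pi_new / pi_old
        unclipped = ratio * adv
        clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps) * adv
        losses.append(-min(unclipped, clipped))               # negate: we minimize
    return sum(losses) / len(losses)
```

For example, with equal old and new log-probabilities the ratio is 1 and the loss is just the negated advantage, while a large ratio is capped at 1 + clip_eps.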

Integrating garage (8) algorithms is currently under testing, and training runs have not yet succeeded. Currently there are errors with wrapping the MetaWorld environment into a Gym environment. To see the problems, run:

mjpython train/train_garagePPO.py
# our first implementation of wrapping the MetaWorld environment into a Gym environment

mjpython train/train_garagePPO.py
# the garage implementation of wrapping the MetaWorld environment into a Gym environment
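One frequent cause of such wrapping errors is an API mismatch between the older 4-tuple step() return and the newer 5-tuple Gym/Gymnasium convention. The sketch below shows the general adapter pattern using a dummy stand-in environment (all names are hypothetical; this is not the project's wrapper):

```python
class GymStyleWrapper:
    """Adapter exposing reset()/step() with the 5-tuple
    (obs, reward, terminated, truncated, info) that newer Gym/Gymnasium
    APIs expect, over an env that returns the older 4-tuple."""

    def __init__(self, env, max_steps=500):
        self.env = env
        self.max_steps = max_steps
        self._steps = 0

    def reset(self):
        self._steps = 0
        return self.env.reset(), {}          # new API: (obs, info)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)   # old 4-tuple API
        self._steps += 1
        truncated = self._steps >= self.max_steps          # time-limit truncation
        return obs, reward, done, truncated, info

class DummyEnv:
    """Stand-in for a MetaWorld task (for illustration only)."""
    def reset(self):
        return [0.0]
    def step(self, action):
        return [0.0], 0.0, False, {}

env = GymStyleWrapper(DummyEnv(), max_steps=2)
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(0)
```

Checking which tuple shape each library in the chain expects (MetaWorld, garage, Gym/Gymnasium) is usually the first debugging step for errors of this kind.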

TorchRL (9) also provides great algorithm implementations, but there is an unresolved import error on Mac M2 (arm64), which is yet to be solved. The implementation can be found in train/train_torchrl.py.

A Temporal Difference Model Predictive Control algorithm is also under consideration and may yield great results. An implementation can be found in the TD-MPC2 GitHub repository (10).

Research

Several research papers have been taken into account, and many algorithms are under consideration for generalizing RL. The most promising is the idea of Model Predictive Control, which can help the robot observe the environment better through the creation of a lower-dimensional latent space. More summaries can be found in our MPC summary document.
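To illustrate the MPC idea at toy scale, here is a random-shooting planner sketch (a hypothetical example; a real controller like TD-MPC would plan in a learned latent space with a learned dynamics model):

```python
import random

def mpc_random_shooting(state, dynamics, cost, horizon=5, n_samples=200):
    """Toy Model Predictive Control by random shooting: sample action
    sequences, roll each through the dynamics model, and return the
    first action of the cheapest sequence."""
    best_cost, best_first_action = float("inf"), 0.0
    for _ in range(n_samples):
        seq = [random.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in seq:                 # simulate the candidate action sequence
            s = dynamics(s, a)
            total += cost(s)
        if total < best_cost:
            best_cost, best_first_action = total, seq[0]
    return best_first_action

# Toy problem: 1-D point mass at 0.0, goal at 3.0, dynamics s' = s + a.
random.seed(0)
action = mpc_random_shooting(
    state=0.0,
    dynamics=lambda s, a: s + a,
    cost=lambda s: (s - 3.0) ** 2,
)
```

At each control step only the first action is executed and the plan is recomputed, which is what makes MPC robust to model error.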

References

(0) Richard S. Sutton and Andrew G. Barto. (2018). Reinforcement Learning: An Introduction (second edition). The MIT Press.

(1) Official site and GitHub by Farama-Foundation, MetaWorld:

@inproceedings{yu2019meta,
  title={Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning},
  author={Tianhe Yu and Deirdre Quillen and Zhanpeng He and Ryan Julian and Karol Hausman and Chelsea Finn and Sergey Levine},
  booktitle={Conference on Robot Learning (CoRL)},
  year={2019},
  eprint={1910.10897},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/1910.10897}
}

(2) Official site and GitHub, MuJoCo:

@inproceedings{todorov2012mujoco,
  title={MuJoCo: A physics engine for model-based control},
  author={Todorov, Emanuel and Erez, Tom and Tassa, Yuval},
  booktitle={2012 IEEE/RSJ International Conference on Intelligent Robots and Systems},
  pages={5026--5033},
  year={2012},
  organization={IEEE},
  doi={10.1109/IROS.2012.6386109}
}

(3) GitHub, MuJoCo Menagerie:

@software{menagerie2022github,
  author = {Zakka, Kevin and Tassa, Yuval and {MuJoCo Menagerie Contributors}},
  title = {{MuJoCo Menagerie: A collection of high-quality simulation models for MuJoCo}},
  url = {http://github.com/google-deepmind/mujoco_menagerie},
  year = {2022},
}

(4) MuJoCo Menagerie xArm7.

(5) Yann LeCun. (2022). A Path Towards Autonomous Machine Intelligence, Version 0.9.2, 2022-06-27. Courant Institute of Mathematical Sciences, New York University and Meta - Fundamental AI Research.

(6) Jacob Beck et al. (2023). A Survey of Meta-Reinforcement Learning. arXiv:2301.08028 [cs.LG].

(7) Danijar Hafner et al. (2020). Mastering Atari with Discrete World Models. arXiv:2010.02193 [cs.LG].

(8) GitHub, garage:

@misc{garage,
 author = {The garage contributors},
 title = {Garage: A toolkit for reproducible reinforcement learning research},
 year = {2019},
 publisher = {GitHub},
 journal = {GitHub repository},
 howpublished = {\url{https://github.com/rlworkgroup/garage}},
 commit = {be070842071f736eb24f28e4b902a9f144f5c97b}
}

(9) GitHub, TorchRL:

@misc{bou2023torchrl,
      title={TorchRL: A data-driven decision-making library for PyTorch}, 
      author={Albert Bou and Matteo Bettini and Sebastian Dittert and Vikash Kumar and Shagun Sodhani and Xiaomeng Yang and Gianni De Fabritiis and Vincent Moens},
      year={2023},
      eprint={2306.00577},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

(10) GitHub, Temporal Difference Model Predictive Control:

@inproceedings{hansen2024tdmpc2,
  title={TD-MPC2: Scalable, Robust World Models for Continuous Control}, 
  author={Nicklas Hansen and Hao Su and Xiaolong Wang},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2024}
}

@inproceedings{hansen2022tdmpc,
  title={Temporal Difference Learning for Model Predictive Control},
  author={Nicklas Hansen and Xiaolong Wang and Hao Su},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2022}
}

Credits

Huge thanks and big credit to Meta AI and Google DeepMind, two of the most influential AI laboratories and builders of open community research!

License: MIT License