Code for the "Fine-tuning Multimodal Transformer Models for Generating Actions in Virtual and Real Environments" paper
How to install:
- Clone to the project root or install the following repos: cv-utils, cam-utils, cv-repo, ultralytics, and arm-utils.
- Install this package and dependencies from `requirements.txt`.
- Add to `PYTHONPATH` the paths to the `rozumarm_vima_cv`, `utils`, and `camera_utils` directories that you cloned.
- Download all missing VIMA checkpoints from https://github.com/vimalabs/VIMA
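
To confirm that the `PYTHONPATH` step took effect, a small check like the one below may help. It is a hypothetical helper, not part of this repo. It assumes that the three directories named in the install steps were themselves added to `PYTHONPATH`, so each should appear as the last component of some import-path entry.

```python
import os
import sys

# Directory names as read from the install steps above (assumed spelling).
REQUIRED_DIRS = ("rozumarm_vima_cv", "utils", "camera_utils")


def missing_dirs(required=REQUIRED_DIRS, search_path=None):
    """Return the required directory names with no matching import-path entry."""
    entries = sys.path if search_path is None else search_path
    # Compare on the last path component, ignoring trailing separators.
    present = {os.path.basename(os.path.normpath(p)) for p in entries if p}
    return [name for name in required if name not in present]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Not on PYTHONPATH:", ", ".join(missing))
    else:
        print("All required directories are on PYTHONPATH.")
```

Run it in the same shell session in which `PYTHONPATH` was set, since the variable is read when the interpreter starts.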
How to run:
- To start the (cube detector -> sim -> oracle -> arm) pipeline, run `scripts/run_aruco2sim_loop.py`.
- To start the (cube detector -> sim -> ML model -> arm) pipeline, run `scripts/run_model_loop.py`.
- To start the (cam image -> ML model -> arm) pipeline, set `USE_OBS_FROM_SIM=False` in `scripts/run_model_loop.py` and run it.
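
The three run modes above can be summarized as a lookup from script and flag to pipeline stages. This is only an illustrative sketch: `pipeline_stages` is a hypothetical function that does not exist in the repo, and the default `use_obs_from_sim=True` is an assumption inferred from the instruction to set the flag to `False` for the camera-image pipeline.

```python
def pipeline_stages(script, use_obs_from_sim=True):
    """Map a script path and the USE_OBS_FROM_SIM flag to the pipeline it starts."""
    if script == "scripts/run_aruco2sim_loop.py":
        # The oracle replaces the ML model; the flag does not apply here.
        return ["cube detector", "sim", "oracle", "arm"]
    if script == "scripts/run_model_loop.py":
        if use_obs_from_sim:
            # Observations come from the simulator (assumed default).
            return ["cube detector", "sim", "ML model", "arm"]
        # Observations come directly from the camera image.
        return ["cam image", "ML model", "arm"]
    raise ValueError(f"unknown script: {script}")


if __name__ == "__main__":
    stages = pipeline_stages("scripts/run_model_loop.py", use_obs_from_sim=False)
    print(" -> ".join(stages))
```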
Links to datasets: