anwu21 / future-image-similarity


Model-based Behavioral Cloning with Future Image Similarity Learning

This repository is for our CoRL 2019 paper:

Alan Wu, AJ Piergiovanni, and Michael S. Ryoo
"Model-based Behavioral Cloning with Future Image Similarity Learning"
in CoRL 2019

If you find this repository useful for your research, please cite our paper:

    @inproceedings{wu2019fisl,
          title={Model-based Behavioral Cloning with Future Image Similarity Learning},
          booktitle={Conference on Robot Learning (CoRL)},
          author={Alan Wu and AJ Piergiovanni and Michael S. Ryoo},
          year={2019}
    }

We present a visual imitation learning framework that enables learning of robot action policies solely based on expert samples without any robot trials. Robot exploration and on-policy trials in a real-world environment could often be expensive or dangerous. We present a new approach to address this problem by learning a future scene prediction model solely on a collection of expert trajectories consisting of unlabeled example videos and actions, and by enabling generalized action cloning using future image similarity. The robot learns to visually predict the consequences of taking an action, and obtains the policy by evaluating how similar the predicted future image is to an expert image. We develop a stochastic action-conditioned convolutional autoencoder, and present how we take advantage of future images for robot learning. We conduct experiments in simulated and real-life environments using a ground mobility robot with and without obstacles, and compare our models to multiple baseline methods.

Here is a sample of training videos from a real office environment with various targets:

(Animated training samples: kroger, vball, airfilt, joes)

And here is a sample of training videos from a simulated environment (Gazebo) with various obstacles:

(Animated training samples: obs1, obs2)

Sample training data can be found in the folders /dataset/office_real and /dataset/gazebo_sim. The entire dataset can be downloaded here: Dataset. We use images of size 64x64.
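For reference, a minimal PyTorch loader for the 64x64 frames might look like the sketch below. The folder layout, file extension, and the FrameDataset name are assumptions for illustration, not the repository's actual data pipeline.

    import glob
    import os

    import numpy as np
    import torch
    from PIL import Image
    from torch.utils.data import Dataset

    class FrameDataset(Dataset):
        """Illustrative sketch: loads RGB frames as 64x64 tensors in [0, 1].
        The folder layout and file extension are assumptions."""

        def __init__(self, root='dataset/office_real'):
            # Assumes frames are stored as .png files somewhere under `root`.
            self.paths = sorted(glob.glob(os.path.join(root, '**', '*.png'), recursive=True))

        def __len__(self):
            return len(self.paths)

        def __getitem__(self, idx):
            img = Image.open(self.paths[idx]).convert('RGB').resize((64, 64))
            # HWC uint8 -> CHW float tensor in [0, 1].
            return torch.from_numpy(np.array(img)).permute(2, 0, 1).float() / 255.0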

Here is an illustration of the stochastic image predictor model. This model takes input of the current image and action, but also learns to generate a prior, zt, which varies based on the input sequence. This is further concatenated with the representation before future image prediction. The use of the prior allows for better modeling in stochastic environments and generates clearer images.

Model
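As a rough illustration of this architecture, the sketch below shows an action-conditioned convolutional autoencoder whose decoder is additionally conditioned on a sample z_t from a learned prior. The layer sizes, the [forward, turn] action encoding, and all names are assumptions made for illustration; they are not the exact layers used in this repository.

    import torch
    import torch.nn as nn

    class StochasticImagePredictor(nn.Module):
        """Illustrative action-conditioned conv autoencoder with a learned prior z_t.
        Layer sizes and names are assumptions, not the repository's exact model."""

        def __init__(self, z_dim=16, action_dim=2):
            super().__init__()
            # Encode the 64x64 current image into an 8x8 feature map.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
                nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
                nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU()) # 16 -> 8
            # Learned prior: mean and log-variance of z_t from the image features.
            self.prior = nn.Linear(256 * 8 * 8, 2 * z_dim)
            # Decoder is conditioned on the features, the action, and z_t.
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(256 + action_dim + z_dim, 128, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid())

        def forward(self, image, action):
            feat = self.encoder(image)                            # (B, 256, 8, 8)
            mu, logvar = self.prior(feat.flatten(1)).chunk(2, 1)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # sample z_t
            b, _, h, w = feat.shape
            # Tile the action and z_t over the spatial dimensions before decoding.
            extra = torch.cat([action, z], 1).view(b, -1, 1, 1).expand(-1, -1, h, w)
            return self.decoder(torch.cat([feat, extra], 1))      # predicted next image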

Predicted future images in the real-life lab (top) and simulation (bottom) environments when taking different actions. Top two rows of each environment: deterministic model with linear and convolutional state representations, respectively. Bottom two rows: stochastic model with linear and convolutional state representations, respectively. The center image of each row is the current image; each image to its left corresponds to an additional -5° turn and each image to its right to an additional +5° turn.

Arc_Lab

Arc Gaz
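Conceptually, each row above can be produced by sweeping the turn component of the action and decoding one predicted image per candidate turn. A hedged sketch, reusing the illustrative StochasticImagePredictor above and again assuming a [forward, turn] action encoding:

    import torch

    def predict_turn_sweep(predictor, image, turns_deg=(-10, -5, 0, 5, 10), forward=0.0):
        """Decode one predicted future image per candidate turn angle (illustrative only;
        the real action encoding in this repository may differ)."""
        predictor.eval()
        preds = []
        with torch.no_grad():
            for turn in turns_deg:
                action = torch.tensor([[forward, float(turn)]])  # assumed [forward, turn] encoding
                preds.append(predictor(image, action))
        return torch.cat(preds, 0)  # (num_turns, 3, 64, 64)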

Sample predicted images from the real and simulation datasets. From left to right: current image; true next image; deterministic linear; deterministic convolutional; stochastic linear; stochastic convolutional.

High-level description of the action taken in each row, from top to bottom: turn right; move forward; move forward slightly; move forward and turn left; move forward and turn left.

Lab

High-level description of the action taken in each row, from top to bottom: move forward and turn right; turn right slightly; turn right; move forward slightly; turn left slightly.

Gaz

Using the stochastic future image predictor, we can generate realistic images to train a critic V_hat that helps select the optimal action:

Critic
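A hedged sketch of how such a critic could be used at test time: predict one future image per candidate action, score each prediction with V_hat, and execute the highest-scoring action. The Critic architecture and the select_action helper below are illustrative, not the repository's API.

    import torch
    import torch.nn as nn

    class Critic(nn.Module):
        """Illustrative critic V_hat: maps a 64x64 (predicted) future image to a scalar value."""

        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(),
                nn.Flatten(),
                nn.Linear(256 * 8 * 8, 1))

        def forward(self, image):
            return self.net(image).squeeze(1)

    def select_action(predictor, critic, image, candidate_actions):
        """Pick the candidate action whose predicted future image the critic values most."""
        with torch.no_grad():
            scores = []
            for action in candidate_actions:        # each action: (1, action_dim) tensor
                future = predictor(image, action)   # predicted next image for this action
                scores.append(critic(future).item())
        return candidate_actions[int(torch.tensor(scores).argmax())]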



We verified our future image prediction model and critic model in real-life and simulation environments. Here are some example trajectories from the real-life robot experiments, compared with the baselines (Clone, Handcrafted Critic, and Forward Consistency). Our method is labeled Critic-FutSim-Stoch. The red ‘X’ marks the location of the target object and the blue ‘∗’ marks the end of each robot trajectory.

Test trajectories

Requirements

Our code has been tested on Ubuntu 16.04 with Python 3.5 and PyTorch 0.3.0 on a Titan X GPU.
