stepjam / RLBench

A large-scale benchmark and learning environment.

Home Page: https://sites.google.com/corp/view/rlbench

Baseline Imitation Learning Policies & Results

siddk opened this issue

I really love this incredible effort! To get a sense of how to use this library for developing new imitation learning & reinforcement learning strategies from vision in diverse, multi-task environments, it would be nice to have some benchmarks (or at least brief experimental writeups) to build off of!

Specifically, it would be really nice to get a sense of a "base" policy architecture, amount of demonstration data, and expected success rate for a subset (if not all) of the tasks in the benchmark.

I'm happy to help run things for other environments if there's code available, but even if there's just a writeup, I'd really love to see it! Thanks!

Hi @siddk ,
Yes, I agree that it would be nice. The reason one doesn't exist yet is that simple observation -> joint velocity policies didn't work well: the tasks are challenging and sparsely rewarded, so more specialized approaches are needed, e.g. my own C2F-ARM.
However, I have recently added shaped rewards for some tasks, so maybe now would be a good time for me to put up some starter code for learning RL/BC policies in PyTorch. I'll try to find some time over the next month or two.
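As a rough illustration (not the promised starter code), here is a minimal behaviour-cloning sketch in PyTorch for an observation -> joint velocity policy trained on RLBench demos. It assumes `demos` is a list of RLBench `Demo` objects (lists of `Observation`), e.g. from `task.get_demos(...)`; the attribute names (`task_low_dim_state`, `joint_velocities`, `gripper_open`), network size, and hyperparameters are placeholders and may differ between RLBench versions.

```python
import numpy as np
import torch
import torch.nn as nn


def demos_to_dataset(demos):
    """Flatten RLBench demos into (observation, action) pairs for supervised BC.

    Attribute names follow the RLBench Observation class but are assumptions here.
    """
    obs, acts = [], []
    for demo in demos:
        for o in demo:
            obs.append(np.concatenate([o.joint_positions, o.task_low_dim_state]))
            acts.append(np.concatenate([o.joint_velocities, [o.gripper_open]]))
    return (torch.tensor(np.array(obs), dtype=torch.float32),
            torch.tensor(np.array(acts), dtype=torch.float32))


class BCPolicy(nn.Module):
    """Small MLP mapping low-dim observations to joint velocities + gripper action."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim))

    def forward(self, x):
        return self.net(x)


def train_bc(demos, epochs=100, lr=1e-3, batch_size=128):
    """Plain behaviour cloning: regress demonstrated actions with an MSE loss."""
    obs, acts = demos_to_dataset(demos)
    policy = BCPolicy(obs.shape[1], acts.shape[1])
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        perm = torch.randperm(len(obs))
        for i in range(0, len(obs), batch_size):
            idx = perm[i:i + batch_size]
            loss = nn.functional.mse_loss(policy(obs[idx]), acts[idx])
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy
```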

@stepjam - I can totally see how that's the case for RL. When it comes to imitation learning/BC, did you find similar results (even as you scaled up the number of demonstrations)?

Hey @stepjam - just wanted to follow up on this (no big rush); any chance of getting a simple BC agent that works for some number of pre-generated demonstrations?
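In the meantime, a hedged sketch of how pre-generated demonstrations might be loaded and fed into a BC pipeline like the one above, assuming RLBench's `Environment` / `get_demos` dataset API; the action-mode class names and import paths have changed across RLBench versions, and the dataset path is a placeholder (e.g. a directory produced by `tools/dataset_generator.py`).

```python
# Sketch only: class names / import paths are assumptions and vary by RLBench version.
from rlbench.environment import Environment
from rlbench.observation_config import ObservationConfig
from rlbench.tasks import ReachTarget
from rlbench.action_modes.action_mode import MoveArmThenGripper
from rlbench.action_modes.arm_action_modes import JointVelocity
from rlbench.action_modes.gripper_action_modes import Discrete

# Low-dimensional state only for this simple baseline (no camera observations).
obs_config = ObservationConfig()
obs_config.set_all_high_dim(False)
obs_config.set_all_low_dim(True)

# Placeholder action mode: joint velocities for the arm, discrete open/close gripper.
action_mode = MoveArmThenGripper(JointVelocity(), Discrete())

env = Environment(action_mode,
                  dataset_root='/path/to/saved_demos',  # placeholder dataset directory
                  obs_config=obs_config,
                  headless=True)
env.launch()
task = env.get_task(ReachTarget)

# live_demos=False loads stored demos from dataset_root instead of re-planning them.
demos = task.get_demos(amount=50, live_demos=False)

policy = train_bc(demos)  # train_bc from the BC sketch earlier in this thread
env.shutdown()
```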