stepjam / RLBench

A large-scale benchmark and learning environment.

Home Page: https://sites.google.com/corp/view/rlbench

Baseline Imitation Learning Policies & Results

siddk opened this issue

I really love this incredible effort! To get a sense of how to use this library for developing new imitation learning & reinforcement learning strategies from vision in diverse, multi-task environments, it would be nice to have some benchmarks (or at least brief experimental writeups) to build off of!

Specifically, it would be really nice to get a sense of a "base" policy architecture, amount of demonstration data, and expected success rate for a subset (if not all) of the tasks in the benchmark.

I'm happy to help run things for other environments if there's code available, but even if there's just a writeup, I'd really love to see it! Thanks!

Hi @siddk ,
Yes, I agree that it would be nice. The reason one doesn't exist yet is that simple observation -> joint velocity policies didn't work well: the tasks are challenging and sparsely rewarded, so more specialized approaches are needed, e.g. my own C2F-ARM.
However, I have recently added shaped rewards for some tasks, so maybe now would be a good time for me to put up some starter code for learning RL/BC policies in PyTorch. I'll try to find some time over the next month or two.
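As a rough illustration (not the promised starter code), here is a minimal behaviour-cloning sketch in PyTorch for an observation -> joint velocity policy trained on RLBench demos. It assumes `demos` is a list of RLBench `Demo` objects (lists of `Observation`), e.g. from `task.get_demos(...)`; the attribute names (`task_low_dim_state`, `joint_velocities`, `gripper_open`), network size, and hyperparameters are placeholders and may differ between RLBench versions.

```python
import numpy as np
import torch
import torch.nn as nn


def demos_to_dataset(demos):
    """Flatten RLBench demos into (observation, action) pairs for supervised BC.

    Attribute names follow the RLBench Observation class but are assumptions here.
    """
    obs, acts = [], []
    for demo in demos:
        for o in demo:
            obs.append(np.concatenate([o.joint_positions, o.task_low_dim_state]))
            acts.append(np.concatenate([o.joint_velocities, [o.gripper_open]]))
    return (torch.tensor(np.array(obs), dtype=torch.float32),
            torch.tensor(np.array(acts), dtype=torch.float32))


class BCPolicy(nn.Module):
    """Small MLP mapping low-dim observations to joint velocities + gripper action."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim))

    def forward(self, x):
        return self.net(x)


def train_bc(demos, epochs=100, lr=1e-3, batch_size=128):
    """Plain behaviour cloning: regress demonstrated actions with an MSE loss."""
    obs, acts = demos_to_dataset(demos)
    policy = BCPolicy(obs.shape[1], acts.shape[1])
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        perm = torch.randperm(len(obs))
        for i in range(0, len(obs), batch_size):
            idx = perm[i:i + batch_size]
            loss = nn.functional.mse_loss(policy(obs[idx]), acts[idx])
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy
```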

@stepjam - I can totally see how that's the case for RL. When it comes to imitation learning/BC, did you find similar results (even as you scaled up the number of demonstrations)?

Hey @stepjam - just wanted to follow up on this (no big rush); any chance of getting a simple BC agent that works for some number of pre-generated demonstrations?
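In the meantime, a hedged sketch of how pre-generated demonstrations might be loaded and fed into a BC pipeline like the one above, assuming RLBench's `Environment` / `get_demos` dataset API; the action-mode class names and import paths have changed across RLBench versions, and the dataset path is a placeholder (e.g. a directory produced by `tools/dataset_generator.py`).

```python
# Sketch only: class names / import paths are assumptions and vary by RLBench version.
from rlbench.environment import Environment
from rlbench.observation_config import ObservationConfig
from rlbench.tasks import ReachTarget
from rlbench.action_modes.action_mode import MoveArmThenGripper
from rlbench.action_modes.arm_action_modes import JointVelocity
from rlbench.action_modes.gripper_action_modes import Discrete

# Low-dimensional state only for this simple baseline (no camera observations).
obs_config = ObservationConfig()
obs_config.set_all_high_dim(False)
obs_config.set_all_low_dim(True)

# Placeholder action mode: joint velocities for the arm, discrete open/close gripper.
action_mode = MoveArmThenGripper(JointVelocity(), Discrete())

env = Environment(action_mode,
                  dataset_root='/path/to/saved_demos',  # placeholder dataset directory
                  obs_config=obs_config,
                  headless=True)
env.launch()
task = env.get_task(ReachTarget)

# live_demos=False loads stored demos from dataset_root instead of re-planning them.
demos = task.get_demos(amount=50, live_demos=False)

policy = train_bc(demos)  # train_bc from the BC sketch earlier in this thread
env.shutdown()
```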