The results about td3_bc on Antmaze
lucasliunju opened this issue · comments
Hi
May I ask about the settings for td3_bc on AntMaze? I find that the current hyperparameters do not work well and do not give results similar to those in the paper.
Best
Hi,
Can you share the details of the experiment you're trying to reproduce, in particular the dataset and the exact configuration file you are working with?
Thanks for your reply.
I am trying to run algorithms/td3_bc.py with the config file configs/td3bc/antmaze/medium_play_v0.yaml
I find that the final reward is 0, which is lower than the results reported in the papers: IQL (10.6) and TD3BC (3.0).
Best
I found that, if
I wonder why. I look forward to any discussion of this.
This is actually not clear to us either. For example, one of the recent papers, Behavior Proximal Policy Optimization, reports quite high scores for behavioral cloning (see this discussion on OpenReview). Most likely, it boils down to implementation details (similar to the recent CQL discussion), but the authors have not yet released the source code. We will keep an eye on it and hopefully bring those details into our codebase as well.
What's really puzzling is that AntMaze is usually considered to be a testbed for "stitching". But with all of these recent results, it looks like it might not be that representative (and one should rather favor random datasets).
@lucasliunju, as for the original issue, we did not tune the hyperparameters for TD3+BC and used the ones provided in the original paper. As noted by @Jinyi6, more tuning may help.
If anyone already has these hyperparameters tuned, please share them with us; we will run them and add the new results and wandb runs to the repo.
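For anyone tuning, the main knob in TD3+BC is alpha, which trades off Q-value maximization against behavioral cloning. Below is a minimal sketch of the actor loss as defined in the TD3+BC paper, written in plain NumPy for illustration. The function name and argument names are hypothetical, and this is not the code from algorithms/td3_bc.py; the default alpha=2.5 is the value from the original paper.

```python
import numpy as np


def td3_bc_actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """Illustrative TD3+BC actor loss: -lambda * mean(Q) + MSE(pi(s), a).

    q_values:        Q(s, pi(s)) for each state in the batch, shape (B,)
    policy_actions:  actions proposed by the policy, shape (B, A)
    dataset_actions: actions stored in the offline dataset, shape (B, A)
    """
    q_values = np.asarray(q_values, dtype=np.float64)
    policy_actions = np.asarray(policy_actions, dtype=np.float64)
    dataset_actions = np.asarray(dataset_actions, dtype=np.float64)

    # lambda rescales the Q term by the average |Q| in the batch, so alpha
    # keeps roughly the same meaning across tasks with different reward scales.
    # In a training loop this factor is treated as a constant (no gradient).
    lmbda = alpha / np.abs(q_values).mean()

    # Behavioral cloning term: mean squared error to the dataset actions.
    bc_term = np.mean((policy_actions - dataset_actions) ** 2)

    return -lmbda * q_values.mean() + bc_term
```

A larger alpha puts more weight on the Q term relative to the cloning term, and alpha=0 reduces the loss to pure behavioral cloning, so sweeping alpha is a natural first step when the default does not work on a dataset.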
@Jinyi6 @vkurenkov thanks so much for your help! I will try to tune the hyperparameters and report my result.
Best,
Lucas
@lucasliunju any updates?
Closing for now. Feel free to re-open if there are any further questions.