The results about td3_bc on Antmaze

Question

lucasliunju opened this issue 2 years ago

Hi

May I ask about the settings for td3_bc on AntMaze? With the current hyperparameters, training does not work well, and I cannot obtain results similar to those in the paper.

Best

Vladislav Kurenkov · Answer 1 · Thu Nov 24 2022 08:17:20 GMT+0800 (China Standard Time)

Hi,

Can you share the details of the experiment you're trying to reproduce? Specifically, the dataset and the exact configuration file you are working with.

lucasliunju · Answer 2 · Thu Nov 24 2022 10:27:49 GMT+0800 (China Standard Time)

Thanks for your reply.

I am running algorithms/td3_bc.py with the config file configs/td3bc/antmaze/medium_play_v0.yaml.

The final reward I get is 0, which is lower than the results reported in the paper: IQL (10.6) and TD3BC (3.0).

Best

Jinyi6 · Answer 3 · Tue Nov 29 2022 16:41:02 GMT+0800 (China Standard Time)

I found that if $\alpha$ is set larger (e.g., $\alpha=36$), TD3+BC works well on AntMaze: it gets a return of 38 on the antmaze_medium_play dataset.

I wonder why. I would welcome any discussion of this.
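
For context, here is a minimal sketch of where $\alpha$ enters the TD3+BC actor objective. It assumes the formulation from the original TD3+BC paper (a Q-maximization term scaled by $\lambda = \alpha / \mathrm{mean}|Q|$ plus a mean-squared behavior-cloning term) and a default of $\alpha = 2.5$. The function name and the plain-list inputs are illustrative only and are not taken from algorithms/td3_bc.py.

```python
def td3_bc_actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """Illustrative TD3+BC actor loss for one batch (hypothetical helper).

    q_values:        Q(s, pi(s)) for each sample in the batch
    policy_actions:  actions proposed by the policy, one vector per sample
    dataset_actions: actions stored in the offline dataset, same shape
    """
    n = len(q_values)
    # Normalizer lambda = alpha / mean(|Q|); treated as a constant in training.
    mean_abs_q = sum(abs(q) for q in q_values) / n
    lam = alpha / mean_abs_q
    # RL term: push the policy toward high Q-values, scaled by lambda.
    rl_term = -lam * sum(q_values) / n
    # BC term: mean squared error between policy and dataset actions.
    sq_errors = [
        (p - a) ** 2
        for pa, da in zip(policy_actions, dataset_actions)
        for p, a in zip(pa, da)
    ]
    bc_term = sum(sq_errors) / len(sq_errors)
    return rl_term + bc_term, rl_term, bc_term


# Toy batch: the BC term is identical in both calls, only alpha changes.
q = [2.0, 4.0]
pi_a = [[0.5, 0.0], [1.0, 1.0]]
data_a = [[0.0, 0.0], [1.0, 0.0]]
loss_default, rl_default, bc_default = td3_bc_actor_loss(q, pi_a, data_a, alpha=2.5)
loss_large, rl_large, bc_large = td3_bc_actor_loss(q, pi_a, data_a, alpha=36.0)
```

Because the Q term is normalized by its own mean magnitude, its scale is set almost entirely by $\alpha$, so a larger $\alpha$ makes the behavior-cloning penalty relatively weaker. This only shows what the knob does; it does not explain why AntMaze favors the larger value.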

Vladislav Kurenkov · Answer 4 · Wed Nov 30 2022 03:56:01 GMT+0800 (China Standard Time)

This is actually not clear to us either. For example, one of the recent papers, Behavior Proximal Policy Optimization, reports quite high scores for behavioral cloning (see this discussion on OpenReview). Most likely, it boils down to implementation details (similar to the recent CQL discussion), but the authors have not yet released the source code. We will keep an eye on it and hopefully bring those into our codebase as well.

What's really puzzling is that AntMaze is usually considered to be a testbed for "stitching". But with all of these recent results, it looks like it might not be that representative (and one should rather favor random datasets).

Vladislav Kurenkov · Answer 5 · Wed Nov 30 2022 04:00:38 GMT+0800 (China Standard Time)

@lucasliunju, as for the original issue, we did not tune the hyperparameters for TD3+BC and used the ones provided in the original paper. As noted by @Jinyi6, more tuning may help.

If anyone already has these hyperparameters tuned, please share them with us; we will run them and add the new results and wandb runs to the repo.

lucasliunju · Answer 6 · Sun Dec 04 2022 15:36:34 GMT+0800 (China Standard Time)

@Jinyi6 @vkurenkov thanks so much for your help! I will try to tune the hyperparameters and report my results.

Best,
Lucas

Vladislav Kurenkov · Answer 7 · Thu Feb 23 2023 04:57:37 GMT+0800 (China Standard Time)

@lucasliunju any updates?

Vladislav Kurenkov · Answer 8 · Wed May 10 2023 08:41:11 GMT+0800 (China Standard Time)

Closing for now. Feel free to re-open if there are any further questions.