tinkoff-ai / CORL

High-quality single-file implementations of SOTA Offline and Offline-to-Online RL algorithms: AWAC, BC, CQL, DT, EDAC, IQL, SAC-N, TD3+BC, LB-SAC, SPOT, Cal-QL, ReBRAC

Home Page: https://arxiv.org/abs/2210.07105

The results about td3_bc on Antmaze

lucasliunju opened this issue · comments

Hi,

May I ask which settings you used for td3_bc on AntMaze? I find that the current hyperparameters do not work well, and I cannot obtain results similar to those in the paper.

Best

Hi,

Can you share the details of the experiment you're trying to reproduce, specifically the dataset and the exact configuration file you are working with?

Thanks for your reply.

I am trying to run algorithms/td3_bc.py with the config file configs/td3bc/antmaze/medium_play_v0.yaml

I find that the final reward is 0, which is lower than the results reported in the IQL paper (10.6) and the TD3BC paper (3.0).

Best

I found that if $\alpha$ is set larger (e.g., $\alpha=36$), TD3+BC works well on AntMaze: it gets a return of 38 on the antmaze_medium_play dataset.

I wonder why. I would welcome any discussion of this.
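For context on what $\alpha$ controls: in the original TD3+BC formulation, the actor minimizes $-\lambda \, \mathbb{E}[Q(s, \pi(s))] + \mathbb{E}[(\pi(s) - a)^2]$ with $\lambda = \alpha / \mathrm{mean}|Q|$, and the paper's default is $\alpha = 2.5$. The snippet below is a minimal pure-Python sketch of that loss on a toy batch, not the code from algorithms/td3_bc.py; the function name `td3_bc_actor_loss` and the toy numbers are made up for illustration.

```python
from statistics import fmean


def td3_bc_actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """Sketch of the TD3+BC actor loss for one batch (hypothetical helper).

    q_values:        Q(s, pi(s)) for each sample in the batch
    policy_actions:  actions pi(s) proposed by the actor, one vector per sample
    dataset_actions: actions a stored in the offline dataset, same shape
    """
    # Normalizer lambda = alpha / mean|Q|; in a real implementation this
    # is treated as a constant (no gradient flows through it).
    lam = alpha / fmean([abs(q) for q in q_values])
    # Behavior-cloning term: mean squared error over all action components.
    bc_term = fmean([
        (p - a) ** 2
        for pi_s, a_s in zip(policy_actions, dataset_actions)
        for p, a in zip(pi_s, a_s)
    ])
    # RL term: maximize Q, so it enters the loss with a negative sign.
    rl_term = -lam * fmean(q_values)
    return rl_term + bc_term, lam


# Toy batch (made-up numbers): two samples, two action dimensions.
q = [10.0, 20.0]
pi = [[0.5, 0.0], [0.0, 0.0]]
a = [[0.0, 0.0], [0.0, 0.0]]

for alpha in (2.5, 36.0):
    loss, lam = td3_bc_actor_loss(q, pi, a, alpha=alpha)
    print(f"alpha={alpha}: lambda={lam:.4f}, loss={loss:.4f}")
```

Since the behavior-cloning term has a fixed weight of 1, raising $\alpha$ increases the relative weight of the Q-maximization term. This only shows how $\alpha$ enters the objective; it does not by itself explain the AntMaze result.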

This is actually not clear to us either. For example, one of the recent papers, Behavior Proximal Policy Optimization, reports quite high scores for behavioral cloning (see this discussion on OpenReview). Most likely, it boils down to implementation details (similar to the recent CQL discussion), but the authors have not yet released the source code. We will keep an eye on it and hopefully bring those into our codebase as well.

What's really puzzling is that AntMaze is usually considered a testbed for "stitching". But with all of these recent results, it looks like it might not be that representative (and one should rather favor random datasets).

@lucasliunju, as for the original issue, we did not tune the hyperparameters for TD3+BC and used the ones provided in the original paper. As noted by @Jinyi6, more tuning may help.

If anyone already has these hyperparameters tuned, please share them with us; we will run them and add the new results and wandb runs to the repo.

@Jinyi6 @vkurenkov thanks so much for your help! I will try to tune the hyperparameters and report my results.

Best,
Lucas

Closing for now. Feel free to re-open if there are any further questions.