```bash
git clone https://github.com/IrisLi17/value-difference-model
cd value-difference-model
conda create -n <your_name> python=3.5
conda activate <your_name>
pip install -r requirements.txt
```
Then start training with:

```bash
python run_model_based_rl.py trpo -env <env_name>
```
`<env_name>` must be one of `half-cheetah`, `swimmer`, `snake`, `ant`, `humanoid`.
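For example, to train on the cheetah task:

```bash
python run_model_based_rl.py trpo -env half-cheetah
```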
`half-cheetah`, `swimmer`, and `snake` take hours to converge. `ant` takes ~3 days to converge and suffers from segmentation faults from time to time on my machine.
The logging folder is saved to `data/local/<env_name>/<env_name>_DATETIME_0001` by default. `progress.csv` contains `real_current_validation_cost`, which is the negative of the reward so far.
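For a quick look at training progress, you can read `progress.csv` directly. A minimal sketch, assuming `pandas` and `matplotlib` are installed; the run directory below is a placeholder for your actual logging folder:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder run directory; substitute your actual logging folder.
log_dir = "data/local/half-cheetah/half-cheetah_DATETIME_0001"
df = pd.read_csv(log_dir + "/progress.csv")

# real_current_validation_cost is the negative of the reward so far,
# so negate it to plot reward over iterations.
plt.plot(-df["real_current_validation_cost"])
plt.xlabel("iteration")
plt.ylabel("reward (negated validation cost)")
plt.show()
```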
You can use `tensorboard` to monitor more intermediate results:

```bash
tensorboard --logdir <tf_logging_dir> --port <port_number>
```

Also, you will need to set up SSH port forwarding to view TensorBoard on your local machine; a sketch follows.
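A standard way to forward the port (a sketch; substitute your own user, host, and port):

```bash
# Run on your local machine: forwards local <port_number>
# to the same port on the server where tensorboard is running.
ssh -N -L <port_number>:localhost:<port_number> <user>@<server>
```

Then open `http://localhost:<port_number>` in your local browser.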
To switch between the original dynamics loss definition and the two proposed losses, modify `sandbox/thanard/me-trpo/params/params-<env>.json`. `dynamics_opt_params/use_value` and `dynamics_opt_params/dvds_weighting` are the most relevant keys (a config sketch follows this list):

- Original loss: `use_value=False`, `dvds_weighting=False`.
- First proposed loss: `use_value=False`, `dvds_weighting=True`.
- Second proposed loss: `use_value=True`, `dvds_weighting=False`.
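The relevant fragment of `params-<env>.json` would look roughly like this (a sketch: only the two keys named above are from this README, and the nesting is inferred from the `dynamics_opt_params/use_value` path; leave any other keys in your file untouched). This example selects the first proposed loss:

```json
{
  "dynamics_opt_params": {
    "use_value": false,
    "dvds_weighting": true
  }
}
```

Set both flags to `false` to recover the original loss.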
Currently this cannot run on our server. TODO: you will need to manually modify line 612 in `model_based_rl.py` to specify the path of the saved model; see my comment there. Afterwards, run:

```bash
python run_model_based_rl.py trpo -env <env_name> -perform
```