xuanlinli17 / CS285_Fa19_Deep_Reinforcement_Learning

My solutions to UC Berkeley CS285 (originally CS294-112, deeprlcourse) Fall 2019 assignments

Home Page: https://github.com/xuanlinli17/CS285_Fa19_Deep_Reinforcement_Learning


Reproducing the result of hw1 problem 1(b)

Duconnor opened this issue

Hi there! I am trying to reproduce the result of homework 1, problem 1(b). I used requirements.txt to install all my dependencies, and then I ran the command:

python cs285/scripts/run_hw1_behavior_cloning.py --expert_policy_file cs285/policies/experts/HalfCheetah.pkl --env_name HalfCheetah-v2 --exp_name test_bc_hcheetah --n_iter 1 --expert_data cs285/expert_data/expert_data_HalfCheetah-v2.pkl --batch_size=1000 --eval_batch_size=5000

This is what I got:

Loading expert policy from... cs285/policies/experts/HalfCheetah.pkl
obs (1, 17) (1, 17)
Done restoring expert policy...


********** Iteration 0 ************

Training agent using sampled data from replay buffer...

Beginning logging procedure...

Collecting data for eval...
Eval_AverageReturn : 4.991946220397949
Eval_StdReturn : 17.147544860839844
Eval_MaxReturn : 32.29301452636719
Eval_MinReturn : -9.376068115234375
Eval_AverageEpLen : 1000.0
Train_AverageReturn : 4205.7783203125
Train_StdReturn : 83.038818359375
Train_MaxReturn : 4288.81689453125
Train_MinReturn : 4122.7392578125
Train_AverageEpLen : 1000.0
Train_EnvstepsSoFar : 0
TimeSinceStart : 4.198240041732788
Initial_DataCollection_AverageReturn : 4205.7783203125
Done logging...



Saving agent's actor...

So the average return of evaluation is about 4.99, which does not match the result provided in the folder ./hw1/run_logs/bc_test_bc_hcheetah_HalfCheetah-v2_16-09-2019_00-58-58/. I was wondering which part I got wrong; it would be great if you could help me figure it out. Many thanks!

I think I forgot to push the expert_data folder. Sorry about that!
The result should be:

********** Iteration 0 ************
cs285/expert_data/expert_data_HalfCheetah-v2.pkl
envsteps this batch 0

Training agent using sampled data from replay buffer...

Beginning logging procedure...

Collecting data for eval...
Eval_AverageReturn : 2369.33544921875
Eval_StdReturn : 113.10492706298828
Eval_MaxReturn : 2578.70556640625
Eval_MinReturn : 2282.853759765625
Eval_AverageEpLen : 1000.0
Train_AverageReturn : 4205.7783203125
Train_StdReturn : 83.038818359375
Train_MaxReturn : 4288.81689453125
Train_MinReturn : 4122.7392578125
Train_AverageEpLen : 1000.0
Train_EnvstepsSoFar : 0
TimeSinceStart : 3.9584085941314697
Initial_DataCollection_AverageReturn : 4205.7783203125
Done logging...
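For anyone debugging a similar mismatch, a minimal sanity check is to confirm that the expert-data pickle actually exists and is loadable before training. This helper is purely illustrative: the exact structure of the `expert_data_*.pkl` files is assumed here, not taken from the repo.

```python
import os
import pickle


def check_expert_data(path):
    """Report whether an expert-data pickle exists and what it contains.

    Illustrative only: the precise structure CS285 stores in
    expert_data_*.pkl (assumed here to be a sized container of
    rollouts) is not verified against the repo.
    """
    if not os.path.exists(path):
        raise FileNotFoundError(f"expert data not found: {path}")
    with open(path, "rb") as f:
        data = pickle.load(f)
    size = len(data) if hasattr(data, "__len__") else "unknown"
    print(f"loaded {type(data).__name__} with {size} entries from {path}")
    return data
```

If the file is missing (as it was before the folder was pushed), this fails loudly instead of letting behavior cloning silently train on the wrong data.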

@xuanlinli17 Thank you for your reply! I finally figured out why my previous result was wrong. It seems that after I installed the cs285 package via setup.py, even when I changed to another directory and tried to run your code, Python still executed my own copy of the package. I don't know why it behaves this way, but I got everything working correctly by creating a new Conda environment, and now I can get the correct result! Thanks again!
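The fresh Conda environment likely works because the copy of `cs285` previously installed into site-packages was shadowing the local source tree. A quick way to see which copy Python would actually import (a generic sketch using the standard-library `importlib`; `cs285` is just this repo's package name):

```python
import importlib.util


def locate_package(name):
    """Return the file path Python would import `name` from, or None.

    If the path points into site-packages instead of your working
    checkout, a previously pip-installed copy is shadowing the local
    code -- which can make edits to the source appear to have no effect.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print(locate_package("cs285"))  # path of whichever copy would be imported
```

Installing with `pip install -e .` (editable mode) is another way to avoid this, since it makes site-packages point back at the checkout itself rather than at a snapshot.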