SforAiDl / genrl

A PyTorch reinforcement learning library for generalizable and reproducible algorithm implementations with an aim to improve accessibility in RL

Home Page: https://genrl.readthedocs.io


VPG not working after #294

Devanshu24 opened this issue

After the refactoring that was done, I thought of changing the tutorial accordingly to this:

import gym  # OpenAI Gym

from genrl.agents import VPG
from genrl.trainers import OnPolicyTrainer
from genrl.environments import VectorEnv

env = VectorEnv("CartPole-v1")
agent = VPG('mlp', env)
trainer = OnPolicyTrainer(agent, env, epochs=200)
trainer.train()

Upon running this code I get the following error:

Traceback (most recent call last):
  File "/home/think__tech/Desktop/genrl/temp.py", line 10, in <module>
    trainer.train()
  File "/home/think__tech/Desktop/genrl/genrl/trainers/onpolicy.py", line 46, in train
    self.agent.get_traj_loss(values, done)
  File "/home/think__tech/Desktop/genrl/genrl/agents/deep/vpg/vpg.py", line 114, in get_traj_loss
    self.rollout.compute_returns_and_advantage(values.detach().cpu().numpy(), dones)
  File "/home/think__tech/Desktop/genrl/genrl/core/rollout_storage.py", line 233, in compute_returns_and_advantage
    last_return = self.rewards[step] + next_non_terminal * next_value
TypeError: mul(): argument 'other' (position 1) must be Tensor, not numpy.ndarray

The reason I found was this:

self.rollout.compute_returns_and_advantage(values.detach().cpu().numpy(), dones)

Here we are passing values as a NumPy array
But here

def compute_returns_and_advantage(

it expects a Torch tensor

Hence changing values.detach().cpu().numpy() to values does solve the problem

I don't, however, know where torch is being explicitly used in this (compute_returns_and_advantage) function
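
For what it's worth, here is a minimal sketch of what I believe is going on (the variable names are mine, mirroring rollout_storage.py; this is illustrative, not the actual library code):

import numpy as np
import torch

# next_non_terminal is a torch tensor derived from dones inside RolloutStorage
next_non_terminal = torch.ones(4)
# after #294, the values passed in arrive as a NumPy array because of .cpu().numpy()
next_value = np.zeros(4, dtype=np.float32)

# On my setup this mixed tensor/ndarray product raises the TypeError from the
# traceback above: mul(): argument 'other' must be Tensor, not numpy.ndarray
last_return = next_non_terminal * next_value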

Also, on a side note: I was just checking the CI test for VPG, and it exercises this very function, yet the CI did pass on that merge.
Why is that the case?

UPD: Come to think of it, could something be wrong with my local machine? I did reinstall the library though.

After the refactoring that was done, I thought of changing the tutorial accordingly to this:

import gym  # OpenAI Gym

from genrl.agents import VPG
from genrl.trainers import OnPolicyTrainer
from genrl.environments import VectorEnv

env = VectorEnv("CartPole-v1")
agent = VPG('mlp', env)
trainer = OnPolicyTrainer(agent, env, epochs=200)
trainer.train()

This runs for me and gives the following expected output:

timestep         Epoch            loss             mean_reward      
0                0                7.5798           19.50480079650879          
10240            10               10.2152          25.315200805664062          
20480            20               12.8218          34.076499938964844          
30720            30               15.2794          44.61869812011719          
40960            40               18.8412          59.5349006652832          
51200            50               21.3708          74.2029037475586          
61440            60               24.8012          97.06159973144531          
71680            70               26.8724          114.41339874267578         
81920            80               30.5441          144.22540283203125         
92160            90               32.3745          166.5041046142578         
102400           100              34.4109          186.18179321289062 
112640           110              35.6278          206.8686981201172         
122880           120              36.009           222.60870361328125         
133120           130              35.3222          220.21499633789062          
143360           140              36.9469          249.756103515625         
153600           150              36.5153          240.94119262695312         
163840           160              38.5735          292.5714111328125         
174080           170              36.6223          265.9739990234375          
184320           180              39.6573          330.3226013183594         
194560           190              39.0522          325.07940673828125  

UPD: Come to think of it, could something be wrong with my local machine? I did reinstall the library though.

Are you installing from source?

UPD: Come to think of it, could something be wrong with my local machine? I did reinstall the library though.

Are you installing from source?

Yes, to be precise, this:

sudo python setup.py install

After this, do you run python or sudo python?

Also, just to be careful, can you try python3/sudo python3?

Steps:

  1. python setup.py install does not work, apparently due to insufficient permissions
  2. sudo python setup.py install installs the dependencies and things
  3. python temp.py or sudo python temp.py or python3 temp.py or sudo python3 temp.py
  4. I get the error

Just python temp.py used to work before
OS: Ubuntu 18.04

UPD: I don't think it would help, but is a reboot necessary?

Can you try running this and check if the path of the library is where your source installation is?

import genrl
genrl.__file__

Can you try running this and check if the path of the library is where your source installation is?

import genrl
genrl.__file__

Yes, it is where the git repo is (if that's what you mean by source installation; I couldn't understand that clearly)

def compute_returns_and_advantage(

It expects a Torch tensor

Hence changing values.detach().cpu().numpy() to values does solve the problem

This is still a bug though, feel free to raise a PR for this.

I might be making some very silly mistake too, so if you're certain that the code is fine we can close this issue :)

It should be values.detach().clone(), just to be careful though.

Once you make the change can you try training it?
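
For reference, the suggested change would look roughly like this (a sketch of the relevant call in vpg.py; surrounding code omitted):

# genrl/agents/deep/vpg/vpg.py, inside get_traj_loss (sketch)
# Before: passed a NumPy array into compute_returns_and_advantage
#   self.rollout.compute_returns_and_advantage(values.detach().cpu().numpy(), dones)
# After: keep it a torch tensor; .clone() avoids sharing storage with values
self.rollout.compute_returns_and_advantage(values.detach().clone(), dones)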


This is still a bug though, feel free to raise a PR for this.

Also, I actually tested after making this change and it does work.

I might be making some very silly mistake too, so if you're certain that the code is fine we can close this issue :)

Regardless of whether I'm able to reproduce it or not, the compute_returns function should not receive a NumPy array after #294.

So feel free to raise a PR.
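
Such a PR could also add a cheap guard so the mismatch fails loudly on any torch version; a hypothetical sketch (parameter names are illustrative, not necessarily the actual signature):

import torch

def compute_returns_and_advantage(self, last_value, dones):
    # hypothetical guard: fail fast if a NumPy array sneaks in again
    assert isinstance(last_value, torch.Tensor), (
        "last_value must be a torch.Tensor, got %s" % type(last_value)
    )
    ...  # existing return/advantage computation unchanged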

It should be values.detach().clone(), just to be careful though.

Once you make the change can you try training it?

Yup, it works after this too.

Just out of curiosity, any reason it passed the CI and worked for you but not for me?

Not sure. You could try and debug that though :)

Wanted to give an update.
Earlier I was using torch 1.3.0.
So I manually updated to the latest 1.6.0 and now it works (even the old code, i.e. with values.detach().cpu().numpy()).
I was operating under the assumption that doing python setup.py install makes sure the correct versions are present; is that not true, or was it a glitch?
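
For anyone hitting the same thing, this is the quick sanity check that would have caught it (a sketch; it just prints which installation and which torch version are actually being picked up):

import genrl
import torch

print(genrl.__file__)     # should point at the source installation you expect
print(torch.__version__)  # 1.3.0 reproduced the error for me; 1.6.0 did not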

requirements.txt does specify 1.4.0

requirements.txt does specify 1.4.0

Exactly, so when I did python setup.py install, it should have checked/updated that, right? Or does that not happen?