facebookresearch / Pearl

A Production-ready Reinforcement Learning AI Agent Library brought by the Applied Reinforcement Learning team at Meta.

Use Pearl with custom PyTorch model

BoccheseGiacomo opened this issue:

I used to use PyTorch + Gym for RL.
Is there a way to make a custom neural network in PyTorch and insert it into the Pearl agent without using the pre-made settings for neural networks?

It is possible to directly provide a network instance to be used by policy learners implementing Temporal Difference (TD) learning methods.
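For example, here is a minimal sketch assuming a recent version of Pearl, in which DeepQLearning accepts a network_instance argument (import paths and parameter names may differ slightly across versions):

```python
from pearl.pearl_agent import PearlAgent
from pearl.policy_learners.sequential_decision_making.deep_q_learning import DeepQLearning
from pearl.neural_networks.sequential_decision_making.q_value_networks import VanillaQValueNetwork
from pearl.replay_buffers.sequential_decision_making.fifo_off_policy_replay_buffer import (
    FIFOOffPolicyReplayBuffer,
)
from pearl.utils.instantiations.environments.gym_environment import GymEnvironment

env = GymEnvironment("CartPole-v1")

# Build the Q-value network yourself instead of letting DeepQLearning create one.
# Any QValueNetwork subclass, including your own custom PyTorch model, can go here.
q_value_network = VanillaQValueNetwork(
    state_dim=env.observation_space.shape[0],
    action_dim=env.action_space.n,
    hidden_dims=[64, 64],
    output_dim=1,
)

agent = PearlAgent(
    policy_learner=DeepQLearning(
        state_dim=env.observation_space.shape[0],
        action_space=env.action_space,
        training_rounds=20,
        network_instance=q_value_network,  # the custom network instance is passed here
    ),
    replay_buffer=FIFOOffPolicyReplayBuffer(10_000),
)
```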

Moreover, while it is not possible to directly provide network instances to subclasses of ActorCriticBase (such as DeepDeterministicPolicyGradient, ImplicitQLearning, ProximalPolicyOptimization, REINFORCE, SoftActorCritic, and ContinuousSoftActorCritic), it is possible to use custom user-defined network classes. These policy learners accept actor_network_type and critic_network_type parameters, which let the user specify the classes to be used for those networks. So the user can define their own custom network classes and have the policy learner use them; they must, however, be subclasses of ActorNetwork and QValueNetwork, respectively.
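Here is a rough sketch of that second route; MyActorNetwork and MyCriticNetwork are placeholder names, and the exact constructor arguments and import paths may vary across Pearl versions:

```python
from pearl.neural_networks.sequential_decision_making.actor_networks import VanillaActorNetwork
from pearl.neural_networks.sequential_decision_making.q_value_networks import VanillaQValueNetwork
from pearl.policy_learners.sequential_decision_making.soft_actor_critic import SoftActorCritic


class MyActorNetwork(VanillaActorNetwork):
    """Custom actor network; subclassing a built-in ActorNetwork keeps the required interface."""
    # Override or extend the forward computation here as needed.


class MyCriticNetwork(VanillaQValueNetwork):
    """Custom critic network; it must still behave like a QValueNetwork."""
    # Override or extend the Q-value computation here as needed.


policy_learner = SoftActorCritic(
    state_dim=env.observation_space.shape[0],  # reusing env from the sketch above
    action_space=env.action_space,
    actor_hidden_dims=[64, 64],
    critic_hidden_dims=[64, 64],
    actor_network_type=MyActorNetwork,    # pass the class itself, not an instance
    critic_network_type=MyCriticNetwork,
)
```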

I hope this helps, please let us know if something is unclear.

@rodrigodesalvobraz

Thank you, it's clear.

Another small question: can I add custom properties to my agent? For example, I want my agent to have some properties that belong to the agent, not to the environment, such as "money". Imagine I'm running a simulation of a market economy environment, and I want each agent to possess some float attributes like money, health, hunger, etc.

Can I do this with Pearl?
In Gym I would simply add these properties to my "agent" class.

Adding a property to an object is always possible in Python, regardless of the libraries one might be using. If you have an object instance agent, writing agent.money = 10 will work, and the attribute will be accessible from then on.

However, I suspect this might not be what you would really want. If you want the RL algorithm to take these properties into account when making decisions for the agent (for example, the agent will be more likely to spend money if it has a lot of it, but will try to save when it does not), then what you really want is to include that information in the observations being received by the agent (if you happen to be using an environment, then the environment is a good candidate to keep this information and include it in the observations it provides). This is because RL algorithms will make decisions based on the information contained in observations (states).

It might be a bit odd to think that an agent's property is being included in the observations coming from the outside world since we tend to think of it as an internal property, but one way to think about it is that the agent is observing things, including how much money is in its pocket!

Because Pearl currently represents observations as tensors, you might want to use a representation such that the agent's properties of interest are concatenated with the rest of the observation into a single tensor.
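For instance, a minimal sketch of that concatenation (the property names and values are just placeholders):

```python
import torch

# Agent-internal properties you track yourself (placeholder values).
money, health, hunger = 10.0, 0.8, 0.2

external_obs = torch.tensor([0.3, -1.2, 0.7])          # what the environment observed
internal_obs = torch.tensor([money, health, hunger])   # the agent's own properties
observation = torch.cat([external_obs, internal_obs])  # single tensor for the Pearl agent
```

Note that the state_dim you give the policy learner then has to match the size of the concatenated tensor.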

To keep the property up-to-date (for example, to decrease the agent's money once a decision is made to spend some), the code receiving the action from the agent (in learning situations this will typically be the step method of the environment) must modify the property's value according to the action taken.
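A sketch of that pattern, written as a plain Python class rather than against Pearl's exact Environment interface (the action encoding and reward here are placeholders):

```python
import torch


class MarketEnvironment:
    """Toy environment that tracks the agent's money and folds it into observations."""

    def __init__(self) -> None:
        self.money = 100.0

    def _observation(self) -> torch.Tensor:
        external = torch.randn(3)  # placeholder for the real market observation
        return torch.cat([external, torch.tensor([self.money])])

    def step(self, action: float):
        spent = float(action)  # assume the action encodes an amount of money to spend
        self.money -= spent    # keep the internal property in sync with the action taken
        reward = -spent        # placeholder reward
        return self._observation(), reward
```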

I hope this helps.

@rodrigodesalvobraz

Thank you for the super clear explanation. I will go through the environment in that case.
Yes, I know how observations work in RL, but I thought that each agent needs to concatenate its own internal properties to the external observation, and these properties are different for every agent.

So I need to make a list/dictionary that saves the internal properties for each agent.

Thanks.

The solution you mention (keeping the properties inside the agent instance and concatenating them to the external observation) might actually work, too.

It seems to be a little less conventional. Usually in an RL problem one talks about a single type of observation, but here you have two types: the external one, and the internal one obtained by concatenating the agent's properties to it. Software libraries often make assumptions based on conventions, so going a less conventional route may turn out to be tricky, but as far as I can tell your approach could work too.

Thank you again.