facebookresearch / Pearl

A Production-ready Reinforcement Learning AI Agent Library brought by the Applied Reinforcement Learning team at Meta.

Use Pearl with custom PyTorch model

BoccheseGiacomo opened this issue:

I used to use PyTorch + Gym for RL.
Is there a way to make a custom neural network in PyTorch and insert it into the Pearl agent without using the pre-made settings for neural networks?

It is possible to directly provide a network instance to be used by policy learners implementing Temporal Difference (TD) learning methods.
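For example, here is a minimal sketch assuming a recent version of Pearl, in which DeepQLearning accepts a network_instance argument (import paths and parameter names may differ slightly across versions):

```python
from pearl.pearl_agent import PearlAgent
from pearl.policy_learners.sequential_decision_making.deep_q_learning import DeepQLearning
from pearl.neural_networks.sequential_decision_making.q_value_networks import VanillaQValueNetwork
from pearl.replay_buffers.sequential_decision_making.fifo_off_policy_replay_buffer import (
    FIFOOffPolicyReplayBuffer,
)
from pearl.utils.instantiations.environments.gym_environment import GymEnvironment

env = GymEnvironment("CartPole-v1")

# Build the Q-value network yourself instead of letting DeepQLearning create one.
# Any QValueNetwork subclass, including your own custom PyTorch model, can go here.
q_value_network = VanillaQValueNetwork(
    state_dim=env.observation_space.shape[0],
    action_dim=env.action_space.n,
    hidden_dims=[64, 64],
    output_dim=1,
)

agent = PearlAgent(
    policy_learner=DeepQLearning(
        state_dim=env.observation_space.shape[0],
        action_space=env.action_space,
        training_rounds=20,
        network_instance=q_value_network,  # the custom network instance is passed here
    ),
    replay_buffer=FIFOOffPolicyReplayBuffer(10_000),
)
```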

Moreover, while it is not possible to directly provide network instances to subclasses of ActorCriticBase (such as DeepDeterministicPolicyGradient, ImplicitQLearning, ProximalPolicyOptimization, REINFORCE, SoftActorCritic, and ContinuousSoftActorCritic), it is possible to use custom user-defined network classes. These policy learners accept actor_network_type and critic_network_type parameters, which let the user specify the classes to be used for those networks. So the user can define their own custom network classes and have the policy learner use them; they must, however, be subclasses of ActorNetwork and QValueNetwork, respectively.
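Here is a rough sketch of that second route; MyActorNetwork and MyCriticNetwork are placeholder names, and the exact constructor arguments and import paths may vary across Pearl versions:

```python
from pearl.neural_networks.sequential_decision_making.actor_networks import VanillaActorNetwork
from pearl.neural_networks.sequential_decision_making.q_value_networks import VanillaQValueNetwork
from pearl.policy_learners.sequential_decision_making.soft_actor_critic import SoftActorCritic


class MyActorNetwork(VanillaActorNetwork):
    """Custom actor network; subclassing a built-in ActorNetwork keeps the required interface."""
    # Override or extend the forward computation here as needed.


class MyCriticNetwork(VanillaQValueNetwork):
    """Custom critic network; it must still behave like a QValueNetwork."""
    # Override or extend the Q-value computation here as needed.


policy_learner = SoftActorCritic(
    state_dim=env.observation_space.shape[0],  # reusing env from the sketch above
    action_space=env.action_space,
    actor_hidden_dims=[64, 64],
    critic_hidden_dims=[64, 64],
    actor_network_type=MyActorNetwork,    # pass the class itself, not an instance
    critic_network_type=MyCriticNetwork,
)
```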

I hope this helps, please let us know if something is unclear.

@rodrigodesalvobraz

Thank you, it's clear.

Another small question: can I add custom properties to my agent? For example, I want my agent to have some properties that belong to the agent, not to the environment, such as "money". Imagine I'm running a simulation of a market economy environment, and I want each agent to possess some float attributes like money, health, hunger, etc.

Can I do this with Pearl?
In Gym I would simply add these properties to my "agent" class.

Adding a property to an object is always possible in Python, regardless of the libraries one might be using. If you have an object instance agent, writing agent.money = 10 will work, and the attribute will be accessible from then on.

However, I suspect this might not be what you would really want. If you want the RL algorithm to take these properties into account when making decisions for the agent (for example, the agent will be more likely to spend money if it has a lot of it, but will try to save when it does not), then what you really want is to include that information in the observations being received by the agent (if you happen to be using an environment, then the environment is a good candidate to keep this information and include it in the observations it provides). This is because RL algorithms will make decisions based on the information contained in observations (states).

It might be a bit odd to think that an agent's property is being included in the observations coming from the outside world since we tend to think of it as an internal property, but one way to think about it is that the agent is observing things, including how much money is in its pocket!

Because Pearl currently represents observations as tensors, you might want to use a representation such that the agent's properties of interest are concatenated with the rest of the observation into a single tensor.
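For instance, a minimal sketch of that concatenation (the property names and values are just placeholders):

```python
import torch

# Agent-internal properties you track yourself (placeholder values).
money, health, hunger = 10.0, 0.8, 0.2

external_obs = torch.tensor([0.3, -1.2, 0.7])          # what the environment observed
internal_obs = torch.tensor([money, health, hunger])   # the agent's own properties
observation = torch.cat([external_obs, internal_obs])  # single tensor for the Pearl agent
```

Note that the state_dim you give the policy learner then has to match the size of the concatenated tensor.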

To keep the property up-to-date (for example, to decrease the agent's money once a decision is made to spend some), the code receiving the action from the agent (in learning situations this will typically be the step method of the environment) must modify the property's value according to the action taken.
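A sketch of that pattern, written as a plain Python class rather than against Pearl's exact Environment interface (the action encoding and reward here are placeholders):

```python
import torch


class MarketEnvironment:
    """Toy environment that tracks the agent's money and folds it into observations."""

    def __init__(self) -> None:
        self.money = 100.0

    def _observation(self) -> torch.Tensor:
        external = torch.randn(3)  # placeholder for the real market observation
        return torch.cat([external, torch.tensor([self.money])])

    def step(self, action: float):
        spent = float(action)  # assume the action encodes an amount of money to spend
        self.money -= spent    # keep the internal property in sync with the action taken
        reward = -spent        # placeholder reward
        return self._observation(), reward
```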

I hope this helps.

@rodrigodesalvobraz

Thank you for the super clear explanation. I will go through the environment in that case.
Yes, I know how observations work in RL, but I thought that each agent needs to concatenate its own internal properties to the external observation, and these properties are different for every agent.

So I need to make a list/dictionary that saves the internal properties for each agent.

Thanks.

The solution you mention (keeping the properties inside the agent instance and concatenating them to the external observation) might actually work, too.

It seems to be a little less conventional. Usually in an RL problem one talks about a single type of observation, but here you have two types: the external one, and the internal one obtained by concatenating the agent's properties to it. Software libraries often make assumptions based on conventions, so going a less conventional route may turn out to be tricky, but as far as I can tell your approach could work too.

Thank you again.