facebookresearch / Pearl

A Production-ready Reinforcement Learning AI Agent Library brought by the Applied Reinforcement Learning team at Meta.

Mixed Discrete + Continuous Action Spaces

kuza55 opened this issue

Hi,

I was wondering if there was an easy way to use Pearl with an environment that had both a discrete and continuous action space at the same time.

  • Alex

Hi there, unfortunately that is not supported at the moment, but could you please explain what your environment/problem is so that we can understand it better?

I don't want to get too far into the weeds on my problem, since the formulation will probably change over time, but I am working on a scheduler where I want to assign tasks to machines and also turn machines off, under a somewhat annoying cost model for how the machines are paid for.

So I want to model the assignment of tasks to machines as a discrete action space, and the time at which to turn each machine off as a continuous action space. That avoids having to call the model repeatedly to ask whether a machine should turn off when nothing has been submitted, and avoids discretizing the continuous time action, in the hope of getting better learning behaviour.

I think the main issue here is supporting dictionary action spaces. The sub-action spaces under that dictionary's keys could then combine, for example, BoxActionSpaces (continuous) and MultiDiscreteActionSpaces (discrete).
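
Just to make that idea concrete, here is a rough sketch of what such a dictionary action space could look like for the scheduler example, written with Gymnasium's space classes rather than Pearl's (since Pearl does not support this yet); the task/machine counts and the 24-hour bound are made-up numbers for illustration.

```python
# Sketch of a dictionary action space mixing discrete and continuous
# sub-actions, using Gymnasium spaces (not Pearl's) for illustration.
import numpy as np
from gymnasium import spaces

NUM_TASKS = 4      # hypothetical problem sizes, for illustration only
NUM_MACHINES = 3

action_space = spaces.Dict(
    {
        # one machine index per submitted task (discrete sub-action)
        "assignment": spaces.MultiDiscrete([NUM_MACHINES] * NUM_TASKS),
        # one turn-off time per machine, in hours (continuous sub-action)
        "turn_off_time": spaces.Box(
            low=0.0, high=24.0, shape=(NUM_MACHINES,), dtype=np.float32
        ),
    }
)

sample = action_space.sample()  # dict with "assignment" and "turn_off_time"
```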

Sorry for getting back a bit late. We just came back from holidays.

If I understand correctly, you want to tell a machine to turn off at a certain time, which involves both a discrete and a continuous action space. However, to formulate your problem as an RL problem, how are discrete time steps defined, i.e., how often do you make decisions?

I think it is not easy to do that with Pearl currently (I expect it will get there in the future).

I wonder if you can work around that by having two agents, one that assigns tasks and another that chooses turn-off times. The two agents would have no knowledge of each other; from each agent's point of view, the changes performed by the other agent look like ordinary environment changes following its last action. You could run them (one turn each) every time new tasks are submitted.
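
To make the workaround concrete, here is a minimal, self-contained sketch of that loop. Everything in it (the `SchedulerEnv` stub, the agent classes, the method names, the placeholder rewards) is a hypothetical illustration, not Pearl's API; in practice each agent could be its own Pearl agent trained on its own sub-action space, with the loop below run whenever new tasks arrive.

```python
# Hypothetical sketch of the two-agent workaround: one discrete-action
# agent assigns tasks, one continuous-action agent picks turn-off times,
# and neither knows the other exists.
import random


class SchedulerEnv:
    """Toy stand-in for the real scheduling environment."""

    num_machines = 3

    def observation(self):
        return {"queue_length": random.randint(0, 5)}  # placeholder state

    def assign(self, machine):
        # Apply a task-to-machine assignment; return new state and reward.
        return self.observation(), -1.0  # placeholder reward

    def schedule_turn_off(self, hours_from_now):
        # Apply a turn-off time; return new state and reward.
        return self.observation(), -0.1 * hours_from_now  # placeholder reward


class TaskAssigner:
    """Discrete-action agent: picks a machine for the submitted task."""

    def act(self, obs, num_machines):
        return random.randrange(num_machines)  # placeholder policy

    def observe(self, obs, reward):
        pass  # learning update would happen here


class TurnOffScheduler:
    """Continuous-action agent: picks a turn-off time in hours."""

    def act(self, obs):
        return random.uniform(0.0, 24.0)  # placeholder policy

    def observe(self, obs, reward):
        pass  # learning update would happen here


def on_tasks_submitted(env, assigner, scheduler):
    # Each agent takes one turn; since neither knows about the other,
    # the other's effect on the state just looks like environment dynamics.
    obs = env.observation()
    obs, reward = env.assign(assigner.act(obs, env.num_machines))
    assigner.observe(obs, reward)

    obs, reward = env.schedule_turn_off(scheduler.act(obs))
    scheduler.observe(obs, reward)


on_tasks_submitted(SchedulerEnv(), TaskAssigner(), TurnOffScheduler())
```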