fvalka / atc-reinforcement-learning

Reinforcement learning for an air traffic control task. OpenAI Gym-based simulation.


How to eliminate "invalid actions"

epaulz-vt opened this issue

Hello,

Not sure if this repo is still active, but I am interested in using your environment for a research project. I have built my own simple deep Q-network (DQN) to train on the ATC environment. It mostly works, except that I frequently get the messages "Warning invalid action: 400 for index: 0" and "Warning invalid action: 57000 for index: 1", and I can't figure out how to resolve them.

It seems as though my agent stays near its starting point and will only move up, down, or left, never to the right. It does not appear to learn past a certain point, and I wonder whether these "invalid actions" are the cause.

Any assistance would be much appreciated.

Eric

Hello Eric,

Not very active, but I'm still around.

Sounds like you're trying to perform actions that are outside of the action space.

If you are using the continuous action space with normalization (the default), everything should be normalized to the range -1 to 1.

See the action space definition here:

self.action_space = gym.spaces.Box(low=np.array([-1, -1, -1]),
                                   high=np.array([1, 1, 1]))
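
The values from your warnings, 400 and 57000, look like raw physical quantities (speed and altitude), so they fall well outside that box. As a rough sketch, here are two common ways to keep a network's output inside the valid range; the numbers below are just your warning values used as placeholders, not anything the environment actually expects:

import gym
import numpy as np

action_space = gym.spaces.Box(low=np.array([-1.0, -1.0, -1.0]),
                              high=np.array([1.0, 1.0, 1.0]))

# Hypothetical raw network output for (v, h, phi), clearly out of range
raw = np.array([400.0, 57000.0, 0.5])

# Option 1: squash unbounded outputs into [-1, 1] with tanh
squashed = np.tanh(raw)

# Option 2: hard-clip the output to the bounds of the Box
clipped = np.clip(raw, action_space.low, action_space.high)

print(action_space.contains(squashed.astype(np.float32)))  # True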

Hope that helps.

All the best
Fabian

Thank you for your response. I have managed to move past the invalid-action issue. However, I am having a hard time understanding how to properly interact with this environment's action space from my custom DQN, so let me explain.

When training on an environment like CartPole or LunarLander, the action space is a set of scalar values (say, 0-4), one of which is selected and then interpreted, and perhaps translated, by the environment in some way. When I use that approach here, it seems that each action is a tuple of three separate components (v, h, phi). When I try to pass a scalar action, I get an error because the environment expects to be able to index into the action. However, my attempts to modify my model to select and store actions as tuples do not seem to be working.
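
To illustrate, here is a minimal sketch of the difference as I understand it (the ATC environment construction below is only a placeholder, since I'm not sure of the exact setup):

import gym
import numpy as np

# Discrete case, e.g. CartPole: step() takes a single integer
env = gym.make("CartPole-v1")
env.reset()
obs, reward, done, info = env.step(1)

# Continuous case, as in this environment (placeholder construction):
# atc_env = ...  # however the ATC gym environment is created
# step() would expect a 3-component vector (v, h, phi), normalized to [-1, 1]:
# obs, reward, done, info = atc_env.step(np.array([0.0, 0.0, 0.0]))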

Do you perhaps have any examples of training a model other than those from 'baselines' so that I could get a better idea of how to interact with this environment? I am very interested in getting this working.

I suppose a simpler way to explain my dilemma is that I don't quite understand how to interact with the continuous action space (I am still fairly new to machine learning). I see that there is a way to switch the environment to a discrete action space, but no matter which mode it's in, when I try to query the action space with "num_outputs = env.action_space.n", it tells me that 'Box' and 'MultiDiscrete' objects have no 'n' attribute.
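
For reference, here is a sketch of how I now think the number of outputs has to be derived per space type, based on my reading of the gym docs; please correct me if this is wrong:

import gym
import numpy as np

def num_outputs(space):
    if isinstance(space, gym.spaces.Discrete):
        return space.n                     # single integer action
    if isinstance(space, gym.spaces.Box):
        return int(np.prod(space.shape))   # one continuous output per dimension
    if isinstance(space, gym.spaces.MultiDiscrete):
        return int(space.nvec.sum())       # total discrete choices across dimensions
    raise NotImplementedError(type(space))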