AminHP / gym-anytrading

The most simple, flexible, and comprehensive OpenAI Gym trading environment (Approved by OpenAI Gym)

end of training causes noise

GXY2017 opened this issue

The question is about env._calculate_reward(self, action). I think there should be some way to handle the end of each training episode.

Assume we choose 5000 HLOC bars each time we train. The agent sells at bar[-5] and there is no later buy action against which to calculate the ending reward, so we have to assign a reward manually. This will inevitably alter the training result.

At the moment, I set the ending reward to 0.

        if self._current_tick == self._end_tick:
            step_reward = 0
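
For reference, one way to package that check without editing the library source is a small subclass override. This is only a sketch: the class name ZeroTerminalRewardEnv is made up, and it assumes the stock ForexEnv reward is kept for every non-terminal tick.

from gym_anytrading.envs import ForexEnv

class ZeroTerminalRewardEnv(ForexEnv):  # hypothetical name, for illustration only
    def _calculate_reward(self, action):
        # Give no reward for the position that is still open at the last bar,
        # instead of pricing it against the final tick.
        if self._current_tick == self._end_tick:
            return 0
        # Otherwise fall back to the library's default pip-based reward.
        return super()._calculate_reward(action)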

But setting it to 0 still gives the wrong information to the agent, and the different values assigned can cause huge swings in the final result.
To avoid this problem, is there any way we can set the number of transactions instead of the length of bars? Thank you.
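
For what it's worth, ending an episode after a fixed number of transactions can be sketched as a wrapper around the environment rather than a change to the bar length. The wrapper name MaxTradesWrapper and the max_trades parameter below are hypothetical, and the trade counting assumes the info dict returned by step() reports the current position, as gym-anytrading's environments do.

import gym

class MaxTradesWrapper(gym.Wrapper):  # hypothetical helper, not part of gym-anytrading
    """End the episode after max_trades position flips instead of at the last bar."""

    def __init__(self, env, max_trades=50):
        super().__init__(env)
        self.max_trades = max_trades
        self._trades = 0
        self._last_position = None

    def reset(self, **kwargs):
        self._trades = 0
        self._last_position = None
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        position = info.get("position")
        # A change in the reported position means one trade was executed.
        if self._last_position is not None and position != self._last_position:
            self._trades += 1
        self._last_position = position
        if self._trades >= self.max_trades:
            done = True
        return obs, reward, done, info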

Update
I found a solution that seems promising but still suffers from information loss. I added this to trading_env:

# True when at least window_size bars remain between the last trade and the end of the data
length_to_end = len(self.prices[self._last_trade_tick:self._end_tick]) >= self.window_size

and altered the following part accordingly:

            if length_to_end:
                # Enough bars remain: record the action and flip to the opposite position.
                self._last_action = action
                self._position = self._position.opposite()
                self._last_trade_tick = self._current_tick
            else:
                # Too close to the end of the data: go flat so no dangling
                # position needs an artificial ending reward.
                self._last_action = 0
                self._position = None
                self._last_trade_tick = self._current_tick

# There is more to change; the rest is not shown here.

The major drawback of this method is that it effectively drops the last several bars, which I cannot do in the validation process. And of course, I will need those last bars in real trading.
Is there a more efficient and accurate solution?

Hi @GXY2017,

I used self._done in this line of code in order to calculate the profit at the terminal point. Perhaps it can be used for reward calculation. Please check out the below code and let me know if it works for you (I didn't test it):

def _calculate_reward(self, action):
    step_reward = 0  # pip

    trade = False
    if ((action == Actions.Buy.value and self._position == Positions.Short) or
        (action == Actions.Sell.value and self._position == Positions.Long)):
        trade = True

    if trade or self._done:  # settle either on a closing trade or at the terminal tick
        current_price = self.prices[self._current_tick]
        last_trade_price = self.prices[self._last_trade_tick]
        price_diff = current_price - last_trade_price

        if self._position == Positions.Short:
            step_reward += -price_diff * 10000
        elif self._position == Positions.Long:
            step_reward += price_diff * 10000

    return step_reward
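
Note that this relies on step() updating self._done before _calculate_reward is called, which is how the profit calculation mentioned above already behaves. A rough, untested way to sanity-check the terminal-step reward on one of the bundled environments (the environment id, frame_bound, and window_size below are just example values, and the loop uses the older four-value gym step API):

import gym
import gym_anytrading  # registers 'forex-v0' and 'stocks-v0'

env = gym.make('forex-v0', frame_bound=(50, 500), window_size=10)
obs = env.reset()

done = False
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())

# With the patched _calculate_reward, the final step should settle the open
# position instead of returning an arbitrary ending value.
print("terminal step reward:", reward, "| total profit:", info['total_profit'])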

It works. Thanks.