end of training causes noise
GXY2017 opened this issue
The question is about `env._calculate_reward(self, action)`: I think there should be some way to handle the end of each training episode.
Assume we choose 5000 HLOC bars for each training run. Suppose the agent sells at `bar[-5]` and no buy action follows before the episode ends, so there is no closing trade from which to calculate the final reward. We then have to assign that reward manually, which inevitably alters the training result.
At the moment, I set the ending reward to 0.
```python
if self._current_tick == self._end_tick:
    step_reward = 0
```
But setting it to 0 still gives the agent misleading information, and different hard-coded values can cause huge swings in the final result.
To avoid this problem, is there any way to set the episode length by number of transactions instead of number of bars? Thank you.
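For reference, the transaction-count idea can be sketched roughly like this. This is purely illustrative (the class name `TradeCountEpisode` and its methods are hypothetical, not part of any existing env):

```python
class TradeCountEpisode:
    """Ends an episode after a fixed number of completed round-trip trades,
    rather than after a fixed number of bars."""

    def __init__(self, max_trades):
        self.max_trades = max_trades  # episode length measured in trades
        self.trades_done = 0

    def record_trade(self):
        # call this each time a position is closed
        self.trades_done += 1

    @property
    def done(self):
        return self.trades_done >= self.max_trades
```

With something like this, the last reward of an episode always comes from a real closed trade, so no terminal reward needs to be invented. The trade-off is that episodes become variable-length in bars.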
Update
I found a solution that seems to work, but it still suffers from information loss. I added this to `trading_env`:
```python
length_to_end = len(self.prices[self._last_trade_tick:self._end_tick]) >= self.window_size
```
and altered the following part accordingly:
```python
if length_to_end:
    self._last_action = action
    self._position = self._position.opposite()
    self._last_trade_tick = self._current_tick
else:
    self._last_action = 0
    self._position = None
    self._last_trade_tick = self._current_tick
# There is more to change; the rest is not shown here.
```
The major drawback of this method is that it works by dropping the last several bars, so I cannot apply it in the validation process. And of course, I will need those last bars in real trading.
Is there a more efficient and accurate solution?
Hi @GXY2017,
I used `self._done` in this line of code to calculate the profit at the terminal point. Perhaps it can be used for reward calculation as well. Please check out the code below and let me know if it works for you (I didn't test it):
```python
def _calculate_reward(self, action):
    step_reward = 0  # pip

    trade = False
    if ((action == Actions.Buy.value and self._position == Positions.Short) or
            (action == Actions.Sell.value and self._position == Positions.Long)):
        trade = True

    if trade or self._done:
        current_price = self.prices[self._current_tick]
        last_trade_price = self.prices[self._last_trade_tick]
        price_diff = current_price - last_trade_price

        if self._position == Positions.Short:
            step_reward += -price_diff * 10000
        elif self._position == Positions.Long:
            step_reward += price_diff * 10000

    return step_reward
```
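To sanity-check the `trade or self._done` idea without spinning up a full environment, here is a self-contained extraction of the same logic. The enum values mirror the ones used above, but the flattened function signature is for testing only, not the env's actual API:

```python
from enum import Enum

class Actions(Enum):
    Sell = 0
    Buy = 1

class Positions(Enum):
    Short = 0
    Long = 1

def calculate_reward(prices, current_tick, last_trade_tick, position, action, done):
    """Reward a position flip as usual, but also mark the open position
    to market when the episode terminates (done=True)."""
    step_reward = 0.0  # pip-scaled
    trade = ((action == Actions.Buy.value and position == Positions.Short) or
             (action == Actions.Sell.value and position == Positions.Long))
    if trade or done:
        price_diff = prices[current_tick] - prices[last_trade_tick]
        if position == Positions.Short:
            step_reward += -price_diff * 10000
        elif position == Positions.Long:
            step_reward += price_diff * 10000
    return step_reward
```

With a long position opened at 1.1000 and the episode ending at 1.1010 with no closing sell, the terminal step still yields the roughly +10 pip reward the position has earned, instead of an arbitrary 0.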
It works. Thanks.