AminHP / gym-anytrading

The most simple, flexible, and comprehensive OpenAI Gym trading environment (Approved by OpenAI Gym)


gym-anytrading + DI-engine

PaParaZz1 opened this issue

commented

Nice project! I am the developer of DI-engine. We are looking for RL environments for RL + trading, and we found your repo suitable for making a demo for beginners.

In our latest version updates, we modified the gym-anytrading env and adapted DI-engine to it; here are our modifications and experiment results (StockEnv + DQN). We would also like to add an example to the gym-anytrading repo. What do you think about this idea, or do you have any other thoughts?

Hi @PaParaZz1, your work seems pretty interesting. People have asked me many times to add some mid-level features. gym-anytrading is a low-level, simple repo for beginners, while my gym-mtsim project is a high-level, complex tool for experts. Your project sits just in between and provides the requested mid-level features. In a way, you implemented a more practical env on top of this one by improving the reward function. I haven't had enough time to read your work thoroughly, but I think it is valuable and I can put a link to your project in the README.md file.

About the example, I'm not sure whether it works here, because you made some modifications that don't match my simple implementation. Can you explain it a bit more?

commented

Thank you for acknowledging our work. I will explain our modifications more clearly as follows:

Defects in the original environment

The original environment can be described by the following state machine:
[figure: init_state — original state machine]

And the reward function is:
[figure: ori_rew — original reward function]

A non-zero reward occurs if and only if the position changes from Long to Short. As a result, the reward for “Buy” is always zero whatever the position is, and it is hard for a Q-learning algorithm to estimate the Q value of the action “Buy”.
In fact, the reward function depends on “position” and “action” as well as “last trade tick”, so it is reasonable to add these features to the original state of the environment. Otherwise, the reward appears non-stationary to the agent.
The final state formula in our baseline is:
[figure: state — final state representation in our baseline]

Besides, the agent cannot make a profit by “selling short”.
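
To make the point concrete, the original reward logic can be sketched roughly as follows (a simplified paraphrase following the Actions/Positions naming used in gym-anytrading, not a verbatim copy of its implementation):

```python
from enum import Enum

class Actions(Enum):
    Sell = 0
    Buy = 1

class Positions(Enum):
    Short = 0
    Long = 1

def calculate_reward(env, action):
    """Rough sketch of the original reward logic (simplified paraphrase)."""
    step_reward = 0.0

    # A trade happens only when the action opposes the current position.
    trade = (
        (action == Actions.Buy.value and env._position == Positions.Short) or
        (action == Actions.Sell.value and env._position == Positions.Long)
    )

    if trade:
        current_price = env.prices[env._current_tick]
        last_trade_price = env.prices[env._last_trade_tick]
        price_diff = current_price - last_trade_price

        # Profit is credited only when a Long position is closed (Long -> Short),
        # so the “Buy” action always receives zero reward.
        if env._position == Positions.Long:
            step_reward += price_diff

    return step_reward
```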

Our Modifications

  1. Add some features to the state (see the sketch after this list)

    • add “position” and “volume”
    • add the feature “last trade tick”, represented as (curr_tick - last_trade_tick)/eps_length, to record the time of the last valid transaction
  2. Change the original operation logic of the environment so that the agent can make profits with more diverse strategies (see the sketch after this list)

    • change the state machine so that profits can also be made by selling short
      [figure: s1 — modified state machine]

    • add the actions “Double_Sell” and “Double_Buy” so that the position can be switched between “Long” and “Short” within a single trading day
      [figure: s2 — state machine with Double_Sell / Double_Buy]

  3. Modify the reward function

  4. Modify DQN algorithm hyper-parameters (see the sketch after the P.S. below)

    • n-step DQN works better than 1-step in this case; we set n = 3.
    • In the DQN algorithm, the “done” signal is ignored when updating the Q function. We find this effective because the Q value of the last day of an episode should not be 0.
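
To illustrate modifications 1 and 2, here is a minimal sketch of what the extended action set and the augmented observation could look like. The names (ExtendedActions, augment_observation, eps_length) are only illustrative and are not DI-engine's or gym-anytrading's actual API:

```python
import numpy as np
from enum import Enum


class ExtendedActions(Enum):
    # Original Sell/Buy actions plus the two new ones that flip the position
    # between Long and Short within a single trading day.
    Sell = 0
    Buy = 1
    Double_Sell = 2  # close a Long position and immediately open a Short one
    Double_Buy = 3   # close a Short position and immediately open a Long one


def augment_observation(window_features, position, volume,
                        curr_tick, last_trade_tick, eps_length):
    """Append the extra features (modification 1) to the original observation.

    position: e.g. -1.0 for Short, +1.0 for Long
    volume:   current holding volume
    (curr_tick - last_trade_tick) / eps_length: normalized time since the
        last valid transaction
    """
    extra = np.array([
        position,
        volume,
        (curr_tick - last_trade_tick) / eps_length,
    ], dtype=np.float32)
    window = np.asarray(window_features, dtype=np.float32).ravel()
    return np.concatenate([window, extra])
```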

P.S. here is the readme about our modifications.
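
As for the DQN tweaks in item 4 above, the target value can be sketched as an n-step return (n = 3) whose bootstrap term is kept even on the last step of an episode, i.e. the “done” mask is dropped. A minimal, illustrative helper (not DI-engine's actual implementation):

```python
def nstep_target_ignore_done(rewards, next_q_max, gamma=0.99):
    """n-step TD target that ignores the 'done' signal.

    rewards:    list of the next n rewards [r_t, ..., r_{t+n-1}] (here n = 3)
    next_q_max: max_a Q_target(s_{t+n}, a)

    The usual (1 - done) mask on the bootstrap term is omitted on purpose:
    the last trading day of an episode is an artificial cutoff rather than a
    true terminal state, so its Q value should not be forced to 0.
    """
    target = next_q_max
    for r in reversed(rewards):
        target = r + gamma * target
    return target
```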

commented

Do you have any other comments or ideas?

Sorry for my late response; I was a bit busy. Interesting work! I will add a link to your repo soon.

commented

Thank you! Looking forward to bringing more interesting work to the open-source community.

Just added a link to your repo: https://github.com/AminHP/gym-anytrading#related-projects.
Keep up the good work! 🚀