AminHP / gym-anytrading

The most simple, flexible, and comprehensive OpenAI Gym trading environment (Approved by OpenAI Gym)


gym-anytrading + DI-engine

PaParaZz1 opened this issue

commented

Nice project! I am the developer of DI-engine. We are looking for RL environments for RL + trading, and we found your repo suitable for making a demo for beginners.

In our latest version updates, we modified the gym-anytrading env and adapted DI-engine to it; here are our modifications and experiment results (StockEnv + DQN). We would also like to add an example to the gym-anytrading repo. What do you think about this idea, or do you have any other thoughts?

Hi @PaParaZz1, your work seems pretty interesting. People have asked me many times to add some mid-level features. gym-anytrading is a low-level, simple repo for beginners, while my gym-mtsim project is a high-level, complex tool for experts. Your project sits just in between and provides the requested mid-level features. In a way, you implemented a more practical env on top of this one by improving the reward function. I haven't had enough time to read your work thoroughly, but I think it is valuable and I can put a link to your project in the README.md file.

About the example, I'm not sure whether it works here, because you made some modifications that don't match my simple implementation. Can you explain it a bit more?

commented

Thank you for acknowledging our work. I will explain our modifications more clearly as follows:

Defects in the original environment

The original environment can be described by the following state machine:
[figure: init_state — original state machine]

And the reward function is:
[figure: ori_rew — original reward function]

A non-zero reward occurs if and only if the position changes from Long to Short. As a result, the reward for “Buy” is always zero whatever the position is, and it is hard for a Q-learning algorithm to estimate the Q value of the action “Buy”.
In fact, the reward function depends on “position” and “action” as well as “last trade tick”, so it is reasonable to add these features to the original state of the environment. Otherwise, the reward appears non-stationary to the agent.
The final state formula in our baseline is:
[figure: state — final state representation in our baseline]

Besides, the agent cannot make a profit by “selling short”.
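
To make the point concrete, the original reward logic can be sketched roughly as follows (a simplified paraphrase following the Actions/Positions naming used in gym-anytrading, not a verbatim copy of its implementation):

```python
from enum import Enum

class Actions(Enum):
    Sell = 0
    Buy = 1

class Positions(Enum):
    Short = 0
    Long = 1

def calculate_reward(env, action):
    """Rough sketch of the original reward logic (simplified paraphrase)."""
    step_reward = 0.0

    # A trade happens only when the action opposes the current position.
    trade = (
        (action == Actions.Buy.value and env._position == Positions.Short) or
        (action == Actions.Sell.value and env._position == Positions.Long)
    )

    if trade:
        current_price = env.prices[env._current_tick]
        last_trade_price = env.prices[env._last_trade_tick]
        price_diff = current_price - last_trade_price

        # Profit is credited only when a Long position is closed (Long -> Short),
        # so the “Buy” action always receives zero reward.
        if env._position == Positions.Long:
            step_reward += price_diff

    return step_reward
```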

Our Modifications

  1. Add some features to the state (see the sketch after this list)

    • add “position” and “volume”
    • add the feature “last trade tick”, represented as (curr_tick - last_trade_tick)/eps_length, to record the time of the last valid transaction
  2. Change the original operation logic of the environment so that the agent can make profits with more diverse strategies (see the sketch after this list)

    • change the state machine so that profits can also be made by selling short
      [figure: s1 — modified state machine]

    • add the actions “Double_Sell” and “Double_Buy” so that the position can be switched between “Long” and “Short” within a single trading day
      [figure: s2 — state machine with Double_Sell / Double_Buy]

  3. Modify the reward function

  4. Modify DQN algorithm hyper-parameters (see the sketch after the P.S. below)

    • n-step DQN works better than 1-step in this case; we set n = 3.
    • In the DQN algorithm, the “done” signal is ignored when updating the Q function. We find this effective because the Q value of the last day of an episode should not be 0.
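
To illustrate modifications 1 and 2, here is a minimal sketch of what the extended action set and the augmented observation could look like. The names (ExtendedActions, augment_observation, eps_length) are only illustrative and are not DI-engine's or gym-anytrading's actual API:

```python
import numpy as np
from enum import Enum


class ExtendedActions(Enum):
    # Original Sell/Buy actions plus the two new ones that flip the position
    # between Long and Short within a single trading day.
    Sell = 0
    Buy = 1
    Double_Sell = 2  # close a Long position and immediately open a Short one
    Double_Buy = 3   # close a Short position and immediately open a Long one


def augment_observation(window_features, position, volume,
                        curr_tick, last_trade_tick, eps_length):
    """Append the extra features (modification 1) to the original observation.

    position: e.g. -1.0 for Short, +1.0 for Long
    volume:   current holding volume
    (curr_tick - last_trade_tick) / eps_length: normalized time since the
        last valid transaction
    """
    extra = np.array([
        position,
        volume,
        (curr_tick - last_trade_tick) / eps_length,
    ], dtype=np.float32)
    window = np.asarray(window_features, dtype=np.float32).ravel()
    return np.concatenate([window, extra])
```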

P.S. here is the readme about our modifications.
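
As for the DQN tweaks in item 4 above, the target value can be sketched as an n-step return (n = 3) whose bootstrap term is kept even on the last step of an episode, i.e. the “done” mask is dropped. A minimal, illustrative helper (not DI-engine's actual implementation):

```python
def nstep_target_ignore_done(rewards, next_q_max, gamma=0.99):
    """n-step TD target that ignores the 'done' signal.

    rewards:    list of the next n rewards [r_t, ..., r_{t+n-1}] (here n = 3)
    next_q_max: max_a Q_target(s_{t+n}, a)

    The usual (1 - done) mask on the bootstrap term is omitted on purpose:
    the last trading day of an episode is an artificial cutoff rather than a
    true terminal state, so its Q value should not be forced to 0.
    """
    target = next_q_max
    for r in reversed(rewards):
        target = r + gamma * target
    return target
```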

commented

Do you have any other comments or ideas?

Sorry for my late response; I was a bit busy. Interesting work! I will add a link to your repo soon.

commented

Thank you! Looking forward to bringing more interesting work to the open-source community.

Just added a link to your repo: https://github.com/AminHP/gym-anytrading#related-projects.
Keep up the good work! 🚀