wassname / rl-portfolio-management

Attempting to replicate "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem" https://arxiv.org/abs/1706.10059 (and an OpenAI Gym environment)

Network Topology

ZhengyaoJiang opened this issue

Hi, I'm quite happy to see our work being replicated!

One problem I found is the topology.
In tensorforce-VPG.ipynb, cell In [11], it seems that a dense layer is added to the network, which differs from the original work:

x = dense(x, size=env.action_space.shape[0], activation='relu', l2_regularization=1e-8)

The "Ensemble of Identical Independent Evaluators" will not include any dense layer. Outputs of last convolutional layer will be fed into softmax function directly. That's why we say they are "independent".

Hi, good to hear from an author of that paper!

That's one part of the paper I was unsure about. After looking at Figure 2, I wasn't sure whether the cash bias introduced in the head layer was hard-coded or a single learned neuron. I guess it was hard-coded?

Yes, it is a constant in our work.
However, we haven't tested whether using a learnable variable could be better.
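As a toy illustration (my own sketch, not the authors' code), a hard-coded cash bias can be thought of as a constant score concatenated to the per-asset scores before the softmax, so "cash" competes with the assets for portfolio weight:

```python
import numpy as np

def portfolio_weights(asset_scores, cash_bias=0.0):
    """asset_scores: shape (n_assets,), raw outputs of the last conv layer."""
    scores = np.concatenate([[cash_bias], asset_scores])
    exp = np.exp(scores - scores.max())   # numerically stable softmax
    return exp / exp.sum()                # weights[0] is the cash weight

print(portfolio_weights(np.array([0.2, -0.1, 0.4]), cash_bias=1.0))
```

Raising the constant pushes more weight toward cash, which is the "riskiness" knob mentioned below.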

Ah, good to know, that makes sense. That way you can set it to modulate how risky you want the model to be when applying it.

Don't hesitate to point out anything else you notice, it's interesting.

@ZhengyaoJiang First of all, I am also a fan of your paper and very interested in the activity in this repository. Can you please share your exact dataset so that results here can be compared with your publication? Otherwise, I guess it would be difficult to evaluate whether the implementation is correct.

@wassname I am not yet knowledgeable in deep learning, but can you please add me on Skype (ID: akaniklaus) so that we can see whether I can also contribute to the project somehow? Thank you!

@akaniklaus Our data is stored in a database, so to make use of it, the data-processing code would also have to be shared.
Actually, a community version of the code will be released in a few months; it will be easy to test the implementation then.

Besides, there is no guarantee that our data pre-processing is bug-free. I think it's great to also replicate the data access part to double-check our results.

@akaniklaus Maybe we can Skype next week; I like to be a digital "hermit" on the weekend to unwind. For now, check out #2 and #4 for ideas on how to contribute.

Personally, I am researching reinforcement learning algorithms that will converge even with noisy observations, since many of the standard test environments differ from a trading market in that their observations are not very noisy. I found the Rainbow paper interesting, since they managed to combine the latest RL tricks into one agent that converges ~4x faster. You might like to have a read if it's not too advanced for you.

I would like to add that there is also no guarantee my code is bug-free :); let's be honest, it probably has some bugs in it! So if you notice anything, please point it out, especially in the environment code.

@ZhengyaoJiang it will be good to see your implementation! Can I ask you, did it converge on most runs, or did you have to try a few times to get it to converge? I ask because RL is notoriously finicky at the moment.

@wassname Yes, it can converge on most runs.

@ZhengyaoJiang Have you ever tried your method with hourly data? In my experience, OLMAR and RMR perform even better in terms of returns with it. I can also point out that the set of selected tokens affects their performance dramatically. I don't know whether that would also hold for yours.

@wassname I will read the paper and might understand it, but as I said, I don't have enough experience with deep learning yet to help you implement that. Have you checked the following repository:
https://github.com/Kaixhin/Rainbow

@akaniklaus

Have you ever tried your method with hourly data? In my experience, OLMAR and RMR perform even better in terms of returns with it.

It seems worth trying. Did you take the commission fee into consideration?
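For reference, this is roughly how a proportional commission is often charged in a simple backtest (a rough sketch and my own simplification, not the paper's exact transaction-cost model): each rebalance pays a fee proportional to the total weight that changes hands, which matters more the more frequently you rebalance (e.g. hourly).

```python
import numpy as np

def step_growth(prev_weights, new_weights, price_relatives, commission=0.0025):
    """One period's portfolio growth factor after a proportional commission."""
    w_prev = np.asarray(prev_weights, dtype=float)
    w_new = np.asarray(new_weights, dtype=float)
    turnover = np.abs(w_new - w_prev).sum()          # fraction of wealth traded (simplified)
    cost = commission * turnover                     # fee on the traded fraction
    growth = float(np.dot(w_new, price_relatives))   # growth from holding the new weights
    return (1.0 - cost) * growth
```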

I can also point out that the set of selected tokens affects their performance dramatically

I suppose you mean the selection of assets? This is done by automatically selecting the top-volume assets at the end of the training data. Selecting by hand might introduce survivorship bias.
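As an illustration, volume-based selection might look like the following (a hypothetical sketch, not the authors' actual pipeline; the function and parameter names are made up):

```python
import pandas as pd

def top_volume_assets(volume: pd.DataFrame, train_end, lookback_periods=720, n=11):
    """volume: DataFrame indexed by timestamp, one column per asset."""
    recent = volume.loc[:train_end].tail(lookback_periods)   # tail of the training period
    return recent.sum().nlargest(n).index.tolist()           # top-n assets by traded volume
```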

And guys, this issue has gone off topic. If you want to communicate further, it's best to e-mail me or use Hangouts.
My Gmail: jzyjiangzhengyao@gmail.com

@ZhengyaoJiang Yes, I did take the commission fee into consideration. One disturbing thing about OLPS was that the portfolios have very low entropy, meaning they bet on only a few tokens (often one or two) in each round and then shift completely.

I found this a bit disturbing and tried to reduce this behavior by using the lowest epsilon value or adding a Kalman filter, but neither gave better backtest results. I am curious whether yours shows similar behavior when distributing the wealth across tokens.
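One quick way to quantify the concentration being described (my own illustration, not part of either codebase) is the Shannon entropy of the weight vector: near 0 when all wealth sits in one token, near log(n) when it is spread evenly.

```python
import numpy as np

def portfolio_entropy(weights, eps=1e-12):
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                # normalise to a distribution
    return float(-(w * np.log(w + eps)).sum())     # Shannon entropy in nats

print(portfolio_entropy([1.0, 0.0, 0.0, 0.0]))      # ~0: all-in on one token
print(portfolio_entropy([0.25, 0.25, 0.25, 0.25]))  # ~1.39 = log(4): evenly spread
```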

I am not talking about picking the best-performing ones in the training data; that would certainly cause a bias. However, I would select assets based on expert knowledge in real-world use, as that is also what people would normally do when buying and holding.

OK, thank you very much. I will continue the conversation via your Gmail. Have a nice weekend.