wassname / rl-portfolio-management

Attempting to replicate "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem" https://arxiv.org/abs/1706.10059 (and an OpenAI Gym environment)

Network Topology

ZhengyaoJiang opened this issue

Hi, I'm quite happy to see our work being replicated!

One problem I found is the topology.
In tensorforce-VPG.ipynb, cell In [11], it seems that a dense layer is added to the network, which differs from the original work:

x = dense(x, size=env.action_space.shape[0], activation='relu', l2_regularization=1e-8)

The "Ensemble of Identical Independent Evaluators" will not include any dense layer. Outputs of last convolutional layer will be fed into softmax function directly. That's why we say they are "independent".

Hi, good to hear from an author of that paper!

That's one part of the paper I was unsure about. After looking at Figure 2, I wasn't sure whether the cash bias introduced in the head layer was hard-coded or a single learned neuron. I guess it was hard-coded?

Yes, it is a constant in our work.
However, we haven't tested whether using a learnable variable could be better.
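As a toy illustration (my own sketch, not the authors' code), a hard-coded cash bias can be thought of as a constant score concatenated to the per-asset scores before the softmax, so "cash" competes with the assets for portfolio weight:

```python
import numpy as np

def portfolio_weights(asset_scores, cash_bias=0.0):
    """asset_scores: shape (n_assets,), raw outputs of the last conv layer."""
    scores = np.concatenate([[cash_bias], asset_scores])
    exp = np.exp(scores - scores.max())   # numerically stable softmax
    return exp / exp.sum()                # weights[0] is the cash weight

print(portfolio_weights(np.array([0.2, -0.1, 0.4]), cash_bias=1.0))
```

Raising the constant pushes more weight toward cash, which is the "riskiness" knob mentioned below.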

Ah, good to know, that makes sense. That way you can set it to modulate how risky you want the model to be when applying it.

Don't hesitate to point out anything else you notice, it's interesting.

@ZhengyaoJiang First of all, I am also a fan of your paper and very interested in the activity in this repository. Can you please share your exact dataset so that results here can be compared with your publication? Otherwise, I guess it would be difficult to evaluate whether the implementation is correct.

@wassname I am not yet knowledgeable in deep learning, but can you please add me on Skype (ID: akaniklaus) so that we can see whether I can also contribute to the project somehow? Thank you!

@akaniklaus Our data is stored in a database, so to make use of it, the data-processing code would also have to be shared.
Actually, a community version of the code will be released in a few months; it will be easy to test the implementation then.

Besides, there is no guarantee that our data pre-processing is bug-free. I think it's great to also replicate the data access part to double-check our results.

@akaniklaus Maybe we can Skype next week; I like to be a digital "hermit" on the weekend to unwind. For now, check out #2 and #4 for ideas on how to contribute.

Personally, I am researching reinforcement learning algorithms that will converge even with noisy observations, since many of the standard test environments differ from a trading market in that their observations are not very noisy. I found the Rainbow paper interesting, since they managed to combine the latest RL tricks into one agent that converges ~4x faster. You might like to have a read if it's not too advanced for you.

I would like to add that there is also no guarantee my code is bug-free :); let's be honest, it probably has some bugs in it! So if you notice anything, please point it out, especially in the environment code.

@ZhengyaoJiang it will be good to see your implementation! Can I ask you, did it converge on most runs, or did you have to try a few times to get it to converge? I ask because RL is notoriously finicky at the moment.

@wassname Yes, it can converge on most runs.

@ZhengyaoJiang Have you ever tried your method with hourly data? In my experience, OLMAR and RMR perform even better in terms of returns with it. I can also point out that the set of selected tokens affects their performance dramatically. I don't know whether that would also hold for yours.

@wassname I will read the paper and might understand it, but as I said, I don't have enough experience with deep learning yet to help you implement that. Have you checked the following repository:
https://github.com/Kaixhin/Rainbow

@akaniklaus

Have you ever tried your method with hourly data? In my experience, OLMAR and RMR perform even better in terms of returns with it.

It seems worth trying. Did you take the commission fee into consideration?
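For reference, this is roughly how a proportional commission is often charged in a simple backtest (a rough sketch and my own simplification, not the paper's exact transaction-cost model): each rebalance pays a fee proportional to the total weight that changes hands, which matters more the more frequently you rebalance (e.g. hourly).

```python
import numpy as np

def step_growth(prev_weights, new_weights, price_relatives, commission=0.0025):
    """One period's portfolio growth factor after a proportional commission."""
    w_prev = np.asarray(prev_weights, dtype=float)
    w_new = np.asarray(new_weights, dtype=float)
    turnover = np.abs(w_new - w_prev).sum()          # fraction of wealth traded (simplified)
    cost = commission * turnover                     # fee on the traded fraction
    growth = float(np.dot(w_new, price_relatives))   # growth from holding the new weights
    return (1.0 - cost) * growth
```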

I can also point out that the set of selected tokens affects their performance dramatically

I suppose you mean the selection of assets? This is done by automatically selecting the top-volume assets at the end of the training data. Selecting by hand might introduce survivorship bias.
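As an illustration, volume-based selection might look like the following (a hypothetical sketch, not the authors' actual pipeline; the function and parameter names are made up):

```python
import pandas as pd

def top_volume_assets(volume: pd.DataFrame, train_end, lookback_periods=720, n=11):
    """volume: DataFrame indexed by timestamp, one column per asset."""
    recent = volume.loc[:train_end].tail(lookback_periods)   # tail of the training period
    return recent.sum().nlargest(n).index.tolist()           # top-n assets by traded volume
```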

And guys, this issue has gone off topic. If you want to communicate further, it's best to e-mail me or use Hangouts.
My Gmail: jzyjiangzhengyao@gmail.com

@ZhengyaoJiang Yes, I did take the commission fee into consideration. One disturbing thing about OLPS was that the portfolios have very low entropy, meaning they bet on only a few tokens (often one or two) in each round and then shift completely.

I found this a bit disturbing and tried to reduce this behavior by using the lowest epsilon value or adding a Kalman filter, but neither gave better backtest results. I am curious whether yours shows similar behavior when distributing the wealth across tokens.
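One quick way to quantify the concentration being described (my own illustration, not part of either codebase) is the Shannon entropy of the weight vector: near 0 when all wealth sits in one token, near log(n) when it is spread evenly.

```python
import numpy as np

def portfolio_entropy(weights, eps=1e-12):
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                # normalise to a distribution
    return float(-(w * np.log(w + eps)).sum())     # Shannon entropy in nats

print(portfolio_entropy([1.0, 0.0, 0.0, 0.0]))      # ~0: all-in on one token
print(portfolio_entropy([0.25, 0.25, 0.25, 0.25]))  # ~1.39 = log(4): evenly spread
```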

I am not talking about picking the best-performing ones in the training data; that would certainly cause a bias. However, I would select assets based on expert knowledge in real-world use, as that is also what people would normally do when buying and holding.

OK, thank you very much. I will continue the conversation via your Gmail. Have a nice weekend.