Policy forward prop
allentran opened this issue · comments
Allen Tran commented
states:
- scaled prices over lags 0, ..., -k, each standardized as (x - m)/sigma (k = 30?)
- information: volume + time until next trading day
- holdings
- cash
actions:
- sell fraction
- buy fraction
Pipe them through some dense layers, plus a GRU or LSTM for the scaled price sequence, take care of the action constraints, and output the actions: 2 x number of assets (one sell fraction and one buy fraction per asset).
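A rough NumPy sketch of what that forward pass could look like. It is not the final design and several things are assumptions: m and sigma are taken as the per-asset mean and standard deviation over the price window, the GRU reads all assets jointly at each time step, sell fractions are squashed with a sigmoid, and buy fractions come from a softmax with one extra "keep cash" slot so they are fractions of cash that sum to at most 1. The names `init_params`, `gru_last_state` and `policy_forward` are made up for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_assets, n_info, hidden=16, seed=0):
    """Small random weights; the sizes are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    w = lambda *shape: rng.normal(0.0, 0.1, size=shape)
    n_dense_in = hidden + n_info + n_assets + 1  # GRU state, info, holdings, cash
    return {
        # GRU gates: update (z), reset (r), candidate state (h)
        "Wz": w(n_assets, hidden), "Uz": w(hidden, hidden), "bz": np.zeros(hidden),
        "Wr": w(n_assets, hidden), "Ur": w(hidden, hidden), "br": np.zeros(hidden),
        "Wh": w(n_assets, hidden), "Uh": w(hidden, hidden), "bh": np.zeros(hidden),
        # dense layer on the concatenated state
        "W1": w(n_dense_in, hidden), "b1": np.zeros(hidden),
        # output heads: sell fractions, buy fractions (+1 slot for keeping cash)
        "W_sell": w(hidden, n_assets), "b_sell": np.zeros(n_assets),
        "W_buy": w(hidden, n_assets + 1), "b_buy": np.zeros(n_assets + 1),
    }

def gru_last_state(x_seq, p):
    """Run a GRU over x_seq (time, n_assets) and return the last hidden state."""
    h = np.zeros(p["Uz"].shape[0])
    for x in x_seq:
        z = sigmoid(x @ p["Wz"] + h @ p["Uz"] + p["bz"])
        r = sigmoid(x @ p["Wr"] + h @ p["Ur"] + p["br"])
        h_tilde = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"] + p["bh"])
        h = (1.0 - z) * h + z * h_tilde
    return h

def policy_forward(prices, info, holdings, cash, p):
    """prices: (k + 1, n_assets), oldest row first. Returns 2 * n_assets actions."""
    # scale prices as (x - m) / sigma; m, sigma assumed to be window statistics
    m = prices.mean(axis=0)
    sigma = prices.std(axis=0) + 1e-8
    scaled = (prices - m) / sigma

    h = gru_last_state(scaled, p)
    features = np.concatenate([h, info, holdings, [cash]])
    hidden = np.tanh(features @ p["W1"] + p["b1"])

    # constraint: cannot sell more than is held -> each sell fraction in (0, 1)
    sell = sigmoid(hidden @ p["W_sell"] + p["b_sell"])

    # constraint: cannot spend more than the cash -> softmax with a cash slot
    logits = hidden @ p["W_buy"] + p["b_buy"]
    logits = logits - logits.max()  # numerical stability
    weights = np.exp(logits) / np.exp(logits).sum()
    buy = weights[:-1]  # drop the "keep cash" slot

    return np.concatenate([sell, buy])

# toy inputs: 3 assets, k = 30 lags
n_assets, k = 3, 30
rng = np.random.default_rng(1)
prices = 100.0 + rng.normal(0.0, 1.0, size=(k + 1, n_assets)).cumsum(axis=0)
info = np.concatenate([rng.random(n_assets), [0.5]])  # volumes + time to next trading day
holdings = np.array([10.0, 0.0, 5.0])
cash = 1000.0

params = init_params(n_assets, n_info=info.size)
actions = policy_forward(prices, info, holdings, cash, params)
sell_frac, buy_frac = actions[:n_assets], actions[n_assets:]
```

Holdings and cash go into the dense layer unscaled here just to keep the sketch short; they would probably need normalising too. Swapping the GRU for an LSTM only changes `gru_last_state`.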