Deanzou / AlphaTrader

An implementation of a stock market trading bot that uses Deep Q-Learning


Project Proposal

Scientific Papers

For my project in Applied Deep Learning I chose to focus on Deep Reinforcement Learning (DRL) in the financial market, or more precisely the stock market. The idea behind this proposal is to create a Deep Q Network (DQN) that can trade financial products of tech companies such as Google or Apple. This topic attracts a great deal of attention: there are dozens of scientific papers on sites such as arXiv.org covering the problem. There are therefore many directions in which this project might develop, but to begin with I will use a simple DQN in combination with the following four papers:

These papers were mainly used to get an idea of how to preprocess financial data, design training and testing datasets, and define a benchmark to evaluate the performance of the implemented agent.


Other approaches, which were not used for now but could be of future interest, are Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), with a focus on models with Long Short-Term Memory (LSTM).

CNNs

Predict Forex Trend via Convolutional Neural Networks, Conditional time series forecasting with convolutional neural networks, Using Deep Learning Neural Networks and Candlestick Chart Representation to Predict Stock Market

RNNs

Stock Prices Prediction using Deep Learning Models, Global Stock Market Prediction Based on Stock Chart Images Using Deep Q-Network, Financial series prediction using Attention LSTM


Another idea for the future is the inclusion of sentiment analysis in the model. Papers available on this topic are:


Another approach is provided by this paper, which tries to simulate the "whole stock market" in a multi-agent system (MAS), where each agent learns individually and trades on its own. The collective behaviour of the agents is then used to predict the market. This method is out of the project's scope at the moment due to limited processing power and time, but might be of interest for future work.

Topic

As already mentioned, this project focuses on Reinforcement Learning (RL), especially in the context of stock trading and predicting this market using a DQN.

Project Type

Concerning the project type, several options are applicable. Types like Bring your own data, Bring your own method and Beat the stars can all apply, since the project can evolve in many directions. For example, Bring your own data may be needed if future work focuses on the inclusion of sentiment analysis in the prediction. However, if the project goes beyond the scope of this lecture, the focus will lie solely on DRL with a DQN agent, which will at least result in Bring your own method.

Summary

  • Description and Approach

    The goal of the project is to predict the price movements of stocks from different companies, such as Google or Apple.

    I will begin with the standard DRL approaches listed on SpinningUp and their baseline implementations to get an overview and a general practical understanding of this field, as well as an insight into Keras or PyTorch. Then I will try to use different approaches from the papers mentioned earlier to predict the market with DRL.

    After a first working model has been implemented, it will be used as a baseline for further hyper parameter tuning and model variations.

    For general comparison I will use a third-party extension of the OpenAI Gym toolkit called AnyTrading, which is a testing and training environment for comparing trading approaches.
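    A minimal sketch of how such an AnyTrading environment could be set up is shown below. It assumes the gym_anytrading package, the classic Gym step/reset API and a pandas DataFrame loaded from a Yahoo! Finance CSV; all file and variable names are only illustrative.

    import gym
    import gym_anytrading  # registers the 'stocks-v0' environment
    import pandas as pd

    # any OHLCV export from Yahoo! Finance works here (illustrative file name)
    df = pd.read_csv('data/AAPL_train.csv', parse_dates=['Date'], index_col='Date')

    env = gym.make('stocks-v0',
                   df=df,
                   window_size=10,             # past time steps per observation
                   frame_bound=(10, len(df)))  # slice of the data to trade on

    observation = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()  # 0 = SHORT, 1 = LONG
        observation, reward, done, info = env.step(action)
    print(info)  # includes total_reward and total_profit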

  • Dataset

    The datasets for training and testing will be acquired from Yahoo! Finance, focusing on tech companies like Google or Apple; however, any other stock data would work as well. For the pre-processing of this data, I will start by evaluating the agent on raw, unscaled data, followed by different scaling methods such as sigmoid, min-max or standard scaling.
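    A small sketch of how these scaling variants could be applied to the closing prices (scikit-learn is assumed, and the 'Close' column follows the usual Yahoo! Finance layout):

    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    prices = pd.read_csv('data/AAPL_train.csv')[['Close']].values

    minmax_scaled = MinMaxScaler().fit_transform(prices)      # squeeze into [0, 1]
    standard_scaled = StandardScaler().fit_transform(prices)  # zero mean, unit variance
    sigmoid_scaled = 1.0 / (1.0 + np.exp(-standard_scaled))   # squash smoothly into (0, 1)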

  • Work-Breakdown Structure

Individual Task | Time estimate | Time used
research topic and first draft | 5h | 13h
setting up the environment and acquiring the datasets | 3h | 7h
designing and building an appropriate network | 15h | 22h
fine-tuning and varying that network | 15h | 15h
building an application to present the results | 6h | 13h
writing the final report | 5h | 5h
preparing the presentation of the project | 3h | 2h

Implementation

Error Metric

  • Error Metric
    Every agent variation (network structure, hyper parameters and choice of scaling technique) is trained for 650 episodes on the training dataset (AAPL_train.csv). Different approaches can therefore be evaluated and compared using the average profit as well as the average reward over the last 50 episodes (600-650).

    The reward is defined by the capability to correctly predict the direction of the stock price on the following day. For example, if the price falls and the agent bet on falling prices (SHORT), it receives a positive reward; if the price falls but the agent bet on rising prices (LONG), it receives a negative reward. In both cases the magnitude of the reward is the price difference.

    The profit is defined by the price difference between two time steps at which the agent changed its opinion about the trend, switching from LONG to SHORT or the other way around. This definition implies a trade in which the agent, for example, sells all its LONG positions and buys as many SHORT positions as possible, so as not to lose any money. A simplified sketch of both calculations is given at the end of this section.

    This metric is used to verify that the agent is actually making progress. Since this verification is only performed on the training dataset, it does not give an estimate of the real-life performance on unseen data. Thus, a test suite was implemented to compare models on unseen data by the profit and reward earned on a given test set (AAPL_test.csv).
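    The following is a simplified sketch of these reward and profit definitions; the function and variable names are illustrative and not taken from the actual implementation.

    # position is +1 while the agent is LONG and -1 while it is SHORT
    def step_reward(position, price_today, price_tomorrow):
        # positive if the direction of the next day was predicted correctly,
        # negative otherwise; the magnitude is the price difference
        return position * (price_tomorrow - price_today)

    def trade_profit(position, entry_price, exit_price):
        # profit is realized when the agent switches its opinion (LONG -> SHORT
        # or SHORT -> LONG) and is the price difference over the holding period
        return position * (exit_price - entry_price)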

  • Error Metric Target
    First benchmarks of the implemented agent were quite misleading, resulting in an average profit of 0.477 and an average reward of 3.568. Thus, I set my target to reach at least an average profit of 1, which would mean that the agent is at least profitable on the training set. After many iterations of adjusting hyper parameters and changing the model, which still resulted in poor and seemingly random performance, I took a closer look at the implementation of the used environment, AnyTrading. After a short inspection I was not satisfied with the implementation and therefore defined my own calculations of reward and profit. This change finally gave me the impression that the agent is making progress and actually learning. As a consequence, earlier saved models and plots are not comparable to newer ones. After the change, the target of 1 was quite simple to achieve and is therefore not very representative.

  • Error Metric Achievement
    The following table displays the performance results of the last 7 agent variations, which all performed better than the target of 1.

Average Profit | Average Reward
19.794 | 984.336
2.763 | 507.834
6.313 | 207.225
22.684 | 992.019
8.445 | 730.180
15.148 | 474.520
5.843 | 349.651

The following plots show the average profit by episode and the average reward by episode of the best model.

Plot of the average profit by episode
Plot of the average reward by episode

Since the evaluation of the agent on the training set is only used to verify that the agent is actually learning something, the more interesting plots below show the performance of the model on unseen data.

Green dots are time steps where the agent decided to go LONG.
Red dots are time steps where the agent decided to go SHORT.

Plot of a model trained on AAPL, tested on GOOG

Plot of a model trained on GOOG, tested on GOOG

Plot of a model trained on GOOG, tested on AAPL

Changelog

Original Hyper Parameters

  • Training per episode: 1
  • Size of replay memory: 20,000
  • Size of minibatch: 32
  • Discount rate gamma: 0.95
  • Exploration rate epsilon: 1.0
  • Exploration rate epsilon min: 0.001
  • Exploration rate decay: 0.995
  • Learning rate: 0.001
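To illustrate where these hyper parameters enter the agent, the sketch below shows a generic epsilon-greedy action selection and an experience-replay training step for a DQN. It is an illustration of the technique with illustrative names, not the exact AlphaTrader code.

import random
from collections import deque

import numpy as np

memory = deque(maxlen=20000)  # replay memory of (state, action, reward, next_state, done) tuples

def act(model, state, epsilon, action_size):
    # explore with probability epsilon, otherwise take the greedy action
    if np.random.rand() < epsilon:
        return random.randrange(action_size)
    q_values = model.predict(state[np.newaxis, :], verbose=0)[0]
    return int(np.argmax(q_values))

def replay(model, gamma=0.95, batch_size=32):
    # sample a minibatch of stored transitions and fit the Q-network on it
    for state, action, reward, next_state, done in random.sample(memory, batch_size):
        target = reward
        if not done:
            target += gamma * np.amax(model.predict(next_state[np.newaxis, :], verbose=0)[0])
        q_values = model.predict(state[np.newaxis, :], verbose=0)
        q_values[0][action] = target
        model.fit(state[np.newaxis, :], q_values, epochs=1, verbose=0)

# after every episode the exploration rate decays towards its minimum:
# epsilon = max(epsilon_min, epsilon * epsilon_decay)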

Original Model

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

# Q-network: state vector in, one linear Q-value per action out
model = Sequential()
model.add(Dense(64, input_dim=self.state_size, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(self.action_size, activation='linear'))
model.compile(loss='mse', optimizer=Adam(lr=self.learning_rate))

Changes

  1. Varying optimizer

  2. Changing size of minibatch to 64

  3. Varying scaling methods from 0 to 1

  4. Change reward and profit calculation

  5. Early stop if profit < 0.5

  6. Early stop if profit < 0.8

  7. Varying epsilon and size of minibatch

  8. Training model 4 times per episode

  9. Adapting hyper parameters and model structure
    Adapted Hyper Parameters

    • Training per episode: 4
    • Size of replay memory: 20,000
    • Size of minibatch: 32
    • Discount rate gamma: 0.95
    • Exploration rate epsilon: 1.0
    • Exploration rate epsilon min: 0.01
    • Exploration rate decay: 0.995
    • Learning rate: 0.0005

    Adapted Model

    model = Sequential()
    model.add(Dense(64, input_dim=self.state_size, activation='relu'))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(self.action_size, activation='softmax'))
    model.compile(loss='mse', optimizer=Adam(learning_rate=self.learning_rate))
  10. Varying amount of training of model per episode

  11. Varying dropout

  12. Changing the size of the minibatch to the size of the replay memory, training with a 10% chance (sketched after this list)

  13. Varying scaling methods from 0.1 to 1

  14. Varying layers and activation functions of model
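Change 12 can be pictured roughly as shown below, reusing the replay() function from the sketch in the hyper parameter section; again, this is only an illustration, not the exact implementation.

import random

# sketch of change 12: the "minibatch" becomes the whole replay memory,
# and a training step only happens with a 10% chance per time step
if random.random() < 0.1:
    replay(model, gamma=0.95, batch_size=len(memory))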

Setup Guide

To try your own datasets, download a training and a test split from Yahoo! Finance into data/, preferably with an overlap of 30 days.
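The CSV files exported by Yahoo! Finance usually contain the columns Date, Open, High, Low, Close, Adj Close and Volume; a quick sanity check of a downloaded split could, for example, look like this:

import pandas as pd

# illustrative check that a downloaded file has the expected Yahoo! Finance columns
df = pd.read_csv('data/AAPL_train.csv', parse_dates=['Date'])
print(df.columns.tolist())  # e.g. ['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
print(df.head())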

To install the needed dependencies run pip install -r requirements.txt

Afterwards you can train your own model by specifying the mode and the training data

python main.py -m train -d AAPL_train.csv

Or you can use existing models for evaluation by specifying the mode, the testing data and the model

python main.py -m test -d AAPL_test.csv -n model_18_17_06

In particular, model_18_17_07 and model_18_21_52 perform quite well.

If you only want to run the backend for the web application, execute server.py

python server.py

Afterwards the backend should be accessible on http://localhost:5000
