Deanzou / AlphaTrader

An implementation of a stock market trading bot that uses Deep Q-Learning


Project Proposal

Scientific Papers

For my project in Applied Deep Learning I chose to focus on Deep Reinforcement Learning (DRL) in the financial market, or more precisely the stock market. The idea behind this proposal is to create a Deep Q Network (DQN) that can trade financial products of tech companies such as Google or Apple. This topic attracts a great deal of attention: there are dozens of scientific papers on sites such as arXiv.org covering the problem. There are therefore many directions in which this project might develop, but to begin with I will use a simple DQN in combination with the following four papers:

These papers were mainly used to get an idea of how to preprocess financial data, design training and testing datasets, and define a benchmark to evaluate the performance of the implemented agent.


Other approaches, which were not used for now but could be of future interest, are Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), with a focus on models with Long Short-Term Memory (LSTM).

CNNs

Predict Forex Trend via Convolutional Neural Networks, Conditional time series forecasting with convolutional neural networks, Using Deep Learning Neural Networks and Candlestick Chart Representation to Predict Stock Market

RNNs

Stock Prices Prediction using Deep Learning Models, Global Stock Market Prediction Based on Stock Chart Images Using Deep Q-Network, Financial series prediction using Attention LSTM


Another idea for the future is the inclusion of sentiment analysis in the model. Papers available on this topic are:


Another approach is provided by this paper, which tries to simulate the "whole stock market" in a multi-agent system (MAS), where each agent learns individually and trades on its own. The collective behaviour of the agents is then used to predict the market. This method is out of the project's scope at the moment due to limited processing power and time, but might be of interest for future work.

Topic

As already mentioned, this project focuses on Reinforcement Learning (RL), especially in the context of stock trading and predicting this market using a DQN.

Project Type

Concerning the project type, several options are applicable. Types like Bring your own data, Bring your own method and Beat the stars can all apply, since the project can evolve in many directions. For example, Bring your own data may be needed if future work focuses on the inclusion of sentiment analysis in the prediction. However, if the project goes beyond the scope of this lecture, the focus will lie solely on DRL with a DQN agent, which will at least result in Bring your own method.

Summary

  • Description and Approach

    The goal of the project is to predict the price movements of stocks from different companies, such as Google or Apple.

    I will begin with the standard DRL approaches listed on SpinningUp and their baseline implementations to get an overview and a general practical understanding of this field, as well as an insight into Keras or PyTorch. Then I will try to use different approaches from the papers mentioned earlier to predict the market with DRL.

    After a first working model has been implemented, it will be used as a baseline for further hyper parameter tuning and model variations.

    For general comparison I will use a third-party extension of the OpenAI Gym toolkit called AnyTrading, which is a testing and training environment for comparing trading approaches.
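    A minimal sketch of how such an AnyTrading environment could be set up is shown below. It assumes the gym_anytrading package, the classic Gym step/reset API and a pandas DataFrame loaded from a Yahoo! Finance CSV; all file and variable names are only illustrative.

    import gym
    import gym_anytrading  # registers the 'stocks-v0' environment
    import pandas as pd

    # any OHLCV export from Yahoo! Finance works here (illustrative file name)
    df = pd.read_csv('data/AAPL_train.csv', parse_dates=['Date'], index_col='Date')

    env = gym.make('stocks-v0',
                   df=df,
                   window_size=10,             # past time steps per observation
                   frame_bound=(10, len(df)))  # slice of the data to trade on

    observation = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()  # 0 = SHORT, 1 = LONG
        observation, reward, done, info = env.step(action)
    print(info)  # includes total_reward and total_profit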

  • Dataset

    The datasets for training and testing will be acquired from Yahoo! Finance, focusing on tech companies like Google or Apple; however, any other stock data would work as well. For the pre-processing of this data, I will start by evaluating the agent on raw, unscaled data, followed by different scaling methods such as sigmoid, min-max or standard scaling.
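    A small sketch of how these scaling variants could be applied to the closing prices (scikit-learn is assumed, and the 'Close' column follows the usual Yahoo! Finance layout):

    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    prices = pd.read_csv('data/AAPL_train.csv')[['Close']].values

    minmax_scaled = MinMaxScaler().fit_transform(prices)      # squeeze into [0, 1]
    standard_scaled = StandardScaler().fit_transform(prices)  # zero mean, unit variance
    sigmoid_scaled = 1.0 / (1.0 + np.exp(-standard_scaled))   # squash smoothly into (0, 1)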

  • Work-Breakdown Structure

Individual Task | Time estimate | Time used
research topic and first draft | 5h | 13h
setting up the environment and acquiring the datasets | 3h | 7h
designing and building an appropriate network | 15h | 22h
fine-tuning and varying that network | 15h | 15h
building an application to present the results | 6h | 13h
writing the final report | 5h | 5h
preparing the presentation of the project | 3h | 2h

Implementation

Error Metric

  • Error Metric
    Every agent variation (network structure, hyper parameters and choice of scaling technique) is trained for 650 episodes on the training dataset (AAPL_train.csv). Different approaches can therefore be evaluated and compared using the average profit as well as the average reward over the last 50 episodes (600-650).

    The reward is defined by the capability to correctly predict the direction of the stock price on the following day. For example, if the price falls and the agent bet on falling prices (SHORT), it receives a positive reward; if the price falls but the agent bet on rising prices (LONG), it receives a negative reward. In both cases the magnitude of the reward is the price difference.

    The profit is defined by the price difference between two time steps at which the agent changed its opinion about the trend, switching from LONG to SHORT or the other way around. This definition implies a trade in which the agent, for example, sells all its LONG positions and buys as many SHORT positions as possible, so as not to lose any money. A simplified sketch of both calculations is given at the end of this section.

    This metric is used to verify that the agent is actually making progress. Since this verification is only performed on the training dataset, it does not give an estimate of the real-life performance on unseen data. Thus, a test suite was implemented to compare models on unseen data by the profit and reward earned on a given test set (AAPL_test.csv).
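    The following is a simplified sketch of these reward and profit definitions; the function and variable names are illustrative and not taken from the actual implementation.

    # position is +1 while the agent is LONG and -1 while it is SHORT
    def step_reward(position, price_today, price_tomorrow):
        # positive if the direction of the next day was predicted correctly,
        # negative otherwise; the magnitude is the price difference
        return position * (price_tomorrow - price_today)

    def trade_profit(position, entry_price, exit_price):
        # profit is realized when the agent switches its opinion (LONG -> SHORT
        # or SHORT -> LONG) and is the price difference over the holding period
        return position * (exit_price - entry_price)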

  • Error Metric Target
    First benchmarks of the implemented agent were quite misleading, resulting in an average profit of 0.477 and an average reward of 3.568. Thus, I set my target to reach at least an average profit of 1, which would mean that the agent is at least profitable on the training set. After many iterations of adjusting hyper parameters and changing the model, which still resulted in poor and seemingly random performance, I took a closer look at the implementation of the used environment, AnyTrading. After a short inspection I was not satisfied with the implementation and therefore defined my own calculations of reward and profit. This change finally gave me the impression that the agent is making progress and actually learning. As a consequence, earlier saved models and plots are not comparable to newer ones. After the change, the target of 1 was quite simple to achieve and is therefore not very representative.

  • Error Metric Achievement
    The following table displays the performance results of the last 7 agent variations, which all performed better than the target of 1.

Average Profit | Average Reward
19.794 | 984.336
2.763 | 507.834
6.313 | 207.225
22.684 | 992.019
8.445 | 730.180
15.148 | 474.520
5.843 | 349.651

The following plots show the average profit by episode and the average reward by episode of the best model.

Plot of the average profit by episode
Plot of the average reward by episode

Since the evaluation of the agent on the training set is only used to verify that the agent is actually learning something, the more interesting plots below show the performance of the model on unseen data.

Green dots are time steps where the agent decided to go LONG.
Red dots are time steps where the agent decided to go SHORT.

Plot of a model trained on AAPL, tested on GOOG

Plot of a model trained on GOOG, tested on GOOG

Plot of a model trained on GOOG, tested on AAPL

Changelog

Original Hyper Parameters

  • Training per episode: 1
  • Size of replay memory: 20,000
  • Size of minibatch: 32
  • Discount rate gamma: 0.95
  • Exploration rate epsilon: 1.0
  • Exploration rate epsilon min: 0.001
  • Exploration rate decay: 0.995
  • Learning rate: 0.001
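To illustrate where these hyper parameters enter the agent, the sketch below shows a generic epsilon-greedy action selection and an experience-replay training step for a DQN. It is an illustration of the technique with illustrative names, not the exact AlphaTrader code.

import random
from collections import deque

import numpy as np

memory = deque(maxlen=20000)  # replay memory of (state, action, reward, next_state, done) tuples

def act(model, state, epsilon, action_size):
    # explore with probability epsilon, otherwise take the greedy action
    if np.random.rand() < epsilon:
        return random.randrange(action_size)
    q_values = model.predict(state[np.newaxis, :], verbose=0)[0]
    return int(np.argmax(q_values))

def replay(model, gamma=0.95, batch_size=32):
    # sample a minibatch of stored transitions and fit the Q-network on it
    for state, action, reward, next_state, done in random.sample(memory, batch_size):
        target = reward
        if not done:
            target += gamma * np.amax(model.predict(next_state[np.newaxis, :], verbose=0)[0])
        q_values = model.predict(state[np.newaxis, :], verbose=0)
        q_values[0][action] = target
        model.fit(state[np.newaxis, :], q_values, epochs=1, verbose=0)

# after every episode the exploration rate decays towards its minimum:
# epsilon = max(epsilon_min, epsilon * epsilon_decay)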

Original Model

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

# Q-network: state vector in, one linear Q-value per action out
model = Sequential()
model.add(Dense(64, input_dim=self.state_size, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(self.action_size, activation='linear'))
model.compile(loss='mse', optimizer=Adam(lr=self.learning_rate))

Changes

  1. Varying optimizer

  2. Changing size of minibatch to 64

  3. Varying scaling methods from 0 to 1

  4. Change reward and profit calculation

  5. Early stop if profit < 0.5

  6. Early stop if profit < 0.8

  7. Varying epsilon and size of minibatch

  8. Training model 4 times per episode

  9. Adapting hyper parameters and model structure
    Adapted Hyper Parameters

    • Training per episode: 4
    • Size of replay memory: 20,000
    • Size of minibatch: 32
    • Discount rate gamma: 0.95
    • Exploration rate epsilon: 1.0
    • Exploration rate epsilon min: 0.01
    • Exploration rate decay: 0.995
    • Learning rate: 0.0005

    Adapted Model

    model = Sequential()
    model.add(Dense(64, input_dim=self.state_size, activation='relu'))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(self.action_size, activation='softmax'))
    model.compile(loss='mse', optimizer=Adam(learning_rate=self.learning_rate))
  10. Varying amount of training of model per episode

  11. Varying dropout

  12. Changing the size of the minibatch to the size of the replay memory, training with a 10% chance (sketched after this list)

  13. Varying scaling methods from 0.1 to 1

  14. Varying layers and activation functions of model
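Change 12 can be pictured roughly as shown below, reusing the replay() function from the sketch in the hyper parameter section; again, this is only an illustration, not the exact implementation.

import random

# sketch of change 12: the "minibatch" becomes the whole replay memory,
# and a training step only happens with a 10% chance per time step
if random.random() < 0.1:
    replay(model, gamma=0.95, batch_size=len(memory))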

Setup Guide

To try your own datasets, download a training and a test split from Yahoo! Finance into data/, preferably with an overlap of 30 days.
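The CSV files exported by Yahoo! Finance usually contain the columns Date, Open, High, Low, Close, Adj Close and Volume; a quick sanity check of a downloaded split could, for example, look like this:

import pandas as pd

# illustrative check that a downloaded file has the expected Yahoo! Finance columns
df = pd.read_csv('data/AAPL_train.csv', parse_dates=['Date'])
print(df.columns.tolist())  # e.g. ['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
print(df.head())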

To install the needed dependencies run pip install -r requirements.txt

Afterwards you can train your own model by specifying the mode and the training data

python main.py -m train -d AAPL_train.csv

Or you can use existing models for evaluation by specifying the mode, the testing data and the model

python main.py -m test -d AAPL_test.csv -n model_18_17_06

In particular, model_18_17_07 and model_18_21_52 perform quite well.

If you only want to run the backend for the web application, execute server.py

python server.py

Afterwards the backend should be accessible on http://localhost:5000
