Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy
This repository provides the code for the ICAIF 2020 paper.
This ensemble strategy is reimplemented in a Jupyter Notebook at FinRL.
Stock trading strategies play a critical role in investment. However, it is challenging to design a profitable strategy in a complex and dynamic stock market. In this paper, we propose a deep ensemble reinforcement learning scheme that automatically learns a stock trading strategy by maximizing investment return. We train a deep reinforcement learning agent and obtain an ensemble trading strategy using the three actor-critic based algorithms: Proximal Policy Optimization (PPO), Advantage Actor Critic (A2C), and Deep Deterministic Policy Gradient (DDPG). The ensemble strategy inherits and integrates the best features of the three algorithms, thereby robustly adjusting to different market conditions. In order to avoid the large memory consumption in training networks with continuous action space, we employ a load-on-demand approach for processing very large data. We test our algorithms on the 30 Dow Jones stocks which have adequate liquidity. The performance of the trading agent with different reinforcement learning algorithms is evaluated and compared with both the Dow Jones Industrial Average index and the traditional min-variance portfolio allocation strategy. The proposed deep ensemble scheme is shown to outperform the three individual algorithms and the two baselines in terms of the risk-adjusted return measured by the Sharpe ratio.
Hongyang Yang, Xiao-Yang Liu, Shan Zhong, and Anwar Walid. 2020. Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy. In ICAIF ’20: ACM International Conference on AI in Finance, Oct. 15–16, 2020, Manhattan, NY. ACM, New York, NY, USA.
git clone https://github.com/AI4Finance-LLC/Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020.git
OpenAI Baselines requires the system packages CMake, OpenMPI, and zlib. On Ubuntu, install them as follows:
sudo apt-get update && sudo apt-get install cmake libopenmpi-dev python3-dev zlib1g-dev libgl1-mesa-glx
Mac OS X
Installation of system packages on Mac requires Homebrew. With Homebrew installed, run the following:
brew install cmake openmpi
To install stable-baselines on Windows, please look at the documentation.
Create and Activate Virtual Environment (Optional but highly recommended)
cd into this repository
Under folder /Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020, create a virtual environment
pip install virtualenv
A virtualenv is essentially a folder that holds a copy of the Python executable and all installed Python packages.
Using one also avoids package conflicts with other projects.
Create a virtualenv venv under folder /Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020
virtualenv -p python3 venv
To activate a virtualenv (on Linux or macOS):
source venv/bin/activate
The script has been tested under Python >= 3.6.0, with the following packages installed:
pip install -r requirements.txt
Note that TensorFlow 2.0 is not currently compatible with this code. Use TensorFlow 1.15.4 instead:
pip install tensorflow==1.15.4
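To confirm that the active environment matches the version requirements above, a small check script can help. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and it assumes that any TensorFlow 1.x release is acceptable, which the text above does not state explicitly (it only recommends 1.15.4).

```python
import sys


def parse_version(version):
    """Turn a version string such as '1.15.4' into a tuple of ints, (1, 15, 4)."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def python_is_supported(version_info=sys.version_info):
    """True if the interpreter is at least Python 3.6.0."""
    return tuple(version_info[:3]) >= (3, 6, 0)


def tensorflow_is_supported(version):
    """True for TensorFlow 1.x (assumption: any 1.x release works)."""
    return parse_version(version)[0] == 1


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    try:
        import tensorflow as tf
        print("TensorFlow", tf.__version__, "supported:",
              tensorflow_is_supported(tf.__version__))
    except ImportError:
        print("TensorFlow is not installed in this environment")
```

Running the script inside the activated virtualenv reports whether the interpreter and the installed TensorFlow fall within the assumed ranges.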
If you have questions regarding the Stable Baselines package, please refer to the Stable Baselines installation guide. Install the package using pip:
pip install stable-baselines[mpi]
This includes an optional dependency on MPI, enabling algorithms DDPG, GAIL, PPO1 and TRPO. If you do not need these algorithms, you can install without MPI:
pip install stable-baselines
Please read the documentation for more details and alternatives (from source, using docker).
Run DRL Ensemble Strategy
Use Quantopian's pyfolio package to do the backtesting.
Version History
- 1.0.1 Changes: added ensemble strategy
- 0.0.1 Simple version
The stock data we use is pulled from the Compustat database via Wharton Research Data Services.
Our purpose is to create a highly robust trading strategy, so we use an ensemble method that automatically selects the best-performing agent among PPO, A2C, and DDPG, as measured by the Sharpe ratio, to trade. The ensemble process is as follows:
- Step 1. We use a growing window of 𝑛 months to retrain our three agents concurrently. In this paper, we retrain the three agents every 3 months.
- Step 2. We validate all three agents on a 12-month rolling validation window that follows the growing training window, and pick the agent with the highest Sharpe ratio. We also adjust risk aversion in the validation stage by using the turbulence index.
- Step 3. After validation, we only use the best model which has the highest Sharpe ratio to predict and trade for the next quarter.
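The three steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the repository's implementation: the function names are hypothetical, the Sharpe ratio is annualized over 252 trading days with a zero risk-free rate, windows are expressed as month indices, and the validation returns are made-up numbers. The turbulence-based risk adjustment is omitted.

```python
import numpy as np

TRADING_DAYS_PER_YEAR = 252  # assumed annualization factor


def sharpe_ratio(daily_returns, periods_per_year=TRADING_DAYS_PER_YEAR):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    returns = np.asarray(daily_returns, dtype=float)
    if returns.size < 2:
        return 0.0
    std = returns.std(ddof=1)
    if std == 0:
        return 0.0
    return float(np.sqrt(periods_per_year) * returns.mean() / std)


def select_best_agent(validation_returns):
    """Steps 2 and 3: score each agent on validation data, keep the best.

    validation_returns maps an agent name to its daily validation returns.
    """
    scores = {name: sharpe_ratio(r) for name, r in validation_returns.items()}
    best = max(scores, key=scores.get)
    return best, scores


def rolling_windows(n_months, initial_train, validation_len, trade_len):
    """Step 1: yield (train_end, validation_end, trade_end) month indices.

    Training uses [0, train_end), which grows each round; validation uses
    [train_end, validation_end); trading uses [validation_end, trade_end).
    """
    train_end = initial_train
    while train_end + validation_len + trade_len <= n_months:
        validation_end = train_end + validation_len
        yield train_end, validation_end, validation_end + trade_len
        train_end += trade_len  # retrain after every trading period


# Made-up validation returns for the three agents (illustration only).
example_returns = {
    "A2C": [0.01, -0.005, 0.007, 0.002],
    "PPO": [0.02, -0.03, 0.025, -0.01],
    "DDPG": [0.001, 0.001, -0.002, 0.0005],
}
best_agent, sharpe_scores = select_best_agent(example_returns)
windows = list(rolling_windows(n_months=36, initial_train=12,
                               validation_len=12, trade_len=3))
print("Selected agent:", best_agent)
print("Window boundaries (months):", windows)
```

In each round, the agent returned by `select_best_agent` would be the only one used to trade during the following quarter, after which the training window grows by three months and the selection repeats.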