- Project Code
- Project Final Report -> 010_487_FinalReport.pdf in Project_Reports folder
- Project Video -> 010_487_VideoPresentation.mp4
- The video can also be watched online without downloading it: https://drive.google.com/file/d/1X7knh3TF8HyQ44uu_hRR6DeIGgU4mkx2/view
- Representative sample of dataset -> Storage2 folder
- Varun Seshu - PES2201800074
- Hritik Shanbhag - PES2201800082
- Manas V Shetty - PES2201800670
- Shashwath S Kumar - PES2201800623
- Download the storage folder shared with you on Google Drive.
- The Storage2 folder is a representative sample of the complete data present.
- Create a DA Project folder
- Copy the storage folder as is to the newly created project folder
- cd to that folder and run `git init`; this initializes the git repository
- run `git remote add origin "https://github.com/Varun487/DataAnalytics_PairTradingModels"`
- run `git pull origin master`
- Create a virtual environment called `venv` with `python3 -m venv venv`
- Activate the virtual environment with `source venv/bin/activate`
- In case the virtual environment isn't working, have a look at the documentation here https://docs.python.org/3/library/venv.html
- run `pip3 install -r requirements.txt`
Contains 2 scripts
- `list_of_nse_companies.py` - To get the names and tickers of all stocks floating in the stock market as of 3rd September 2020
- `stock_candle_data_and_volume.py` - To get historical candlestick data for the collected stock tickers, covering the years 2000 - 2020
Clean data + Find 4 top stock pairs (2 pairs per sector) of the stock market to trade and perform correlation and co-integration testing.
- Handling Missing Data - Dropping the rows of the datasets that are missing data. We can afford to do this because a large amount of data is available, and interpolation may produce inaccurate values given the volatility of some stocks.
- Deleting datasets which have < 3 years' worth of data.
- Trimming datasets with > 3 years of data to the years 2017-2019 only. A correlation needs to be measured within a fixed time period, and a strong correlation in the past must not affect the model's predictions when there is no significant correlation currently.
- Adding Company name and Exchange to the datasets for easy identification.
- Choose and find 10 stock pairs and the periods in which they are highly correlated and co-integrated.
- Create Bollinger Bands for the chosen stocks to aid visualization - Calculate the 20-day moving average of each company's closing prices, along with the prices 1, 2 and 3 standard deviations above and below that average.
- Show visually that the shares in each pair move in tandem.
- Perform pair trading and generate orders for all pairs according to z-score.
- Also add appropriate visualizations to the creation of orders.
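
The cleaning, Bollinger Band and z-score steps above can be sketched roughly as follows. This is only an illustrative sketch with pandas, not the project's actual code: the function names, the use of the price ratio of the two stocks as the spread, and the entry/exit thresholds (2.0 and 0.5) are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def clean_window(df, start="2017-01-01", end="2019-12-31"):
    """Drop rows with missing values and keep only the 2017-2019 window.

    Assumes `df` is indexed by date (a DatetimeIndex).
    """
    return df.dropna().sort_index().loc[start:end]


def bollinger_bands(close, window=20, widths=(1, 2, 3)):
    """Moving average of the closing price plus bands k std devs either side."""
    mean = close.rolling(window).mean()
    std = close.rolling(window).std()
    bands = pd.DataFrame({"ma": mean})
    for k in widths:
        bands[f"upper_{k}"] = mean + k * std
        bands[f"lower_{k}"] = mean - k * std
    return bands


def zscore_orders(close_a, close_b, window=20, entry=2.0, exit_=0.5):
    """Label each day using the rolling z-score of the price ratio A/B.

    The thresholds are hypothetical; days without a full window stay "hold".
    """
    ratio = close_a / close_b
    z = (ratio - ratio.rolling(window).mean()) / ratio.rolling(window).std()
    orders = pd.Series("hold", index=ratio.index)
    orders[z > entry] = "short A / long B"   # A looks expensive relative to B
    orders[z < -entry] = "long A / short B"  # A looks cheap relative to B
    orders[z.abs() < exit_] = "close"        # ratio is back near its mean
    return z, orders
```

A high positive z-score means the first stock is unusually expensive relative to the second, so the sketch shorts it and buys the other; the position is closed once the ratio returns towards its rolling mean.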
Create ML models for all chosen stocks and predict values for the decided prediction week for each pair
- Decide the week of prediction for all pairs.
- For each stock, generate 4 models
- Linear Regression
- ARIMA
- LSTM
- For each model, try to adjust its parameters and training data so that it best fits the actual data of the test week for the stock pair.
- Get predictions from every model for every stock.
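
As a rough illustration of the simplest of these models, the sketch below fits a straight-line trend to a stock's training closing prices and extrapolates it over a 5-day prediction week, then scores the forecast against the actual prices. How the project actually builds its linear regression features is not stated here, so the time-index feature, the function names and the RMSE metric are assumptions.

```python
import numpy as np


def linear_trend_forecast(train_close, horizon=5):
    """Fit close = slope * day + intercept and extrapolate `horizon` days."""
    train_close = np.asarray(train_close, dtype=float)
    days = np.arange(len(train_close))
    slope, intercept = np.polyfit(days, train_close, deg=1)
    future_days = np.arange(len(train_close), len(train_close) + horizon)
    return slope * future_days + intercept


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted prices."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))
```

The same `rmse` helper could be applied to the ARIMA and LSTM predictions so that all models for a stock are compared on the same test week.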
Calculate returns from the trading parameters, the orders, the real data for the prediction week and the models' predicted data. This helps to evaluate the different models and identify the best model per stock.
- Decide capital, risk, rules for opening and closing a trade and other parameters for trading and trading style.
- Run the orders on each stock's real data for the prediction week and on the models' predicted data.
- Calculate ratios and other parameters for all models' predicted values and evaluate them for each stock.
- Find best model per stock to maximize profit and establish whether pair trading is feasible with ML models.
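
One way to picture the return calculation is the sketch below, which computes the profit of a single pair trade that puts half the capital into each leg. The capital figure, the equal split between legs and the function name are hypothetical choices for the example, not the project's actual trading rules.

```python
def pair_trade_return(entry_a, exit_a, entry_b, exit_b,
                      capital=100_000.0, long_a=True):
    """Profit and fractional return of one pair trade.

    Half the capital goes long one stock and half goes short the other
    (an assumed position-sizing rule). Costs and slippage are ignored.
    """
    half = capital / 2
    ret_a = (exit_a - entry_a) / entry_a
    ret_b = (exit_b - entry_b) / entry_b
    if long_a:
        profit = half * ret_a - half * ret_b   # long A, short B
    else:
        profit = half * ret_b - half * ret_a   # long B, short A
    return profit, profit / capital


# Same order evaluated on real prices and on one model's predicted prices
# (all prices here are made-up illustrative numbers).
real_profit, real_return = pair_trade_return(100.0, 110.0, 50.0, 49.0)
model_profit, model_return = pair_trade_return(100.0, 108.0, 50.0, 49.5)
```

Comparing the return on a model's predicted prices with the return on the real prices gives one simple measure of how useful that model would have been for the trade.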