GraC2H5OH / CMF_AHF

My work on a project "Researching of Algorithmic hedge fund" in the Center of Mathematical Finances.

CMF_AHF

Work on a project "Researching of Algorithmic hedge fund" in the Center of Mathematical Finances.

What will be done

  • Existing strategies need to be explored first.
    • Choosing an article
    • Writing a proposal
    • Collecting the data used in the article (we are here)
  • Analyze historical data using statistical methods.
  • Try to make some hypotheses about price patterns, market inefficiencies, etc.
  • Define trading rules.
  • Backtest on historical data.
  • Test in real time with a broker.
  • Run the strategies in production and improve them.

Proposal

Current situation

Predicting crude oil prices remains a relevant task. In this project, we will try to show that Lasso and ElasticNet handle it well and outperform some competing algorithms on several metrics, especially the success ratio.

Suggestions

  • Collect the data
  • Compare models with each other
  • Find out why Lasso and ElasticNet outperform their competitors
  • See what features Lasso and ElasticNet rely on in their predictions
  • Try several windows of data (last 50%, 40%, 30%, etc.) for estimating the optimal Lasso and ElasticNet coefficients.

Sources

What data do we need:

Article Pipeline

How do we fit and evaluate the models?

  1. Fit on a train dataset
  2. Evaluate on an out-of-sample dataset

Why do we want an out-of-sample dataset? Because an out-of-sample test avoids the in-sample over-fitting issue and is more relevant for assessing genuine return predictability in real time. To avoid look-ahead bias, we should use only the information available up to 𝑡 to generate the out-of-sample forecast at 𝑡+1.
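The look-ahead rule can be illustrated with a minimal expanding-window sketch. The function names (`expanding_window_forecasts`, `historical_average`) and the toy return series are assumptions made for illustration, not code from this repository. Array index `t` is forecast from `returns[:t]` only, which is the same rule as above with the index shifted by one.

```python
import numpy as np


def expanding_window_forecasts(returns, m, forecaster):
    """Forecast returns[t] for t = m, ..., len(returns) - 1.

    Each forecast is built from returns[:t] only, so nothing observed
    at time t or later can leak into the prediction.
    """
    returns = np.asarray(returns, dtype=float)
    forecasts = np.empty(len(returns) - m)
    for i, t in enumerate(range(m, len(returns))):
        history = returns[:t]  # information available before time t
        forecasts[i] = forecaster(history)
    return forecasts


def historical_average(history):
    """Benchmark forecast: the mean of all returns seen so far."""
    return history.mean()


# Toy data (hypothetical values): first m = 3 points are in-sample only.
returns = np.array([0.01, -0.02, 0.03, 0.00, 0.02, -0.01])
benchmark = expanding_window_forecasts(returns, m=3, forecaster=historical_average)
```

Any model can be plugged in as `forecaster`, as long as it refits on `history` alone at every step.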

What metrics will we use?

  1. Out-of-sample R²:

$$R^2_{OS} = 1 - \frac{MSPE_M}{MSPE_B}$$

where

$$MSPE_M = \frac{1}{q}\sum_{i=1}^{q}\left(r_{m+i} - \hat{r}_{m+i}\right)^2$$

denotes the mean squared prediction error (MSPE) of the forecasting model of interest, and

$$MSPE_B = \frac{1}{q}\sum_{i=1}^{q}\left(r_{m+i} - \hat{r}_{B,m+i}\right)^2$$

denotes the MSPE of the benchmark model (historical average).

$r_{m+i}$, $\hat{r}_{m+i}$, and $\hat{r}_{B,m+i}$ are the actual oil return, the oil return predicted by the forecasting model, and the benchmark forecast, respectively, at month $m+i$; $m$ and $q$ are the lengths of the in-sample estimation period and the out-of-sample evaluation period, respectively.

This is the R² statistic, computed on the out-of-sample dataset.

  2. Success ratio
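Both metrics can be sketched in a few lines of NumPy. The out-of-sample R² follows the definition above. The success ratio is not defined in this README, so the version below assumes the common meaning: the share of periods in which the forecast has the same sign as the actual return. The function names and sample numbers are hypothetical.

```python
import numpy as np


def r2_os(actual, model_forecast, benchmark_forecast):
    """Out-of-sample R^2: 1 - MSPE_M / MSPE_B."""
    actual = np.asarray(actual, dtype=float)
    mspe_m = np.mean((actual - np.asarray(model_forecast, dtype=float)) ** 2)
    mspe_b = np.mean((actual - np.asarray(benchmark_forecast, dtype=float)) ** 2)
    return 1.0 - mspe_m / mspe_b


def success_ratio(actual, forecast):
    """ASSUMPTION: fraction of periods where forecast and actual share a sign."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.sign(actual) == np.sign(forecast)))


# Hypothetical out-of-sample values, for illustration only.
actual = np.array([0.02, -0.01, 0.03, -0.02])
model = np.array([0.01, -0.02, 0.01, 0.01])
bench = np.zeros(4)  # stand-in for the historical-average benchmark

r2 = r2_os(actual, model, bench)
sr = success_ratio(actual, model)
```

A positive `r2` means the model has a lower MSPE than the benchmark over the evaluation period.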

Pipeline

We always fit our models on data from January 2, 1997 to February 28, 2020. The out-of-sample dataset starts on March 3, 2020 and runs to the end of the sample (March 31, 2021); in one step, validation begins earlier. We always train on the entire dataset (macro features and technical features) unless stated otherwise.
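A date-based split along these lines could look as follows in pandas. The DataFrame is filled with random placeholder values and the column name `oil_return` is an assumption; only the four boundary dates come from the text above.

```python
import numpy as np
import pandas as pd

# Boundary dates taken from the description above.
TRAIN_START, TRAIN_END = "1997-01-02", "2020-02-28"
TEST_START, TEST_END = "2020-03-03", "2021-03-31"

# Placeholder data on business days (hypothetical, not the project's data).
dates = pd.bdate_range(TRAIN_START, TEST_END)
rng = np.random.default_rng(0)
data = pd.DataFrame({"oil_return": rng.normal(0.0, 0.02, len(dates))}, index=dates)

# Label-based slicing on a DatetimeIndex includes both endpoints.
train = data.loc[TRAIN_START:TRAIN_END]
test = data.loc[TEST_START:TEST_END]
```

Keeping the boundaries in named constants makes it easy to move the start of the out-of-sample period for the step where validation begins earlier.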

  1. Fitting and validation with the default parameters, nothing changed.
  2. Fitting with feature selection: ElasticNet selects features for ElasticNet and Lasso selects features for Lasso. We first fit the model and inspect its coefficients, then refit using only the features with non-zero coefficients.
  3. Fitting when the two feature sets (macro and technical) are used separately.
  4. Fitting with an economic constraint: a rational investor will rule out a negative stock return forecast, so the forecast is set to zero whenever it is negative.
  5. Fitting and checking metrics on out-of-sample high- and low-sentiment periods.
  6. Fitting with five different window sizes (30%, 35%, 40%, 45%, and 50% of the estimation sample) to estimate the LASSO parameters.
  7. Fitting ElasticNet and Lasso with a fixed number of selected predictors. We also use different values of the alpha parameter (0.3, 0.5, 0.7) for ElasticNet.
  8. TBA
  9. The out-of-sample dataset now starts from 2007 (to be fixed).
  10. Here we consider another prevailing indicator of crude oil prices: the cost of the purchase of crude oil imports by an American oil refining company (hereinafter RAC). The out-of-sample dataset again starts from March 3, 2020.
  11. Forecasting nominal prices of crude oil and RAC.
  12. Fitting using FRED.
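Steps 2 and 4 can be illustrated with a short scikit-learn sketch. The synthetic data, the penalty strength `alpha=0.1`, and all variable names are assumptions for illustration, not the notebooks' actual settings. In scikit-learn, `alpha` is the penalty strength and `l1_ratio` is the L1/L2 mix of `ElasticNet`; this README does not say which of the two its alpha values (0.3, 0.5, 0.7) refer to.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic stand-in data: only features 0 and 3 actually drive the target.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))
y_train = 1.0 * X_train[:, 0] - 0.5 * X_train[:, 3] + 0.05 * rng.normal(size=200)
X_test = rng.normal(size=(50, 10))

# Step 2: fit once, keep the features with non-zero coefficients, refit on them.
selector = Lasso(alpha=0.1).fit(X_train, y_train)
selected = np.flatnonzero(selector.coef_)
refit = Lasso(alpha=0.1).fit(X_train[:, selected], y_train)
raw_forecast = refit.predict(X_test[:, selected])

# Step 4: economic constraint, replace every negative forecast with zero.
constrained_forecast = np.maximum(raw_forecast, 0.0)
```

The same two-stage pattern would apply to `ElasticNet` by swapping the estimator class.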

Model results

Here

Bonus

If we have time: use the features selected by Lasso to build a neural network (for example, an LSTM) and compare the results.

The article we will use

Forecasting crude oil prices with a large set of predictors: Can LASSO select powerful predictors?

Additional materials

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

Languages

Language:Jupyter Notebook 100.0%