CMF_AHF

Work on the project "Researching of Algorithmic hedge fund" at the Center of Mathematical Finances.

Existing strategies need to be explored first.
- Choosing an article
- Writing a proposal
- Collecting the data used in the article (we are here)
Analyze historical data using statistical methods.
Try to make some hypotheses about price patterns, market inefficiencies, etc.
Define trading rules.
Backtest on historical data.
Real-time testing with a broker.
Production and improvement of running strategies.

Proposal

Current situation

Predicting crude oil prices has been and will remain a relevant task. In this project, we try to show that Lasso and ElasticNet handle it well and outperform some competing algorithms on several metrics, especially the success ratio.

Suggestions

Collect the data
Compare models with each other
Find out why Lasso and ElasticNet outperform their competitors
See what features Lasso and ElasticNet rely on in their predictions
Try several windows of data (the last 50%, 40%, 30%, etc.) to evaluate the optimal Lasso and ElasticNet coefficients.
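
One possible reading of the window idea in the last item is sketched below: refit Lasso on only the most recent share of the sample and compare the resulting coefficients. This is an illustration on synthetic data, not the project's notebook code. The penalty `alpha=0.05`, the data, and the name `coefs_by_window` are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data, for illustration only: two informative predictors out of six.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = 0.7 * X[:, 0] - 0.4 * X[:, 3] + rng.normal(scale=0.1, size=400)

fractions = [0.5, 0.4, 0.3]          # share of the most recent rows to keep
coefs_by_window = {}
for frac in fractions:
    start = int(round(len(y) * (1 - frac)))   # first row of the window
    # alpha is a hypothetical penalty strength, not a value from the article.
    model = Lasso(alpha=0.05, max_iter=10_000).fit(X[start:], y[start:])
    coefs_by_window[frac] = model.coef_
```

Comparing the entries of `coefs_by_window` shows how stable the selected predictors are as the window shrinks.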

Sources

What data do we need:

Article Pipeline

How do we fit and evaluate the models?

Fit on a train dataset
Evaluate on an out-of-sample dataset

Why do we want to use an out-of-sample dataset? Because an out-of-sample test avoids the in-sample over-fitting issue and is more relevant for assessing genuine return predictability in real time. To avoid look-ahead bias, we should only use the information available up to 𝑡 to generate the out-of-sample forecast at 𝑡+1.
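
One common way to enforce the rule above is an expanding-window loop, where the model is refit at every step on past data only. The sketch below is a minimal illustration on synthetic data, not the project's notebook code. The function name `expanding_window_forecast`, the penalty `alpha=0.01`, and the assumption that the predictors are already lagged by one period are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso


def expanding_window_forecast(X, y, m, alpha=0.01):
    """Forecast y[t] for t = m..n-1 using only observations before t.

    Row X[t] is assumed to hold predictors already lagged by one period,
    i.e. values known before y[t] is realized (an assumption of this sketch).
    """
    n = len(y)
    model_fc = np.empty(n - m)
    bench_fc = np.empty(n - m)
    for k, t in enumerate(range(m, n)):
        model = Lasso(alpha=alpha, max_iter=10_000)
        model.fit(X[:t], y[:t])                  # rows 0..t-1 only
        model_fc[k] = model.predict(X[t:t + 1])[0]
        bench_fc[k] = y[:t].mean()               # historical-average benchmark
    return model_fc, bench_fc


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = 0.5 * X[:, 0] + rng.normal(scale=0.1, size=120)
model_fc, bench_fc = expanding_window_forecast(X, y, m=100)
```

Here `m` plays the role of the in-sample length and `len(y) - m` the out-of-sample length; no observation from period `t` or later enters the fit that produces the forecast for `t`.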

What metrics will we use?

$$R^2_{OS} = 1 - \frac{MSPE_M}{MSPE_B}$$

where $MSPE_M = \frac{1}{q}\sum_{i=1}^{q}\left(r_{m+i} - \hat{r}_{m+i}\right)^2$ denotes the mean squared prediction error (MSPE) of the forecasting model of interest,

and $MSPE_B = \frac{1}{q}\sum_{i=1}^{q}\left(r_{m+i} - \hat{r}_{B,m+i}\right)^2$ denotes the MSPE of the benchmark model (historical average).

$r_{m+i}$, $\hat{r}_{m+i}$, and $\hat{r}_{B,m+i}$ are the actual oil return, the oil return predicted by the forecasting model, and the benchmark forecast, respectively, at month $m+i$; $m$ and $q$ are the lengths of the in-sample estimation period and the out-of-sample evaluation period, respectively.

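This is the R² statistic computed on the out-of-sample dataset. A positive value means the model has a lower MSPE than the historical-average benchmark.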
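
The out-of-sample R² defined above can be computed in a few lines of NumPy. This is a minimal sketch; the function name `r2_os` and the toy numbers are assumptions made for the example.

```python
import numpy as np


def r2_os(actual, model_forecast, benchmark_forecast):
    """Out-of-sample R^2: 1 - MSPE_M / MSPE_B."""
    actual = np.asarray(actual, dtype=float)
    mspe_m = np.mean((actual - np.asarray(model_forecast, dtype=float)) ** 2)
    mspe_b = np.mean((actual - np.asarray(benchmark_forecast, dtype=float)) ** 2)
    return 1.0 - mspe_m / mspe_b


# Toy numbers, for illustration only.
actual = [0.02, -0.01, 0.03, -0.02]
model = [0.01, -0.01, 0.02, -0.01]
benchmark = [0.005, 0.005, 0.005, 0.005]
score = r2_os(actual, model, benchmark)
```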

Success ratio
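
The success ratio is not defined in this README. A common definition, assumed here, is the share of out-of-sample periods in which the forecast has the same sign as the realized return. The function name `success_ratio` and the toy numbers are hypothetical.

```python
import numpy as np


def success_ratio(actual, forecast):
    """Share of periods where forecast and realized return have the same sign.

    Assumed definition; a zero forecast only matches a zero realized return.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.sign(actual) == np.sign(forecast)))


# Toy numbers, for illustration only: three of four directions are correct.
ratio = success_ratio([0.02, -0.01, 0.03, -0.02], [0.01, -0.03, -0.01, -0.01])
```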

Pipeline

We always fit our models on data from January 2, 1997 to February 28, 2020. The out-of-sample dataset runs from March 3, 2020 to the end of the sample (March 31, 2021); in one step of the pipeline, validation begins earlier. We always use the entire dataset (macro features and technical features) unless stated otherwise.
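
The date-based split described above could look like the following pandas sketch. The synthetic business-day index and the column name `oil_return` are assumptions standing in for the real dataset, whose frequency and layout are not specified here.

```python
import numpy as np
import pandas as pd

# Synthetic business-day series standing in for the real dataset.
dates = pd.bdate_range("1997-01-02", "2021-03-31")
rng = np.random.default_rng(0)
data = pd.DataFrame({"oil_return": rng.normal(size=len(dates))}, index=dates)

TRAIN_END = "2020-02-28"    # last in-sample date
TEST_START = "2020-03-03"   # first out-of-sample date

train = data.loc[:TRAIN_END]    # label slicing includes the end date
test = data.loc[TEST_START:]
```

Splitting by date labels rather than by row position keeps the boundary explicit and guarantees that no out-of-sample row leaks into the training set.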

Fitting and validation with default parameters
Fitting with feature selection: ElasticNet selects features for ElasticNet and Lasso selects features for Lasso. We first fit the model and inspect the coefficients, then refit using only the features with non-zero coefficients
Fitting with the two feature sets (macro and technical) used separately
Fitting with an economic constraint: a rational investor rules out a negative return forecast, so the forecast is set to zero whenever it is negative
Fit and check the metrics on out-of-sample high- and low-sentiment periods
Fit using five different window sizes (30%, 35%, 40%, 45%, and 50% of the estimation sample) to estimate the LASSO parameters
Fit ElasticNet and Lasso with a fixed number of selected predictors. We also use different values of the alpha parameter (0.3, 0.5, 0.7) for ElasticNet
TBA
The out-of-sample dataset now starts from 2007 (to be fixed)
Here we consider another prevailing indicator of crude oil prices: the cost at which U.S. refiners purchase imported crude oil (refiner acquisition cost, hereinafter RAC). The out-of-sample dataset starts again from March 3, 2020
Forecasting nominal prices of crude oil and RAC
Fitting using FRED
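
The feature-selection and economic-constraint steps in the list above can be sketched as follows. This is a minimal illustration on synthetic data, not the project's notebook code. The penalty `alpha=0.05`, the data, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data, for illustration only: two informative predictors out of ten.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))
X_test = rng.normal(size=(50, 10))
beta = np.zeros(10)
beta[:2] = [0.8, -0.5]
y_train = X_train @ beta + rng.normal(scale=0.1, size=200)

# Step 1: fit Lasso and keep the predictors with non-zero coefficients.
selector = Lasso(alpha=0.05, max_iter=10_000).fit(X_train, y_train)
selected = np.flatnonzero(selector.coef_)

# Step 2: refit on the selected predictors only.
refit = Lasso(alpha=0.05, max_iter=10_000).fit(X_train[:, selected], y_train)
raw_forecast = refit.predict(X_test[:, selected])

# Economic constraint: replace every negative forecast with zero.
constrained_forecast = np.maximum(raw_forecast, 0.0)
```

The same two steps apply to `sklearn.linear_model.ElasticNet`. Note that in scikit-learn `alpha` is the overall penalty strength and `l1_ratio` is the L1/L2 mix, so the alpha values 0.3, 0.5, and 0.7 mentioned above may correspond to `l1_ratio` if they follow the convention where alpha denotes the mixing weight.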

Model results

Here

Bonus

If we have time: feed the features selected by Lasso into a neural network (for example, an LSTM) and compare the results.

The article we will use

Forecasting crude oil prices with a large set of predictors: Can LASSO select powerful predictors?

Additional materials

The Elements of Statistical Learning: Data Mining, Inference, and Prediction


GraC2H5OH / CMF_AHF