Royston-Soh / dow-facebook-prophet

Time series forecasting for Dow Jones Industrial Average using Facebook Prophet

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Time series forecasting for Dow Jones Industrial Average (using Facebook Prophet)

In this post, we try to examine the usefulness of Facebook Prophet in forcasting stock market indices.

library(xts)
library(quantmod)
library(prophet)
library(ggplot2)
library(forecast)
library(lubridate) 

Read historical data for Dow Jones Industrial Average (adjusted closing prices)

setwd("F:/My Files/R Studio/DOW S&P500")
dow_data=read.csv('dow_data_v3.csv',header = T,sep = ',')
head(dow_data)
tail(dow_data) 

Facebook Prophet

Standardize the dates and prepare data for conversion to time series xts format, using from year 2008 onward

dow_data$Date=dmy(dow_data$X) #Standardize the dates using lubridate package

df=data.frame(dow_data$DJI) #Prepare time series format
rownames(df)=dow_data$Date
colnames(df)=c('DJI')

df_xts=as.xts(df) #Convert to xts format
df_xts=df_xts['2008/'] #extract data frm 2008 onward

Facebook Prophet

Plot the chart series

We observe an upward multiplicative (exponential) trend. As prophet library makes time series forecast based on additive regression trend model, we need to do a log transformation to linearize the data using log transformation.

chartSeries(df_xts,
            theme=chartTheme('white')) 

Facebook Prophet

Prepare dataframe with variables ds and y assigned for prophet model, perform log transformation

df=data.frame(df_xts)
df$ds=rownames(df)
ds=df$ds
y=df$DJI
df=data.frame(ds,y)
df$ds=as.Date(df$ds)

y=log(df$y)
df=data.frame(ds,y)
df$ds=as.Date(df$ds)

qplot(ds,y,data=df,
      main='Dow Jones Industrial Average in log scale')

Facebook Prophet

Split the data to training and test set

Let's predict and validate for the last 252 trading days

training_length=length(df$y)-252
test_start=training_length+1

df_training=df[1:training_length,]
df_test=df[test_start:length(df$y),] 

Specify the forecasting model, and make furture prediction using training data

m=prophet()
m=fit.prophet(m,df_training) 

future=make_future_dataframe(m,periods=252)
forecast=predict(m,future)

Forecast Components

We observe a long term upward trend, as well as seasonality, where the index peaks in the months of May and August and bottoms out in the months of March and November. There is no weekly pattern being observed in the above plots.

prophet_plot_components(m,forecast) 

Facebook Prophet

Model Performance

We observe an overall linear upward trend, with R-squared at 0.9825. Up to 98.25% of the variation in predicted values can be explained by the variation in actual values in the prediction model. However, this measures the goodness-of-fit and does not provide information on the accuracy of the model.

pred=forecast$yhat
actuals=df$y
plot(actuals,pred) 
abline(lm(pred~actuals),col='red',lwd=2)
summary(lm(pred~actuals))

Facebook Prophet Facebook Prophet

Plot for forecast and residuals

#Create dataframe for residuals
residuals_m=df$y-forecast$yhat
df_residuals=data.frame(df$ds,residuals_m)
colnames(df_residuals)=c('ds','residuals') 

#Forecast plot
plot(m,forecast)

#Residual plot
qplot(ds,residuals,data=df_residuals,
      main='Plot of residuals in log scale')+
  geom_vline(xintercept = as.numeric(ymd(df_test[1,1])), 
             color = "blue") 

Facebook Prophet Facebook Prophet

Model accuracy

We notice that the accuracy metrics of the model deteriorates when we compare the validation set against the training set, suggestive of overfitting. This could be due to the highly volatile nature of the stock index, where the variance of the residuals are not constant, rendering the model which uses the least squares method less accurate. Facebook Prophet is not that useful for predicting stock index in times of high volatility, also, the accuracy for forecasts deteriorates as we project further into the future (medium to long-term).

#Performance metrics for training set
predicted_train=forecast$yhat[1:(length(forecast$yhat)-252)]
actuals_train=df_training$y
round(accuracy(predicted_train,actuals_train),2)

#Performance metrics for test/validation set
predicted_v=forecast$yhat[(length(forecast$yhat)-252+1):length(forecast$yhat)]
actuals_v=df_test$y
round(accuracy(predicted_v,actuals_v),2) 

Facebook Prophet Facebook Prophet

Accuracy of model in actual scale (test data)

round(accuracy(exp(predicted_v),exp(actuals_v),2))

Let's rescale back our data and do a plot

df$dowjones=exp(df$y)
df$predicted=exp(forecast$yhat)

df_residuals$Residuals=df$dowjones-df$predicted

ggplot(df, aes(ds)) + 
  geom_line(aes(y = dowjones, color='dow')) + 
  geom_line(aes(y = predicted, color='predicted'))+
  geom_vline(xintercept = as.numeric(ymd(df_test[1,1])), 
             color = "blue")

qplot(ds,Residuals,data=df_residuals,
      main='Plot of residuals')+
  geom_vline(xintercept = as.numeric(ymd(df_test[1,1])), 
             color = "blue")

Facebook Prophet Facebook Prophet

About

Time series forecasting for Dow Jones Industrial Average using Facebook Prophet


Languages

Language:R 100.0%