itsnavneetk / timeseries-cloud

Time series model for resource prediction on a cloud computing facility

timeseries-cloud

Time series model for resource prediction on a cloud computing facility.

Dataset

The dataset was generated by simulations in a Docker environment. Seventeen resource fields are monitored, and data is logged every ten seconds. The dataset contains resource values for seven days. Seven days of data are logged because most environments used by cloud computing facilities, the intended audience, have only a very small window in which to prepare for managing resources.
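As a rough sense of scale, the logging scheme described above can be turned into a row count. This is a minimal sketch that assumes uninterrupted logging with no gaps; the constant names are illustrative and do not come from the repository.

```python
from datetime import timedelta

# Figures taken from the description above; names are hypothetical.
SAMPLING_INTERVAL = timedelta(seconds=10)
LOGGING_WINDOW = timedelta(days=7)
N_FIELDS = 17

# Number of observations per field, assuming no missing samples.
rows = LOGGING_WINDOW // SAMPLING_INTERVAL

# Total number of logged values across all monitored fields.
total_values = rows * N_FIELDS

print(rows, total_values)
```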

Methodology

The prediction models explored were VAR (Vector Autoregression) and VARMA (Vector Autoregressive Moving-Average). Both were initially explored on the Docker data, which consists of 17 different fields.
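To illustrate what a VAR model does, the sketch below iterates a first-order VAR, y_t = c + A y_(t-1), in which each field's next value depends on the previous values of all fields. The two fields, the intercept `c`, and the coefficient matrix `A` are made-up numbers for illustration only; they are not fitted to the Docker data, and the function name `var1_forecast` is hypothetical.

```python
def var1_forecast(intercept, coef, last_obs, steps):
    """Iterate y_t = c + A y_(t-1) for `steps` steps and return each forecast."""
    forecasts = []
    current = list(last_obs)
    size = len(current)
    for _ in range(steps):
        # Every field is a linear combination of all fields at the previous step.
        current = [
            intercept[i] + sum(coef[i][j] * current[j] for j in range(size))
            for i in range(size)
        ]
        forecasts.append(current)
    return forecasts


# Hypothetical two-field system: CPU usage and memory usage.
c = [1.0, 2.0]
A = [[0.5, 0.1],
     [0.2, 0.3]]

forecast = var1_forecast(c, A, last_obs=[10.0, 20.0], steps=2)
print(forecast)
```

A VARMA model extends the same recursion with moving-average terms on past forecast errors. In practice the coefficients would be estimated from data by a statistics library rather than written by hand.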

Techniques adopted on Docker data

Predictions are made for the CPU usage field. Several data pre-processing techniques were applied to obtain more accurate predictions. Differencing of various orders was applied to learn about the order of the time series. Because the dataset was large and a few variables lacked enough variance, computing the Cholesky decomposition ran into a non-invertible matrix. To deal with this, feature selection was performed: all fields with variance below 0.8 were eliminated. To eliminate further variables, univariate statistical tests were performed. We chose the chi-square test, with which the fourteen best-performing features were selected. The Hurst exponent was calculated to learn about long-range dependence in the series. The number of lags in the VAR models was varied and the results were compared to identify the best fit for the time series.
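Two of the pre-processing steps above, the variance filter and differencing, can be sketched in a few lines. This is a minimal sketch under assumptions: the field names and values are invented, population variance is used (the text does not say which variance estimator or scaling was applied), and the helper names `drop_low_variance` and `difference` are hypothetical.

```python
import statistics


def drop_low_variance(columns, threshold=0.8):
    """Keep only the fields whose variance is at least `threshold`."""
    return {
        name: values
        for name, values in columns.items()
        if statistics.pvariance(values) >= threshold
    }


def difference(values, order=1):
    """Apply first differencing `order` times."""
    for _ in range(order):
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values


# Invented sample data standing in for three of the logged fields.
columns = {
    "cpu_usage": [10, 12, 15, 11, 14],
    "constant_flag": [1, 1, 1, 1, 1],
    "net_io": [0.1, 0.2, 0.1, 0.2, 0.1],
}

kept = drop_low_variance(columns)
cpu_diff = difference(kept["cpu_usage"])
print(list(kept), cpu_diff)
```

Dropping near-constant fields removes columns that make the covariance matrix close to singular, which is the situation in which a Cholesky decomposition fails.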

Conclusion

A Hurst exponent of 0.61 indicates mild long-range dependence. The data appears to behave like a random walk, and we would argue that this indicates the presence of multiplicative relations in it.
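For readers unfamiliar with the Hurst exponent, the sketch below estimates it with classical rescaled-range (R/S) analysis: values near 0.5 suggest no long-range dependence, and values above 0.5 suggest persistence. The estimator used to obtain the 0.61 figure is not stated, so this is one common approach rather than the project's method; the function names and window sizes are assumptions.

```python
import math
import random
import statistics


def rescaled_range(chunk):
    """Return R/S for one chunk, or None if the chunk has zero spread."""
    mean = statistics.fmean(chunk)
    cumulative = 0.0
    lowest = highest = 0.0
    for value in chunk:
        # Running sum of deviations from the chunk mean.
        cumulative += value - mean
        lowest = min(lowest, cumulative)
        highest = max(highest, cumulative)
    spread = statistics.pstdev(chunk)
    return (highest - lowest) / spread if spread > 0 else None


def hurst_rs(series, window_sizes=(8, 16, 32, 64, 128)):
    """Estimate the Hurst exponent as the slope of log(R/S) against log(n)."""
    xs, ys = [], []
    for n in window_sizes:
        ratios = []
        # Split the series into non-overlapping chunks of length n.
        for start in range(0, len(series) - n + 1, n):
            ratio = rescaled_range(series[start:start + n])
            if ratio is not None and ratio > 0:
                ratios.append(ratio)
        if ratios:
            xs.append(math.log(n))
            ys.append(math.log(statistics.fmean(ratios)))
    # Ordinary least-squares slope of ys on xs.
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Synthetic white noise as a stand-in for a logged resource field.
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(2048)]
print(hurst_rs(noise))
```

R/S estimates are known to be biased upward on short windows, so a value only slightly above 0.5 is weak evidence of long-range dependence.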

Languages

Language:Python 100.0%