karthiksmanian / TSC

dataGenie-hackathon - Time Series Classifier

Dependencies to be installed:

statsmodels

pandas

scikit-learn

fastapi

matplotlib

tensorflow

xgboost

To run: uvicorn api_endpoints:app --reload

Open localhost:8000/docs and use uploadfile to upload the dataset.

Specify the exact field names used in the dataset; the default values are used in this example.

Then the plot of the dataset is displayed.

A model is then built based on the features of the dataset, and predictions are made for the test set.

The model used and its MAPE (mean absolute percentage error) are returned as the response.
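
For reference, MAPE can be computed as in the minimal sketch below. The function name `mape` is hypothetical, and both the percent scaling and the choice to skip zero-valued actuals are assumptions; the repository may compute it differently.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent.

    Assumption: points whose actual value is zero are skipped,
    because the percentage error is undefined there.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to evaluate")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


if __name__ == "__main__":
    # Each forecast is off by 10% of the actual value.
    print(mape([100, 200], [110, 180]))
```

A lower value means the test-set predictions are closer to the actual values.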

Time Series data decomposition visualization.

In bash, run the command "streamlit run dashboard.py".

Implementing the Model classifier

The initial idea was to build a variant of a naive Bayes (nbc) or kernel SVM classifier that picks the best model from features extracted from the time series, such as trend and seasonality. I was unable to carry out that implementation, so the model is instead chosen by rules on the extracted features (seasonality, trend, and the hourly and daily components). One reason for this was that the use cases sometimes overlap, so more than one model can apply. The models chosen are:

1. XGBoost

It is chosen when the time component is daily or hourly, because multiple features can then be extracted from point_timestamp. These features and the training set are fed to the XGBoost model.
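
As an illustration of the kind of features that can be derived from point_timestamp, here is a minimal sketch using only the standard library. The function name `timestamp_features` and the particular feature set are assumptions; the source does not list which features the repository extracts.

```python
from datetime import datetime


def timestamp_features(ts: datetime) -> dict:
    """Derive calendar features from one timestamp (hypothetical feature set)."""
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),  # Monday = 0, Sunday = 6
        "day_of_month": ts.day,
        "day_of_year": ts.timetuple().tm_yday,
        "month": ts.month,
        "quarter": (ts.month - 1) // 3 + 1,
        "year": ts.year,
    }


if __name__ == "__main__":
    stamps = ["2021-03-01 13:00:00", "2021-03-01 14:00:00"]
    for s in stamps:
        print(timestamp_features(datetime.fromisoformat(s)))
```

With pandas, the same columns are available through the `.dt` accessor on a datetime column, and the resulting table can be passed to a gradient-boosted regressor as the feature matrix.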

2. AR, MA, ARIMA

It is chosen when the dataset has no daily or hourly component. The ARIMA(p, d, q) parameters are tuned from the lags at which the autocorrelation (ACF) and partial autocorrelation (PACF) of the dataset fall outside their confidence intervals. Depending on the tuned values of p, d and q, the resulting model is an AR, MA or ARIMA model.
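
The idea of finding lags that fall outside the confidence band can be sketched as follows. This is a simplified illustration, not the repository's code: it covers the ACF only, the helper names `acf` and `significant_lags` are hypothetical, and the band of ±1.96/√n is the usual approximate 95% band for white noise, which is an assumption here. How the significant lags map to p, d and q is not specified in the source.

```python
import math


def acf(series, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(nlags + 1)
    ]


def significant_lags(series, nlags, z=1.96):
    """Lags (>= 1) whose autocorrelation lies outside +/- z / sqrt(n)."""
    bound = z / math.sqrt(len(series))
    r = acf(series, nlags)
    return [k for k in range(1, nlags + 1) if abs(r[k]) > bound]


if __name__ == "__main__":
    # A strictly alternating series is strongly correlated at every lag.
    series = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]
    print(significant_lags(series, nlags=3))
```

In practice the `acf` and `pacf` functions in `statsmodels.tsa.stattools` would be the natural choice, since statsmodels is already a dependency.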

3. LSTM

It is chosen when the dataset has an unexplainable trend component. I have come across some datasets that pass the stationarity tests and have a very minimal trend component, yet cannot be fit by the previous models.
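
The three rules above can be summarized in a small selector. This is only a sketch under stated assumptions: the function name `choose_model`, the two boolean flags, and the precedence (checking the unexplained trend first) are hypothetical and not taken from the repository.

```python
def choose_model(has_hourly_or_daily_component: bool,
                 has_unexplained_trend: bool) -> str:
    """Pick one model family from two extracted-feature flags.

    Assumption: an unexplained trend takes precedence, since LSTM is
    described as the fallback when the other models fail to fit.
    """
    if has_unexplained_trend:
        return "LSTM"
    if has_hourly_or_daily_component:
        return "XGBoost"
    # No hourly/daily component: AR, MA or ARIMA, depending on (p, d, q).
    return "ARIMA"


if __name__ == "__main__":
    print(choose_model(has_hourly_or_daily_component=True,
                       has_unexplained_trend=False))
```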

Prophet is another model that can capture the seasonality of a time series and build on it. I was unable to incorporate Prophet because I had trouble installing the module: my compiler version was deprecated, and there was a path linkage error in the conda site-packages.

Inference

This classifier may not produce the best results on every input dataset, as there are some unexplainable statistical features that my model might not have covered. These are the assumptions made while developing the classifier. Overall, a single model is selected for the build, and its MAPE value is computed and returned as the response.
