bankruptcy-prediction historical-data jupyter-notebook machine-learning time-series-analysis

Taiwanese Bankruptcy Prediction

This project is based on a Kaggle dataset and was developed for our IF699 class at CIn UFPE.

Group: Andre Filho; Gabriel Lyra; Matheus Belfort

Notes for anyone trying to run our code:

For easier visualization and better performance, we recommend using Google Colab to load and run the code. Once the repo is loaded there, the code should work like a normal Jupyter notebook.

Rules for the Project

According to the assignment description, completing our final project requires the following steps:

Try different models to find the one that best fits the prediction task
- XGBoost
- Linear Regression
- SVM
- KNN
- Random Forest
- Decision Tree
- Neural Networks
Try different ensemble approaches to achieve better results:
- Voting
- Stacking
- Bagging
Hyperparameter Optimization
Model Evaluation
- F1
- ROC
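
The steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the notebook's actual code: the data is a synthetic stand-in generated with `make_classification`, all hyperparameter values are arbitrary, and `LogisticRegression` is used in place of linear regression because the task is classification. XGBoost, SVM and neural networks are left out only to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic, imbalanced data standing in for the real dataset.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# A few of the individual models from the list.
base_models = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
estimators = list(base_models.items())

# Ensembles: voting, stacking and bagging over the base models.
candidates = dict(base_models)
candidates["voting"] = VotingClassifier(estimators=estimators, voting="soft")
candidates["stacking"] = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(max_iter=1000), cv=3)
candidates["bagging"] = BaggingClassifier(
    DecisionTreeClassifier(max_depth=5), n_estimators=25, random_state=0)

# Hyperparameter optimization: a tiny grid search scored on F1.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [25, 50], "max_depth": [4, None]},
    scoring="f1", cv=3)
search.fit(X_train, y_train)
candidates["forest_tuned"] = search.best_estimator_

# Model evaluation: F1 on hard predictions, ROC AUC on predicted probabilities.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    probabilities = model.predict_proba(X_test)[:, 1]
    scores[name] = {
        "f1": f1_score(y_test, predictions, zero_division=0),
        "roc_auc": roc_auc_score(y_test, probabilities),
    }
    print(f"{name:>12}: F1={scores[name]['f1']:.3f} "
          f"ROC AUC={scores[name]['roc_auc']:.3f}")
```

Because the classes are imbalanced, the split is stratified and accuracy is deliberately not reported; F1 and ROC AUC are the two metrics named in the project rules.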

Context

For financial institutions, the ability to predict whether a business will go bankrupt is hugely important, especially because lending and financing decisions tie the interest rate directly to the risk of the investment.

However, one problem many institutions face is that the financial statement data of a company is usually too large for a proper analysis to be run.

Our current dataset has over 6800 entries, each containing up to 95 features and a label column, which makes processing it computationally demanding.

Previous results:

Previous approaches generally focused on exploratory data analysis, combined with a variety of models for oversampling and re-sampling and some feature selection approaches.

Our results:

For a conservative client, this is how far our model goes:

Figure: Results
Figure: ROC Curve

Our results show good precision and recall when our predicted labels are compared with the labels already present in the dataset. This means the results are satisfactory for conservative financial institutions that are willing to give out loans to enterprises.
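
As a reminder of what precision and recall measure here, the sketch below computes both (plus F1) from a pair of label lists. The labels are made-up illustration values, not our results, and we assume that `1` marks a bankrupt company.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = bankrupt (assumption), 0 = not bankrupt.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

For a conservative lender, recall on the bankrupt class is the number to watch: a missed bankruptcy (false negative) is usually costlier than a rejected healthy company (false positive).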

What else could we do next time?

Remove correlated columns to avoid biasing the results in any direction
Produce plots and an exploratory data analysis of the dataset itself, for instance through distributions and boxplots
Weight or expand the dataset to balance the number of "bankrupt" entries (only ~3% of entries fall into this category)
Make smarter use of feature engineering to achieve better processing speed and results
Run KNN after PCA to better classify each member, which would probably give us better accuracy
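
Three of these ideas (dropping correlated columns, class weighting, and PCA followed by KNN) could look roughly like the sketch below. We have not run this on the project data: the DataFrame is synthetic, the helper `drop_correlated` and its 0.95 threshold are hypothetical choices, and the number of PCA components and neighbours is arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils.class_weight import compute_class_weight


def drop_correlated(df, threshold=0.95):
    """Drop every column whose absolute correlation with an earlier column
    exceeds `threshold` (hypothetical helper, threshold is an assumption)."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Assumption: small synthetic data with one nearly duplicated column.
rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = rng.normal(size=300)
features = pd.DataFrame({
    "a": a,
    "a_copy": 2.0 * a + 0.01 * rng.normal(size=300),
    "b": b,
    "c": rng.normal(size=300),
})
labels = (a + b > 1.5).astype(int)  # minority positive class

reduced = drop_correlated(features)
print("kept columns:", list(reduced.columns))

# "balanced" weights are inversely proportional to class frequency; they can
# be passed as class_weight to estimators that support it (KNN does not).
classes = np.unique(labels)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=labels)
print("class weights:", dict(zip(classes.tolist(), weights.tolist())))

# Scale, project onto principal components, then classify with KNN.
pca_knn = make_pipeline(StandardScaler(), PCA(n_components=2),
                        KNeighborsClassifier(n_neighbors=5))
pca_knn.fit(reduced, labels)
predictions = pca_knn.predict(reduced)
```

Scaling before PCA matters because PCA is sensitive to feature variance, and KNN is sensitive to distances. Any real evaluation of this pipeline should use a held-out split rather than the training data used here.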

About

Bankruptcy Prediction Analysis of Enterprises based in Taiwan with Machine Learning Algorithms

