Mini machine learning project playing with feature and model selection
Data Source: UCI
A variety of factors can lead to business failure. Detecting patterns in financial reports can help identify companies at risk of bankruptcy and allow timely consulting. This project explores the Polish companies bankruptcy data and applies several models to find out which one best predicts bankruptcy.
- First, download the data from the UCI Machine Learning Repository.
- After downloading and unzipping the file, use only the 3year.arff file as the data file. It contains financial rates from the 3rd year of the forecasting period and the corresponding class label, which indicates bankruptcy status after 3 years. The data contains 10503 instances (financial statements): 495 represent bankrupt companies and 10008 represent firms that did not go bankrupt in the forecasting period.
- If you want to use another year, that is fine, but the data file name in the Jupyter notebook must be changed accordingly.
- Run the Jupyter notebooks in the src (source) folder:
  - read_file.ipynb - loads the file, cleans it up and does some data wrangling to produce a clean dataset
  - EDA.ipynb - takes the data produced by read_file.ipynb, performs exploratory data analysis and creates a plot and two tables
  - Model_ft_selection.ipynb - takes the data produced by read_file.ipynb and performs feature and model selection (one table and two visualizations present the results)
- OR you can open the report in the result folder.
- OR you can see the whole process in the bankruptcy_feature_selection.ipynb file in the src folder.
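For orientation, loading an .arff file into pandas could look roughly like the sketch below. It is not taken from read_file.ipynb: the helper name `load_bankruptcy_arff` is hypothetical, it assumes `scipy` is installed (it is not in the library list of this README), and it assumes the class label is the last column of the file.

```python
from pathlib import Path

import pandas as pd
from scipy.io import arff


def load_bankruptcy_arff(path):
    """Read an ARFF file into a DataFrame with an integer class label.

    Hypothetical helper: assumes the last column is the nominal class label.
    """
    data, _meta = arff.loadarff(str(path))
    df = pd.DataFrame(data)
    # Nominal ARFF attributes are loaded as bytes (e.g. b'0', b'1'),
    # so decode them before converting to integers.
    label_col = df.columns[-1]
    df[label_col] = df[label_col].str.decode("utf-8").astype(int)
    return df


# Only runs if the data file has been downloaded next to this script.
if Path("3year.arff").exists():
    df = load_bankruptcy_arff("3year.arff")
    print(df.shape)
    print(df[df.columns[-1]].value_counts())
```

Missing values written as `?` in the file come back as NaN in the numeric columns, so they still need to be handled during clean-up.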
- Jupyter Notebook
Libraries:
- scikit-learn
- pandas
- numpy
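As a rough illustration of how these libraries fit together for feature and model selection, the sketch below compares two classifiers with cross-validation. Everything in it is an assumption made for illustration: the data is synthetic (generated with `make_classification` to mimic the class imbalance), and the selector (`SelectKBest`), the two models and the ROC AUC metric are not necessarily the ones used in Model_ft_selection.ipynb.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned dataset: about 5% positive class,
# roughly similar to the share of bankrupt companies in 3year.arff.
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=5,
    weights=[0.95, 0.05],
    random_state=0,
)

# Candidate models (illustrative choices, not the notebook's).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000, class_weight="balanced"),
    "random_forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    ),
}

# Stratified folds keep the rare class present in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in candidates.items():
    pipe = Pipeline(
        [
            ("impute", SimpleImputer(strategy="median")),  # fill missing rates
            ("scale", StandardScaler()),
            ("select", SelectKBest(score_func=f_classif, k=10)),  # keep 10 features
            ("model", model),
        ]
    )
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
    results[name] = scores.mean()

# Mean cross-validated ROC AUC per model, best first.
summary = pd.Series(results).sort_values(ascending=False)
print(summary)
```

Putting the feature selector inside the pipeline means it is refit on each training fold, which avoids leaking information from the validation fold into the selection step.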
The report can be found here.