This is a machine learning project aimed at automating the approval or rejection of loan applications.
The Loan Approval Automation project develops a machine learning model that automates decisions on loan applications, reducing labor costs while minimizing losses from incorrect approvals or rejections.
Ensure you have the following installed:
- Python 3.x
- Clone the repository:
  ```bash
  git clone git@github.com:Ruth-Mwangi/Moneza-Loan-Approval-Automation.git
  ```
- Create and activate a virtual environment:
  ```bash
  python -m venv venv
  source venv/bin/activate
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- raw: Contains raw data files that are used as the initial input for the project. These files are typically unprocessed and in their original form.
- processed: Contains data files that have been cleaned and processed. These files are the result of the data preprocessing steps applied to the raw data; a minimal sketch of this raw-to-processed flow follows below.
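For illustration, a minimal sketch of the raw-to-processed convention (the file names `loans_raw.csv` and `loans_clean.csv` are hypothetical placeholders; the real cleaning steps live in `01_data_preprocessing.ipynb`):

```python
import pandas as pd

# Read an unprocessed file from data/raw/ (hypothetical file name).
raw = pd.read_csv("data/raw/loans_raw.csv")

# Placeholder for the actual cleaning done in the preprocessing notebook,
# e.g. dropping rows with missing values.
clean = raw.dropna()

# Save the cleaned result to data/processed/ for the downstream notebooks.
clean.to_csv("data/processed/loans_clean.csv", index=False)
```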
- 01_data_preprocessing.ipynb: Jupyter Notebook for data cleaning and preprocessing.
- 02_eda.ipynb: Jupyter Notebook for exploratory data analysis (EDA).
- 03_model_training.ipynb: Jupyter Notebook for training and evaluating machine learning models.
- 04_threshold_optimization.ipynb: Jupyter Notebook for determining the optimal probability thresholds for model predictions to balance between different performance metrics.
- 05_business_impact_analysis.ipynb: Jupyter Notebook for analyzing the business impact of the model. This includes evaluating how the model’s predictions affect business metrics and decisions.
Contains common utility functions used across the project.
Contains the saved, trained models.
- figures: Directory for storing visualizations and figures used in reports. It includes plots, charts, and other graphical representations of the data.
- Purpose: Data cleaning and preprocessing.
- How to Run:
- Open Jupyter Notebook in your project directory:
  ```bash
  jupyter notebook
  ```
- Navigate to `01_data_preprocessing.ipynb` in the Jupyter interface.
- Execute each cell in order. A non-interactive, command-line alternative is sketched below.
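If you prefer to run the notebook non-interactively, it can also be executed from the command line with `jupyter nbconvert` (an optional alternative; the `notebooks/` path below is an assumption, adjust it to wherever the notebook lives):

```bash
# Execute the notebook and write the outputs back into the file.
jupyter nbconvert --to notebook --execute --inplace notebooks/01_data_preprocessing.ipynb
```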
- Purpose: Exploratory Data Analysis (EDA).
- How to Run:
- Ensure Jupyter Notebook is running.
- Open `02_eda.ipynb` from the Jupyter interface.
- Execute each cell.
- Purpose: Training and evaluating machine learning models.
- How to Run:
- Open Jupyter Notebook.
- Open `03_model_training.ipynb`.
- Execute each cell.
- Purpose: Determining optimal probability thresholds for model predictions.
- How to Run:
- Open Jupyter Notebook.
- Open `04_threshold_optimization.ipynb`.
- Execute each cell. A minimal sketch of this kind of threshold search follows below.
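For reference, here is a minimal sketch of one way such a threshold search can be done, using scikit-learn's precision-recall curve to pick the threshold with the best F1 score. The arrays `y_true` and `y_scores` are placeholders for validation labels and predicted probabilities, and the notebook's actual selection criterion may differ:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Placeholder validation labels (1 = positive class) and predicted probabilities.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_scores = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.55, 0.70])

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# F1 for each candidate threshold (precision/recall have one extra trailing entry).
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
best_threshold = thresholds[np.argmax(f1)]
print(f"Best threshold by F1: {best_threshold:.2f}")
```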
- Purpose: Analyzing the business impact of model predictions and decision thresholds.
- How to Run:
- Open Jupyter Notebook.
- Open `05_business_impact_analysis.ipynb`.
- Execute the cells to evaluate potential business outcomes, financial implications, and recommendations based on the model results. A hypothetical cost-calculation sketch follows below.
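To make the idea concrete, here is a small, hypothetical sketch of how predictions at a given threshold can be turned into an expected cost. The cost figures, variable names, and label convention (`y_true == 1` means the loan defaulted, `y_scores` is the predicted probability that the applicant is creditworthy) are illustrative assumptions, not values from the project:

```python
import numpy as np

# Hypothetical per-application costs (illustrative only).
COST_FALSE_APPROVAL = 1000   # loss from approving a loan that later defaults
COST_FALSE_REJECTION = 200   # lost profit from rejecting a good applicant

def expected_cost(y_true, y_scores, threshold):
    """Total cost of the approve/reject decisions made at a probability threshold."""
    approve = y_scores >= threshold
    false_approvals = np.sum(approve & (y_true == 1))    # approved but defaulted
    false_rejections = np.sum(~approve & (y_true == 0))  # rejected but creditworthy
    return false_approvals * COST_FALSE_APPROVAL + false_rejections * COST_FALSE_REJECTION

# Compare two candidate thresholds on placeholder data.
y_true = np.array([0, 1, 0, 1, 0, 0, 1, 0])
y_scores = np.array([0.9, 0.3, 0.8, 0.4, 0.7, 0.2, 0.1, 0.6])
for t in (0.3, 0.5):
    print(f"threshold={t}: expected cost={expected_cost(y_true, y_scores, t)}")
```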
- Dependencies: Install the required libraries listed in `requirements.txt`.
- Data files: Verify that the data files are placed in the correct directories (`data/raw/` and `data/processed/`).
- Environment: Run the notebooks in a consistent environment with all dependencies and data accessible; a small sanity check is sketched below.
The `models/` directory contains pickled files of trained machine learning models. Pickling is a way to serialize and deserialize Python objects, allowing you to save a model after training and load it later for inference or further analysis.
To use a pickled model, follow these steps:
- Import required libraries: Ensure you have the necessary libraries imported. For model loading, you will need `pickle` or `joblib`, depending on how the model was saved.
  ```python
  import pickle
  ```
- Load the model: Use the appropriate method to load the model from the `.pkl` file. For example:
  ```python
  # Using pickle. Replace with the actual file directory and name.
  with open('models/model.pkl', 'rb') as file:
      model = pickle.load(file)

  # Example data for prediction. Replace this with actual data.
  sample_data = [[0.5, 1.2, 3.4, 0.7]]

  # Predict using the loaded model.
  predictions = model.predict(sample_data)
  print(predictions)
  ```
  A `joblib` alternative is sketched after these steps.
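If the model was saved with `joblib` rather than `pickle`, loading looks like this instead (`models/model.pkl` is the same placeholder path as above):

```python
import joblib

# Load a model serialized with joblib. Replace the path with the actual file.
model = joblib.load('models/model.pkl')

# Predict on placeholder data; replace with real feature values.
sample_data = [[0.5, 1.2, 3.4, 0.7]]
print(model.predict(sample_data))
```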
- Replace Placeholder Data: Ensure you replace any placeholder data and file names with your actual data and filenames when making predictions.
- File Names: Update the file names and paths to match your specific use case and file locations.
- Dependencies: Ensure that the environment where you load the model matches the one where it was trained, including library versions and configurations.