Loan Eligibility Prediction

A personal classification project that uses machine learning in Python to predict loan eligibility from a dataset of customers' past transactions. I will then attempt to deploy the resulting model as a working app using Streamlit.

Project Inspiration:

The idea for this project came from a blog post by Jonathan Okah called "How I Built and Deployed a Python Loan Eligibility Prediction App on Streamlit", posted on Finxter.

I chose it for a few reasons:

It looked like a fairly easy project to replicate for practice right after graduating from my Codeup Data Science course.
I am getting ready to begin my house-buying search and thought it might be helpful to my endeavors.
I was very interested in deploying the model as a functioning app using Streamlit.

I'm hoping to add additional criteria to the program, such as Debt-to-Income Ratio and Percent Credit Card Debt, to make it more accurate and more realistic relative to the requirements of a mortgage lender. These are the unseen factors that are not recorded in the data but had to be met to be eligible for a loan in the first place.
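As a rough illustration of one such criterion, debt-to-income ratio is conventionally computed as total monthly debt payments divided by gross monthly income. The sketch below assumes that definition; the function name, argument names, and sample figures are hypothetical and are not part of the dataset or the original blog post.

```python
def debt_to_income_ratio(monthly_debt_payments, gross_monthly_income):
    """Return the debt-to-income ratio as a fraction (0.33 means 33%).

    monthly_debt_payments: iterable of recurring monthly debt payments
    gross_monthly_income: income before taxes, must be positive
    """
    if gross_monthly_income <= 0:
        raise ValueError("gross_monthly_income must be positive")
    return sum(monthly_debt_payments) / gross_monthly_income


# Made-up example: mortgage, car loan, and credit card minimum payment
dti = debt_to_income_ratio([1200.0, 300.0, 150.0], 5000.0)
print(f"DTI: {dti:.0%}")
```

A feature like this could be computed from the form responses before they are passed to the model, provided the model was trained on data that includes it.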

As an optional extension to this project, I have found a SQL dataset in the Codeup database named home_credit that has over 300k historical loan records and an additional 48.7k applications.

Project Overview:

This project creates a user interface form that takes in responses for specific features required to make a prediction on loan eligibility.

-------------------------------------------------- STILL WORKING --------------------------------------------------

Project Goals:

Create a model that effectively predicts Michelin food star award ratings based on content from the official Michelin review
Provide a well-documented Jupyter notebook that contains our analysis
Produce a Final GitHub repository
Present a Canva slide deck suitable for a general audience which summarizes our findings and documents the results with well-labeled visualizations

Reproduction of this Data:

The results can be reproduced by cloning this repository and running the final notebook, as explained in the instructions below:

Warning: If you are a fellow Codeup alum and decide to run the alternate version of this project by pulling your own data from the Codeup database with SQL, you will need to pull each table individually and join them locally; otherwise the query will time out.

Reproduction Instructions:

Clone the repository by running git clone git@github.com:CodeupGourmands/Michelin_NLP_Capstone.git in your terminal, then run the mvp_notebook.ipynb Jupyter Notebook.
Ensure that, at a minimum, the files listed below are included in the repo so that the notebook can run:
- mvp_notebook.ipynb
- acquire.py
- prepare.py
- explore.py
- model.py

Initial Thoughts

Our initial thoughts are that country, cuisine, and words/groups of words (bigrams and trigrams) may be very impactful features to predict our target 'award' level. Another thought was that the price level and available facilities could also help determine the target 'award' level.

The Plan

Acquire initial data (CSV file) via Kaggle download
Acquire review data using Beautiful Soup via 'get_michelin_pages' function in acquire file
Clean and Prepare the data utilizing RegEx and string functions
Explore data in search of significant relationships to target (Michelin Star Ratings)
Conduct statistical testing as necessary

▪︎ Answer 6 initial exploratory questions:

Question 1. What is the distribution of our target variable (award type)?
Question 2. What countries have the most Michelin restaurants?
Question 3. What is the average wordcount of restaurant reviews, by award type?
Question 4. Do three-star Michelin restaurants have the highest sentiment score?
Question 5. What are the most frequent words used in Michelin Restaurant reviews?
Question 6. Do higher-rated restaurants have more facilities?

Develop a Model to predict Award Category of Michelin restaurants:
- Evaluate models on train and validate data using accuracy score
- Select the best model based on the smallest difference in the accuracy score on the train and validate sets.
- Evaluate the best model on test data
Draw conclusions
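
The model selection rule in the plan above (smallest difference between train and validate accuracy) can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and the model names and accuracy values are made-up placeholders, not results from this project.

```python
def select_best_model(scores):
    """Pick the model whose train and validate accuracy are closest.

    scores: dict mapping model name -> (train_accuracy, validate_accuracy)
    """
    return min(scores, key=lambda name: abs(scores[name][0] - scores[name][1]))


# Placeholder accuracies for illustration only
scores = {
    "decision_tree": (0.99, 0.81),
    "logistic_regression": (0.86, 0.84),
    "naive_bayes": (0.80, 0.77),
}
best = select_best_model(scores)
print(best)
```

Note that the gap alone says nothing about how accurate a model is, so it may be worth also checking the validate accuracy of the selected model before moving on to the test set.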

Data Dictionary:

Original Features:

Feature	Description
name	Name of the awardee restaurant
address	Address of the awardee restaurant

QMCBT-JustinEvans / loan_eligibility_prediction