This project uses data on real estate loan deals to solve 3 problems:
- Problem 1: Use the Google Geocoding API to get City and State names based on Zip Code.
- Problem 2: Classify the top 3 lenders who are most likely to provide the loan.
  - Problem 2.1: Build a classification algorithm to predict lenders and their probabilities.
  - Problem 2.2: Build a web application that uses the algorithm to predict lenders based on user inputs.
- Problem 3: Find the variables that correlate with a higher interest rate.

The dataset of ~30K real estate loan deals was provided by Enodo.

Variables list:

Deal ID, Lender ID, Zip, City* (blank to be filled), State* (blank to be filled), Multifamily Subtype, Built, Units, Original Loan, Note Rate, Loan Term (Original), Appraised Value*, Maturity Date, UPB, Amort DSvc, Orig Amort, IO Period

- Features with * were used to solve Problem 2.
- All features were used to solve Problem 3.
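
The City and State blanks can be filled by sending each Zip value to the Google Geocoding API. The sketch below is one possible way to do that, not the code in `part1.ipynb`. The function names `parse_city_state` and `lookup_zip` are hypothetical, the `country:US` filter assumes all deals are in the United States, and the choice of the `locality` component as the city is an assumption (some zip codes return no `locality` at all).

```python
import json
import urllib.parse
import urllib.request

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def parse_city_state(payload):
    """Extract (city, state) from a Geocoding API JSON response dict.

    Returns (None, None) when the API reports no usable result.
    """
    if payload.get("status") != "OK" or not payload.get("results"):
        return None, None
    city = state = None
    # Only the first result is inspected (assumption: it is the best match).
    for component in payload["results"][0].get("address_components", []):
        types = component.get("types", [])
        if "locality" in types and city is None:
            city = component.get("long_name")
        elif "administrative_area_level_1" in types and state is None:
            state = component.get("short_name")  # e.g. "CA"
    return city, state


def lookup_zip(zip_code, api_key, timeout=10):
    """Query the Geocoding API for one zip code (performs a network call)."""
    query = urllib.parse.urlencode({
        # Component filtering restricts the match to a US postal code.
        "components": f"postal_code:{zip_code}|country:US",
        "key": api_key,
    })
    with urllib.request.urlopen(f"{GEOCODE_URL}?{query}", timeout=timeout) as resp:
        return parse_city_state(json.load(resp))
```

Keeping the parsing separate from the HTTP call makes it possible to check the extraction logic against a saved response without spending API quota. Because many deals share a zip code, looking up each distinct Zip once and mapping the result back onto the rows keeps the number of requests small.
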
- `app`
  - `template`
    - `master.html`: homepage of the web app
    - `go.html`: classification result page of the web app
  - `static`
    - `js`
      - `app.js`: JavaScript file that supports `master.html`
  - `app.py`: Flask file that runs the web app
  - `utils.py`: custom functions that support `app.py`
  - IMPORTANT: in order to run the web app, please add a `config.py` file that specifies your own Google Geocoding API key: `google_api_key = MY-GOOGLE-API-KEY`.
- `data`
  - `lender_data.csv`: raw lender data to be processed
  - `lender_data1.db`: lender data processed by `part1.ipynb`
- `model`
  - `model_rf.pkl`: trained classification model that predicts the most likely lenders
  - `features.pkl`: input features of the classification model
  - `imputer.pkl`: fitted imputer to preprocess data for the classification model
  - `scaler.save`: fitted scaler to preprocess data for the classification model
- `notebook`
  - `part1.ipynb`: notebook that solves Problem 1
  - `part2.ipynb`: notebook that solves Problem 2.1
  - `part3.ipynb`: notebook that solves Problem 3
  - IMPORTANT: in order to run the notebooks, please add the following:
    - a `config.py` file that specifies your own Google Geocoding API key: `google_api_key = MY-GOOGLE-API-KEY`
    - a `temp` folder at this level
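
For reference, a `config.py` could look like the sketch below. Only the variable name `google_api_key` comes from this project. Reading the key from an environment variable is an optional convenience added here, and the variable name `GOOGLE_API_KEY` is hypothetical.

```python
# config.py: keep this file out of version control, since it holds a secret.
import os

# Use the GOOGLE_API_KEY environment variable when it is set (hypothetical
# name, not defined by the project). Otherwise fall back to the placeholder,
# which must be replaced with a real Google Geocoding API key.
google_api_key = os.environ.get("GOOGLE_API_KEY", "MY-GOOGLE-API-KEY")
```

Code that needs the key can then read it with `from config import google_api_key`.
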
- Run the following command in the `app` directory to launch the web app: `python app.py`.
- Go to http://0.0.0.0:3001/.
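
For each set of user inputs, the prediction step comes down to ranking the classifier's per-lender probabilities and keeping the three largest. The sketch below shows that ranking in plain Python. The function name `top_k_lenders` is hypothetical and this is not the code in `app.py` or `utils.py`. With a scikit-learn classifier, `classes` would typically come from `model.classes_` and `probabilities` from one row of `model.predict_proba(X)`. That the project wires them together this way is an assumption.

```python
def top_k_lenders(classes, probabilities, k=3):
    """Return the k (lender, probability) pairs with the highest probability.

    `classes` and `probabilities` must be aligned: probabilities[i] is the
    predicted probability of classes[i].
    """
    if len(classes) != len(probabilities):
        raise ValueError("classes and probabilities must have the same length")
    ranked = sorted(
        zip(classes, probabilities),
        key=lambda pair: pair[1],  # sort by probability
        reverse=True,              # highest first
    )
    return ranked[:k]


# Illustrative values only, not results from the project's model.
lender_ids = ["A", "B", "C", "D"]
predicted = [0.1, 0.4, 0.2, 0.3]
print(top_k_lenders(lender_ids, predicted))
```

Returning the probabilities alongside the lender IDs lets the result page show how confident the model is in each of the three suggestions.
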
The code was developed using the Anaconda distribution of Python version 3.6. The following dependencies were used: pandas, numpy, sqlalchemy, sklearn, flask, matplotlib, seaborn, requests, pickle.

- Problem 1: Retrieve city and state based on zip code
- Problem 2: Classify top 3 most likely lenders
  - Classifier that predicts top 3 lenders and their probabilities: see `part2.ipynb`
  - Web app
- Problem 3: Identify variables that correlate with high interest rate
  - Some of the lenders and the loan term correlate strongly with a higher interest rate.
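
One simple way to look for variables that move with the interest rate is to compute the Pearson correlation of each numeric column against Note Rate and sort by absolute value. The sketch below does this in plain Python. It is an illustration under that assumption, not the analysis in `part3.ipynb`, and the names `pearson` and `rank_by_correlation` are hypothetical. A categorical variable such as Lender ID would first need to be turned into 0/1 indicator columns, one per lender.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(columns, target):
    """Sort column names by |correlation with target|, strongest first."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    usable = [(name, r) for name, r in scores.items() if not math.isnan(r)]
    return sorted(usable, key=lambda item: abs(item[1]), reverse=True)


# Illustrative values only, not taken from the dataset.
note_rate = [3.5, 4.0, 4.5, 5.0]
features = {
    "Loan Term (Original)": [60, 84, 120, 180],
    "is_lender_X": [1, 0, 1, 0],  # 0/1 indicator for one hypothetical lender
}
for name, r in rank_by_correlation(features, note_rate):
    print(f"{name}: {r:+.2f}")
```

Correlation alone does not show that a variable causes a higher rate, so a ranking like this is best read as a starting point for closer inspection.
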