Authors: Minsu Kim, Lingjie Qiao, Kevin Liao, Cheng Peng
Berkeley CA, 94072 USA
Minsu Kim: kaj011@berkeley.edu
Lingjie Qiao: katherine_qiao@berkeley.edu
Kevin Liao: lwk723@berkeley.edu
Cheng Peng: hanson.peng@berkeley.edu
This repository holds the final project for the course Stats 159 at UC Berkeley, Fall 2016. The project uses statistical models to estimate housing prices from 79 variables that describe nearly every aspect of residential homes in Ames, Iowa. Its two main focuses are:
- Incorporate advanced machine learning techniques and predictive model building to solve real-life industry problems
- Demonstrate aptitude in research and data analysis by emphasizing computational reproducibility and project collaboration
Project Instruction: github project repository
Course website: gastonsanchez.com/stat159
Because the main deliverables of this project include a report, slides, a Shiny app, and the data produced along the way, we organize the repository as follows to keep the work reproducible.
The main directories of this repository are:
- `data`, which stores the original data set, the preprocessed and scaled data set, and some other RData output
- `code`, which holds the code for all analysis and computations and contains three main directories:
  - `function`, which contains generic functions used in the scripts
  - `script`, which is the main folder for all regression model processing
  - `test`, which holds unit tests for output comparison
- `images`, which stores the graphic output, including histograms, boxplots, correlation matrices, and bar charts, as well as the project banner
- `report`, which has 7 sections and is produced in LaTeX format
- `slides`, which adds features to the project and complements the material in the report for a formal presentation
- `shinyApp`, which creates a Shiny app for data visualization and an interactive process walk-through
- `submission`, which holds the 16 submissions made to the Kaggle competition
The complete file-structure for the directory is as follows:

    stat159-fall2016-finalproject/
        README.md
        Makefile
        LICENSE
        session-info.txt
        .gitignore
        code/
            README.md
            function/
                qualitative_analysis.R     # For exploratory data analysis
                quantitative_analysis.R    # For exploratory data analysis
                util.R                     # All util functions
            script/
                python/                    # contains the original python code
                model/                     # contains the transformed R code for each predictive model
                    gbm.R
                    lasso.R
                    ridge.R
                    pca.R
                    randomforest.R
                    svm.R
                    xgboost.R
                lingjie-eda.R
                daniel-eda.R
                kevin-eda.R
                data-preparation.R
                preprocess.R
                model-analysis.R           # the main model analysis and comparison file
                seesion-info-script.R
            test/
                test-evaluation.R
                qualitative_output.txt     # output from eda script
        data/
            README.md
            rawData/                       # downloaded from Kaggle website
                train.csv
                test.csv
                sample_submission.csv
                data_description.txt
            cleanedData/
                data.all.matrix.RData
                data.all.RData
                ddata_train_validation.matrix.RData
                RMSEL_Table.RData
            model/
                gbm.RData
                lasso.RData
                ridge.RData
                pca.RData
                rf.RData
        images/                            # which holds over 80 png image files
        report/
            report.pdf
            report.Rmd
            sections/
        slides/
            README.md
            slides.R
            slides.html
        shinyApp/
            README.md
            app.R                          # main shinyApp file
        submission/                        # which holds 16 submissions made to Kaggle Competition
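As a quick sanity check after cloning, the top-level layout listed above can be verified with a few lines of shell. This is only a sketch: `check_layout` is a hypothetical helper that is not part of the repository, and it checks only the seven top-level directories named in this README.

```shell
# Sketch only: check_layout is an illustrative helper, not part of the repository.
# It reports which of the top-level directories named in this README are missing.
check_layout() {
  root="${1:-.}"     # path to the checkout; defaults to the current directory
  missing=0
  for dir in code data images report slides shinyApp submission; do
    if [ ! -d "$root/$dir" ]; then
      echo "missing: $dir"
      missing=1
    fi
  done
  # Return 0 when every directory is present, 1 otherwise.
  return "$missing"
}
```

For example, `check_layout stat159-fall2016-finalproject && echo "layout ok"` prints `layout ok` only when all seven directories are present.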
- Abstract
- Introduction
- Data
- Exploratory Data Analysis
- Methodology
- Analysis
- Results
- Conclusions
- Acknowledgement
- References
To reproduce most of the results presented in this project (images, data sets, report, etc.), clone the repository (or download the zip file) and run the Makefile with the command:
make
To reproduce a specific part (for example, the report), run the corresponding command in the terminal:
make report
To remove the report, run:
make clean
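The clone-and-build steps above can be sketched as one shell function. Several things here are assumptions: `reproduce` and `DRY_RUN` are hypothetical names introduced for illustration, the clone URL is not stated in this README and so must be supplied by the caller, and the target defaults to `all` simply because it appears in the list of phony targets.

```shell
# Sketch only: reproduce and DRY_RUN are illustrative names, not part of the repository.
# The clone URL is not given in this README, so the caller must pass it in.
reproduce() {
  repo_url="$1"                          # clone URL supplied by the caller
  target="${2:-all}"                     # make target to build (assumed default: all)
  dest="stat159-fall2016-finalproject"   # checkout directory, named as in the file tree
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the commands instead of running them.
    echo "git clone $repo_url $dest"
    echo "cd $dest"
    echo "make $target"
    return 0
  fi
  # Build in a subshell so the caller's working directory is left unchanged.
  git clone "$repo_url" "$dest" && (cd "$dest" && make "$target")
}
```

Setting `DRY_RUN=1` before the call, as in `DRY_RUN=1 reproduce <clone-url> report`, prints the three commands without touching the network or the file system, which is a cheap way to confirm what would run.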
If you would like to know how we obtained the 16 submissions, please contact the owner of this repository.
The following is a complete list of make commands for phony targets:
- make all
- make data
- make tests
- make eda
- make pre
- make regressions
- make report
- make slides
- make session
- make clean
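To run several of these targets in one go, a small wrapper can reject misspelled target names before anything is built. This is a sketch under stated assumptions: `run_targets`, `PHONY_TARGETS`, and `DRY_RUN` are hypothetical names, the target list is copied from the list above, and the order in which targets should run is left to the caller because this README does not describe their dependencies.

```shell
# Sketch only: run_targets, PHONY_TARGETS and DRY_RUN are illustrative names,
# not part of the repository. The target names are copied from this README.
PHONY_TARGETS="all data tests eda pre regressions report slides session clean"

run_targets() {
  # First pass: reject any name that is not in the documented list.
  for t in "$@"; do
    case " $PHONY_TARGETS " in
      *" $t "*) ;;
      *) echo "unknown target: $t" >&2; return 1 ;;
    esac
  done
  # Second pass: run (or, with DRY_RUN=1, only print) each target in the given order.
  for t in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "make $t"
    else
      make "$t" || return 1
    fi
  done
}
```

Validating every name up front means a typo in the last argument stops the run before the earlier, possibly slow, targets start.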
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Author: Minsu Kim, Lingjie Qiao, Kevin Liao, and Cheng Peng