Authors: Kevin Liao, Thomas Y Sun, Peng Cheng
Berkeley CA, 94072 USA
Kevin Liao: lwk723@berkeley.edu
This repository holds the second project for the course Stat 159 at UC Berkeley, Fall 2016. The project follows the ideas presented in Chapter 6, Linear Model Selection and Regularization, of "An Introduction to Statistical Learning" by James et al., and applies a predictive modeling process to the data set Credit. We compare the results of different predictive models and describe the project in detail in a report.
Instruction: github project repository
Course website: gastonsanchez.com/stat159
The main directories of this repository are:
- data, which stores the original data set Credit.csv, the mean-centered and scaled data set scaled-credit.csv, and some other RData output from our analysis
- code, which holds the code for all analysis and computations and contains three main directories:
  - functions, which contains generic functions used in scripts
  - scripts, which is the main folder for all data processing and model analysis
  - tests, which holds unit tests for output comparison
- images, which stores the graphic output, including histograms, boxplots, correlation matrix and bar charts
- report, which is sectioned into 7 parts and produces the official project report and analysis
- slides, which adds additional features to the project and complements the material in the report for a formal presentation
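The centering and scaling step behind scaled-credit.csv can be sketched roughly as follows. This is a minimal illustration, not the project's actual pre-process-script.R: the small data frame is invented toy data (only the column names are borrowed from the Credit data set), the output path is a hypothetical temporary file, and it assumes qualitative variables are expanded into dummy columns with model.matrix() before scaling.

```r
# Toy data standing in for Credit.csv (values are invented for illustration)
credit <- data.frame(
  Income  = c(10, 20, 40, 80),
  Limit   = c(1000, 3000, 2000, 6000),
  Student = factor(c("No", "Yes", "No", "No")),
  Balance = c(100, 500, 300, 900)
)

# Expand factors into dummy variables and drop the intercept column
design <- model.matrix(Balance ~ ., data = credit)[, -1]

# Center every column to mean 0 and scale it to standard deviation 1
scaled <- scale(cbind(design, Balance = credit$Balance),
                center = TRUE, scale = TRUE)
scaled_credit <- as.data.frame(scaled)

# Save the result; the path is a hypothetical choice for this sketch
out_file <- file.path(tempdir(), "scaled-credit.csv")
write.csv(scaled_credit, out_file, row.names = FALSE)
```

Standardizing all predictors matters for ridge and lasso in particular, because their penalties depend on the scale of the coefficients.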
The complete file-structure for the project is as follows:
stat159-fall2016-project2/
    README.md
    Makefile
    LICENSE
    session-info.txt
    .gitignore
    code/
        README.md
        test-that.R
        functions/
            mse-function.R
        scripts/
            eda-qualitative-script.R
            eda-quantitative-script.R
            pre-process-script.R
            ols-regression-script.R
            ridge-regression-script.R
            lasso-regression-script.R
            PCR-script.R
            PLSR-script.R
            session-info.R
        tests/
            test-mse.R
    data/
        README.md
        Credit.csv
        eda-qualitative-output.txt
        eda-quantitative-output.txt
        correlation-matrix.RData
        scaled-credit.csv
        train-and-test-set.RData
        ols-regression-output.txt
        ols-regression.RData
        ridge-regression-output.txt
        ridge-regression.RData
        lasso-regression-output.txt
        lasso-regression.RData
        PCR-output.txt
        PCR.RData
        PLSR-output.txt
        PLSR.RData
    images/
        README.md
        barchart-Ethnicity.png
        barchart-gender.png
        barchart-married.png
        barchart-student.png
        boxplot-age.png
        boxplot-balance-ethnicity.png
        boxplot-balance-gender.png
        boxplot-balance-married.png
        boxplot-balance-student.png
        boxplot-balance.png
        boxplot-cards.png
        boxplot-education.png
        boxplot-income.png
        boxplot-limit.png
        boxplot-rating.png
        histogram-age.png
        histogram-balance.png
        histogram-cards.png
        histogram-education.png
        histogram-income.png
        histogram-limit.png
        histogram-rating.png
        ridge-cv-lambda.png
        lasso-cv-lambda.png
        pcr-cv-ncomp.png
        plsr-cv-ncomp.png
        scatterplot-matrix.png
    report/
        README.md
        report.pdf
        report.Rmd
        sections/
            00-abstract.Rmd
            01-introduction.Rmd
            02-data.Rmd
            03-methods.Rmd
            04-analysis.Rmd
            05-results.Rmd
            06-conclusions.Rmd
    slides/
        predictive-modeling-slides.Rmd
        predictive-modeling-slides.html
- Abstract - provide basic information of the project
- Introduction - provide the objective of the project
- Data - explain the source and use of our raw data
- Methods - OLS, ridge, lasso, PCR, PLSR
- Analysis - train and test models for each method
- Results - display findings of the project
- Conclusions - draw conclusion from the research results
- References
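The Analysis step listed above (train each model, then evaluate it on a test set) depends on a mean squared error helper such as the one in functions/mse-function.R. A rough sketch of that workflow follows. The function name `mse`, the simulated data, and the 300/100 split are assumptions made for illustration, not taken from the project's scripts, and only OLS via base R's lm() is shown.

```r
# Hypothetical helper: mean squared error between observed and predicted values
mse <- function(observed, predicted) {
  stopifnot(length(observed) == length(predicted))
  mean((observed - predicted)^2)
}

# Simulated data standing in for the scaled Credit data (not real results)
set.seed(1)
n  <- 400
x1 <- rnorm(n)
x2 <- rnorm(n)
sim <- data.frame(x1 = x1, x2 = x2, y = 2 * x1 - x2 + rnorm(n, sd = 0.5))

# Random split into training and test rows (the sizes are an assumption)
train_idx <- sample(seq_len(n), size = 300)
train_set <- sim[train_idx, ]
test_set  <- sim[-train_idx, ]

# Fit OLS on the training set, then evaluate on the held-out test set
ols_fit  <- lm(y ~ x1 + x2, data = train_set)
test_mse <- mse(test_set$y, predict(ols_fit, newdata = test_set))
print(test_mse)
```

The same `mse()` call could be reused on the ridge, lasso, PCR and PLSR predictions so that every model is compared on identical test rows.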
The objective of the project is to practice model selection by applying different regression methods to a given data set.
To reproduce the results presented in this project (images, data sets, report, etc.), simply clone the repository (or download the zip file) and run the Makefile with the command
make
To reproduce a specific part of the project (e.g. the regressions), run the corresponding command in the terminal
make regressions
To reproduce the report, run
make report
To remove the report, run
make clean
The following is a complete list of make commands for phony targets:
- make all
- make data
- make tests
- make eda
- make pre
- make ols
- make ridge
- make lasso
- make pcr
- make plsr
- make regressions
- make report
- make slides
- make session
- make clean
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning: With Applications in R. New York: Springer, 2013. Print.
All media content (e.g. report/report, and images) is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
All code is licensed under the Apache License 2.0.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.