ameilij / PracticalMachineLearning

COURSERA Practical Machine Learning Class

Practical Machine Learning

![JHU](https://github.com/ameilij/PracticalMachineLearning/blob/master/JHU-Logo-Square-Mini_180px.png)

COURSERA Practical Machine Learning Class | Johns Hopkins University

This repository collects my code examples, homework, and experiments for the Coursera Practical Machine Learning class from Johns Hopkins University. The following files are available for anyone to use.

BasicAnalysis.R A very simple example of what happens when you overfit a model. It is a first introduction to prediction and shows why too many rules can ruin a predictor.

BasicPreprocessing.R While it is tempting to start training a predictor right away, most datasets require some preprocessing first: the data may be skewed, or some values may need imputation. Includes many good examples of standardizing and related steps.
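As a rough illustration of standardizing (a minimal base R sketch, not the code in BasicPreprocessing.R), the snippet below centers and scales a skewed predictor using parameters estimated on the training set only. The simulated data and all variable names are assumptions made for this example.

```r
# Simulated, right-skewed predictor; values are illustrative only.
set.seed(123)
x <- rexp(200, rate = 0.5)

# Hold out the last 50 observations as a test set.
train_x <- x[1:150]
test_x  <- x[151:200]

# Estimate centering and scaling parameters on the training set only.
train_mean <- mean(train_x)
train_sd   <- sd(train_x)

# Apply the same training-set parameters to both sets.
train_std <- (train_x - train_mean) / train_sd
test_std  <- (test_x  - train_mean) / train_sd
```

Reusing the training mean and standard deviation on the test set keeps information from the test data out of the preprocessing step.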

BasicTreeCalssification.R Classifying with trees is a fun way to predict whether or not an entity belongs to a class. Tree methods tend to perform better than linear models when the relationship between predictors and outcome is non-linear.

BootstrapAggregating.R Bootstrap aggregation, also known as bagging, is a Machine Learning method based on a few basic principles:

  • Resample cases and recalculate predictions
  • Average results or majority vote

This set of code has accompanying Rmd files for easy note taking.
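The two bagging principles can be sketched in base R as follows. This is an illustrative example under stated assumptions, not the repository's code: the data is simulated, the base learner is a `loess` smoother, and `n_boot`, `grid`, and the other names are hypothetical.

```r
# Simulated noisy curve; illustrative only.
set.seed(42)
n <- 200
x <- runif(n, 0, 10)
y <- sin(x) + rnorm(n, sd = 0.3)
dat <- data.frame(x = x, y = y)

# Points at which the bagged prediction is evaluated.
grid <- data.frame(x = seq(1, 9, by = 0.5))
n_boot <- 25

# One row of predictions per bootstrap resample.
preds <- matrix(NA_real_, nrow = n_boot, ncol = nrow(grid))
for (b in seq_len(n_boot)) {
  idx <- sample(n, replace = TRUE)                    # resample cases
  fit <- loess(y ~ x, data = dat[idx, ], span = 0.3)  # refit the smoother
  preds[b, ] <- predict(fit, newdata = grid)          # recalculate predictions
}

# Average the predictions across resamples (skipping any NA from extrapolation).
bagged <- colMeans(preds, na.rm = TRUE)
```

For a classification problem, the averaging step would be replaced by a majority vote over the resampled models.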

Boosting The basic idea behind boosting is to take lots of possibly weak predictors, weight them, and add them up to get a stronger one. There are two clear steps:

A. Start with a set of classifiers ($h_1$, $h_2$, ..., $h_n$). For example, these could be all possible trees, all possible regressions, etc.

B. Create a classifier that combines classification functions

$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i h_i(x)\right)$

This set of code has accompanying Rmd files for easy note taking.
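To make the combination step concrete, here is a small base R sketch of the weighted vote sgn(sum of alpha_i * h_i(x)). The three threshold classifiers and their weights are hypothetical and fixed by hand; a real boosting algorithm would learn both from the data, which this sketch does not attempt.

```r
# Three hypothetical weak classifiers, each returning -1 or +1.
h <- list(
  function(x) ifelse(x > 2, 1, -1),
  function(x) ifelse(x > 5, 1, -1),
  function(x) ifelse(x > 8, 1, -1)
)

# Illustrative weights; in real boosting these are estimated during training.
alpha <- c(0.4, 0.9, 0.3)

# Combine the weak classifiers as the sign of their weighted sum.
boosted_classify <- function(x, h, alpha) {
  votes <- vapply(h, function(h_i) h_i(x), numeric(length(x)))
  votes <- matrix(votes, nrow = length(x))  # one row per input, one column per classifier
  sign(as.vector(votes %*% alpha))
}

result <- boosted_classify(c(1, 3, 6, 9), h, alpha)
# result is c(-1, -1, 1, 1)
```

At x = 3 the first classifier votes +1, but it is outweighed by the other two (0.4 - 0.9 - 0.3 = -0.8), so the combined label is -1.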

BoxCox.R The Box-Cox transformation takes continuous, strictly positive data and tries to make it look more like normal data. It does so by estimating a set of parameters using maximum likelihood. Maybe one for the uber-statistician, but handy when needed.
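The sketch below shows the idea in base R for the simplest case, a single variable with no covariates: it evaluates the Box-Cox profile log-likelihood over a grid of lambda values and keeps the maximizer. It is an illustration under stated assumptions rather than the code in BoxCox.R; the simulated log-normal data, the grid, and the function names are all made up for this example.

```r
# Simulated log-normal data, for which a log transform (lambda = 0) is ideal.
set.seed(1)
y <- rlnorm(2000, meanlog = 0, sdlog = 1)

# Box-Cox transform; lambda = 0 is defined as the log.
boxcox_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Profile log-likelihood of lambda (up to a constant), intercept-only model.
boxcox_loglik <- function(lambda, y) {
  z <- boxcox_transform(y, lambda)
  n <- length(y)
  sigma2 <- mean((z - mean(z))^2)  # maximum likelihood variance estimate
  -n / 2 * log(sigma2) + (lambda - 1) * sum(log(y))
}

# Grid search for the maximum likelihood estimate of lambda.
lambdas <- seq(-2, 2, by = 0.01)
loglik <- sapply(lambdas, boxcox_loglik, y = y)
lambda_hat <- lambdas[which.max(loglik)]

y_transformed <- boxcox_transform(y, lambda_hat)
```

Because the data was generated as log-normal, the estimated lambda should land close to 0, which corresponds to a log transform.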

caretDataSlicing.R Examples of data slicing with the caret package.

caretExample1.R First entry point for very simple creation of test and training data sets.
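For readers without caret installed, a plain random split can be sketched in base R as below. This is an assumption-laden stand-in, not the code in caretExample1.R: it uses the built-in `iris` data and a 75/25 split chosen for illustration, and unlike caret's `createDataPartition` it does not stratify on the outcome.

```r
# Built-in example data; the 75/25 ratio is an illustrative choice.
data(iris)
set.seed(2016)

n <- nrow(iris)
train_idx <- sample(seq_len(n), size = floor(0.75 * n))

# Rows selected go to training, the rest to testing.
training <- iris[train_idx, ]
testing  <- iris[-train_idx, ]
```

Setting the seed makes the split reproducible, which matters when comparing models across runs.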

ensemblingMethods Combining predictors, also called ensembling, means combining classifiers by voting and/or averaging. Combining classifiers usually improves accuracy but reduces interpretability. Boosting, bagging, and random forests are variants on this theme. For example, the Netflix Prize was won by combining 107 predictors.

imputingData.R Examples of how to properly impute data so your predictors don't suffer.
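One simple way to impute, shown here as a base R sketch with simulated data, is to fill missing values with the training-set median. This is only one of several possible approaches and is not necessarily the method used in imputingData.R; the helper `impute_median` and the data frames are hypothetical.

```r
# Simulated predictor with values knocked out at random; illustrative only.
set.seed(7)
train <- data.frame(x = rnorm(100, mean = 10, sd = 2))
test  <- data.frame(x = rnorm(30, mean = 10, sd = 2))
train$x[sample(100, 10)] <- NA
test$x[sample(30, 5)]    <- NA

# Estimate the fill value on the training set only.
train_median <- median(train$x, na.rm = TRUE)

# Replace missing entries with a given fill value.
impute_median <- function(v, fill) {
  v[is.na(v)] <- fill
  v
}

train$x_imputed <- impute_median(train$x, train_median)
test$x_imputed  <- impute_median(test$x, train_median)
```

The test set is filled with the training median so that no information from the test data influences the preprocessing.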

MLMultipleCovariates.R More advanced code samples of using ML with multiple covariates. Includes the official Johns Hopkins solution plus my own code using stepAIC (a rather controversial yet useful function for multiple covariate selection) to build multiple covariate predictor formulas. Because the code uses training and testing sets, you can verify the results on your own with a cor(x, y) check.
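The workflow described above might look roughly like the following sketch. It is not the repository's code: it uses base R's `step()` as a stand-in for `MASS::stepAIC`, the built-in `mtcars` data, and an arbitrary 24/8 split, all of which are assumptions made for illustration.

```r
# Built-in example data, split into training and testing sets.
data(mtcars)
set.seed(10)
train_idx <- sample(nrow(mtcars), 24)
training <- mtcars[train_idx, ]
testing  <- mtcars[-train_idx, ]

# Start from the full model and drop covariates by AIC.
full_fit <- lm(mpg ~ ., data = training)
selected_fit <- step(full_fit, direction = "backward", trace = 0)

# Check the selected model on held-out data with a correlation.
predicted <- predict(selected_fit, newdata = testing)
test_cor <- cor(predicted, testing$mpg)
```

Stepwise selection is sensitive to the particular training sample, which is one reason it is considered controversial; the held-out correlation gives a quick sanity check on whatever formula it picks.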

MLRegression1.R Machine learning example of linear regression using one covariate. Easy to follow, yet a lot to learn!

RandomForests.R Random forests are an extension of bootstrap aggregating. The basic idea behind them is very simple.

  1. Bootstrap samples
  2. At each split, sample a random subset of the variables
  3. Grow multiple trees and vote

This is both an R file with code and an Rmd file with easy to follow reproducible code and notes.

pcaBase.R Good example of using PCA (principal component analysis) to reduce two very similar predictors into one without losing much predictive power.
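A minimal sketch of the idea, using base R's `prcomp` on two simulated, highly correlated predictors (the data and names are assumptions for this example, not taken from pcaBase.R):

```r
# Two nearly identical simulated predictors; illustrative only.
set.seed(99)
n <- 300
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
predictors <- data.frame(x1 = x1, x2 = x2)

# Principal component analysis on centered, scaled predictors.
pca <- prcomp(predictors, center = TRUE, scale. = TRUE)

# Share of total variance captured by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# The first component can stand in for both original predictors.
pc1 <- pca$x[, 1]
```

When the first component captures almost all of the variance, as it does for predictors this similar, replacing the pair with `pc1` loses very little information.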

plottingPredictors.R Good examples of using plots on training sets to guide the choice of predictors through EDA (exploratory data analysis).

tidyCovariates.R Tidy covariates are new covariates built to simplify prediction formulas. They are more necessary for some methods, such as regression and SVMs, than for others. Good examples in here!

trainingOptions.R The simplest template for creating training data so far in this repository.

More files will be added in the future. I hope this helps you get started in Machine Learning as it has helped me =)

Panama | August 2016
