In this assignment, I created two R scripts that let me train models with different parameters and save the results to separate directories, so that I can pick the model with the best accuracy.
- index.Rmd: R Markdown document showing the code that trains the model and predicts the 20 test cases
- Compiled HTML generated from the R Markdown document
- WLEWriteUp.R: script used to build the model for this assignment
  - cleanData(data): returns the cleaned data
    - keeps only the measurement values and classe
    - removes columns with missing values
    - uses nearZeroVar to remove near-zero-variance and zero-variance columns
  - probeData(cleanData, prob): returns pData
    - creates the training/testing data sets with a data partition (p = 0.7)
  - runRFModel(data = pData, modDir = "NZV", ...): returns modFit
    - trains the model modFit on data$training
    - predicts data$testing with modFit
    - builds a confusionMatrix from testing$classe and the predicted values
    - saves confusionMatrix$overall and confusionMatrix$byClass
    - plots the frequency of predicted vs. actual classes and saves the image
  - pml_write_files = function(x, modDir)
    - writes the predictions for the 20 new test cases in the assignment's submission format
- RunningModels.R: script that trains models with different parameters, saves each confusionMatrix, predicts the 20 new test cases, and writes the prediction results for this assignment
  - cData <- cleanData(data = trainData)
  - pData <- probeData(cData, 0.7)
  - modFit1 <- runRFModel(data = pData, modDir = "parRF70/", method = "parRF")
  - testPred <- predict(modFit1, testCase)
  - pml_write_files(testPred, modDir= "RPART70/")
- modFit.RData: a list holding the trained model
  - mod: model trained with Parallel Random Forest
  - cfMatrix: confusion matrix
  - cfplot: confusion matrix plot
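
As a rough illustration of the helpers described for WLEWriteUp.R, the steps could be sketched as below. This is not the actual script. The column-name pattern used to select measurement columns, the default `method = "rf"`, and the output file names `overall.csv` and `byClass.csv` are assumptions made for the example. It also assumes `classe` is already a factor (for example, read with `stringsAsFactors = TRUE`).

```r
library(caret)

# Sketch of cleanData(): the "belt|arm|dumbbell" pattern is an assumption
# about how measurement columns are named, not the author's actual rule.
cleanData <- function(data) {
  # Keep sensor measurement columns plus the outcome column `classe`.
  keep <- grepl("belt|arm|dumbbell", names(data)) | names(data) == "classe"
  data <- data[, keep, drop = FALSE]
  # Drop every column that contains at least one missing value.
  data <- data[, colSums(is.na(data)) == 0, drop = FALSE]
  # Drop zero- and near-zero-variance columns, but never the outcome.
  nzv <- nearZeroVar(data)  # integer positions of the flagged columns
  nzv <- setdiff(nzv, which(names(data) == "classe"))
  if (length(nzv) > 0) data <- data[, -nzv, drop = FALSE]
  data
}

# Sketch of probeData(): stratified split on `classe` with proportion `prob`.
probeData <- function(cleanData, prob = 0.7) {
  inTrain <- createDataPartition(cleanData$classe, p = prob, list = FALSE)
  list(training = cleanData[inTrain, , drop = FALSE],
       testing  = cleanData[-inTrain, , drop = FALSE])
}

# Sketch of runRFModel(): trains, evaluates on the held-out split, and saves
# the confusion-matrix summaries under modDir (file names are hypothetical).
# The plotting step from the outline is omitted here.
runRFModel <- function(data, modDir = "NZV", method = "rf", ...) {
  dir.create(modDir, showWarnings = FALSE, recursive = TRUE)
  modFit <- train(classe ~ ., data = data$training, method = method, ...)
  pred <- predict(modFit, newdata = data$testing)
  cm <- confusionMatrix(pred, data$testing$classe)
  write.csv(t(cm$overall), file.path(modDir, "overall.csv"), row.names = FALSE)
  write.csv(cm$byClass, file.path(modDir, "byClass.csv"))
  modFit
}
```

Writing each run's results into its own `modDir` keeps the accuracy figures of different parameter settings side by side, which makes it easy to compare runs afterwards. Note that `method = "rf"` and `method = "parRF"` require the corresponding model packages to be installed alongside caret.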