Course project for the Johns Hopkins Getting and Cleaning Data course on Coursera.
The run_analysis.R script tidies and summarizes data from the Human Activity Recognition Using Smartphones Data Set. The resulting tidy data set is written to a file named tidydata.txt in the same directory as the script.
The data set archive file is expected to be extracted into the same directory as the run_analysis.R script.
The run_analysis.R script:
1. Loads the features domain table, which maps feature id numbers to feature names.
2. Produces a logical vector indicating which features should be loaded and retained. The vector is TRUE for features whose names contain the text "mean()" or "std()".
3. Uses the logical vector from step 2 as a mask to obtain the names of the features to be loaded.
4. Uses the logical vector from step 2 again to produce a list of widths for use with `read.fwf()`. Negative widths are used for features that do not need to be loaded, because `read.fwf()` skips these columns when loading the file.
5. Loads the training measurements, label list, and subject list files and combines them into a single data frame with `cbind()`.
6. Loads the testing measurements, label list, and subject list files and combines them into a single data frame with `cbind()`.
7. Combines the training and testing data frames with `rbind()`.
8. Loads the activities domain table, which maps activity id numbers to activity names.
9. Updates the activity column of the combined data frame from step 7, replacing activity id numbers with a factor of descriptive activity names.
10. Uses `aggregate()` to produce a data frame with the mean of each feature, grouped by subject number and activity.
11. Writes the data frame from step 10 to the file tidydata.txt.
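
The steps above can be illustrated on a tiny synthetic input. This is only a sketch under stated assumptions, not the contents of run_analysis.R: the four feature names, the 6-character field width, the sample values, the subject and activity vectors, and the temporary file locations are all made up for the example. Only `read.fwf()`, `cbind()`, `aggregate()`, and the "mean()" / "std()" name filter come from the description.

```r
# Hypothetical miniature of the features domain table (id -> name).
features <- data.frame(
  id   = 1:4,
  name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
           "tBodyAcc-mad()-X",  "tBodyAcc-meanFreq()-X"),
  stringsAsFactors = FALSE
)

# Logical mask: TRUE where the name contains the literal text "mean()" or "std()".
# fixed = TRUE treats the parentheses literally, so "meanFreq()" does not match.
keep <- grepl("mean()", features$name, fixed = TRUE) |
        grepl("std()",  features$name, fixed = TRUE)
kept_names <- features$name[keep]

# Widths for read.fwf(): positive to load a column, negative to skip it.
field_width <- 6L                      # assumed width for this toy file only
widths <- ifelse(keep, field_width, -field_width)

# Toy fixed-width measurements file: four rows, four 6-character fields each.
measurements_file <- tempfile(fileext = ".txt")
writeLines(c("  1.00  0.10  9.00  9.00",
             "  3.00  0.30  9.00  9.00",
             "  5.00  0.50  9.00  9.00",
             "  7.00  0.70  9.00  9.00"), measurements_file)
measurements <- read.fwf(measurements_file, widths = widths,
                         col.names = kept_names)

# Hypothetical subject and activity id lists, one entry per measurement row.
subject     <- c(1, 1, 2, 2)
activity_id <- c(1, 2, 1, 1)
combined <- cbind(subject = subject, activity = activity_id, measurements)

# Hypothetical miniature of the activities domain table (id -> name).
activities <- data.frame(id = 1:2, name = c("WALKING", "WALKING_UPSTAIRS"))
combined$activity <- factor(combined$activity,
                            levels = activities$id,
                            labels = activities$name)

# Mean of each retained feature, grouped by subject and activity.
tidy <- aggregate(combined[, -(1:2), drop = FALSE],
                  by = list(subject = combined$subject,
                            activity = combined$activity),
                  FUN = mean)

# Written to a temporary directory here so the sketch leaves no files behind.
write.table(tidy, file = file.path(tempdir(), "tidydata.txt"),
            row.names = FALSE)
```

In this sketch the training and testing inputs are collapsed into one toy file, so the `rbind()` step is not shown; with two data frames of identical columns it would simply be `rbind(train, test)` before the activity labels are applied.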