This project is my solution to the Course Project for Coursera's "Getting and Cleaning Data" course.
- This `README.md` file is a general description of the contents.
- The `CODEBOOK.md` file describes the data culled and collated into the `tidy_data.txt` file, which is the output of the `run_analysis.R` script when it is sourced into R.
- All remaining files are the data, as originally formatted and structured, from the following link.

If you clone this repository, delete `tidy_data.txt`, open RStudio, set the working directory to the repository, and source `run_analysis.R`, you will find a fresh copy of `tidy_data.txt` in the working directory.

- In an attempt to follow the steps given in the assignment, the `run_analysis.R` script first loads the data from both the `train` and `test` folders and combines them into a single data frame using `merge`.
- It then reads the column labels from the `features.txt` file and assigns the resulting vector to `names(data)`. With the column names in place, it uses a combination of `grep` commands to subset the data to the columns containing either mean or standard deviation information.
- The code next pulls the activity codes from the `train/y_train.txt` and `test/y_test.txt` files, combines them in the same way as the measurement data, and makes the column a factor with labels taken from the `activity_labels.txt` file.
- Although the column names were already set during Step 2, the code at this stage also adds a `Subject` factor column from the `train/subject_train.txt` and `test/subject_test.txt` files, combining them as before.
- Finally, the code runs `aggregate` on the non-factor columns, grouped by the factor columns, with `mean` as the function, which gives the mean of each measurement for every Activity/Subject pair. The grouping columns that `aggregate` adds are then renamed appropriately, and the result is written to the `tidy_data.txt` file.

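The five steps above can be sketched end to end in R. This is a minimal, self-contained sketch, not the contents of `run_analysis.R`: tiny in-memory data frames with hypothetical values stand in for `features.txt`, `activity_labels.txt`, and the `train`/`test` files. Two choices are assumptions of the sketch: the rows are stacked with `rbind` rather than `merge` (so the activity and subject vectors, built in the same train-then-test order, stay aligned with the measurements), and the `grep` pattern keeps only `mean()` and `std()` features.

```r
# Hypothetical stand-ins for the UCI HAR files read by run_analysis.R.
features <- data.frame(
  id   = 1:4,
  name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
           "tBodyAcc-max()-X", "fBodyAcc-meanFreq()-X"),
  stringsAsFactors = FALSE
)
activity_labels <- data.frame(id = 1:2, label = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)
x_train <- data.frame(matrix(c(1, 2, 3, 4,
                               5, 6, 7, 8), nrow = 2, byrow = TRUE))
x_test  <- data.frame(matrix(c(3, 4, 11, 12), nrow = 1))
y_train <- data.frame(V1 = c(1L, 2L))
y_test  <- data.frame(V1 = 1L)
subject_train <- data.frame(V1 = c(1L, 1L))
subject_test  <- data.frame(V1 = 1L)

# Step 1: stack train rows above test rows (rbind keeps the row order).
data <- rbind(x_train, x_test)

# Step 2: name the columns from features.txt, then keep mean()/std() columns.
names(data) <- features$name
keep <- grepl("mean\\(\\)|std\\(\\)", names(data))
data <- data[, keep, drop = FALSE]

# Step 3: activity codes in the same train-then-test order,
# turned into a factor labelled from activity_labels.txt.
data$Activity <- factor(c(y_train$V1, y_test$V1),
                        levels = activity_labels$id,
                        labels = activity_labels$label)

# Step 4: subject IDs as a factor, combined the same way.
data$Subject <- factor(c(subject_train$V1, subject_test$V1))

# Step 5: mean of every non-factor column for each Activity/Subject pair.
is_factor <- vapply(data, is.factor, logical(1))
tidy <- aggregate(data[!is_factor],
                  by = list(data$Activity, data$Subject),
                  FUN = mean)
# aggregate() names unnamed grouping columns Group.1, Group.2; rename them.
names(tidy)[1:2] <- c("Activity", "Subject")

# Written to a temporary directory so the sketch has no side effects.
write.table(tidy, file.path(tempdir(), "tidy_data.txt"), row.names = FALSE)
```

With these toy inputs, `tidy` has two rows for subject 1: WALKING, with means 2 and 3 for the two kept columns, and SITTING, with means 5 and 6.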
I hope it meets your approval. Best wishes, and good luck with the rest of the Data Science sequence!