Requirements:
Uses data obtained from the following URL: https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
Using the data obtained from the URL above, demonstrate your ability to collect, work with, and clean a data set by creating an R script called run_analysis.R that does the following:
- Merges the training and the test sets to create one data set.
- Extracts only the measurements on the mean and standard deviation for each measurement.
- Uses descriptive activity names to name the activities in the data set.
- Appropriately labels the data set with descriptive activity names.
- Creates a second, independent tidy data set with the average of each variable for each activity and each subject.
run_analysis.R Process Notes:
- Read the training files: x_train.txt, y_train.txt, and subject_train.txt from the data folder and save to temporary tables
- Read the test files: x_test.txt, y_test.txt, and subject_test.txt from the data folder and save to temporary tables
- Read the activity_labels.txt file from the data folder and save to a temporary table
- Merge the corresponding training and test data sets together into three tables:
  - data values
  - labels
  - subjects
- Read the features.txt file to a temporary table
- From the temporary features table, subset only those lines that include mean or std measures
- Using the feature subset from the previous step, subset the merged data values to include only the columns that hold mean or std measures
- Use the index of the subset to create column headings.
- Clean up the column headings by removing parentheses and dashes and by putting them in camelCase format
- Using the activity labels read from activity_labels.txt above, add activity labels to the dataset and clean them up by removing underscores and converting them to camelCase format.
- Move the cleaned up activity labels over to the merged labels table.
- Set column heading for merged subjects file to "subject"
- Set column heading for merged labels file to "activity"
- Merge all sets together to form one temporary master file
- Using the reshape2 library, melt the temporary master dataset created in the previous step and calculate the mean of each variable for each subject/activity combination, storing the result in a final tidyData dataset.
- Because the sort order of activity was lost during the melt, force a re-sort by subject and then by activity, using the cleaned-up activity labels created earlier.
- Write final tidy dataset to working directory as "tidyData.txt"
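
The steps above can be sketched on a tiny made-up data set. This is an illustration under stated assumptions, not the actual run_analysis.R: the toy tables stand in for the real files, the pattern used to pick mean/std features is a guess, the helper names cleanHeading and toCamel are hypothetical, and base R's aggregate() is swapped in for the reshape2 melt step so the example runs without extra packages.

```r
# Toy stand-ins for features.txt and activity_labels.txt (values are made up).
features <- data.frame(
  id   = 1:4,
  name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
           "tBodyAcc-max()-X", "fBodyGyro-mean()-Z"),
  stringsAsFactors = FALSE
)
activityLabels <- data.frame(
  id   = 1:2,
  name = c("WALKING", "WALKING_UPSTAIRS"),
  stringsAsFactors = FALSE
)

# Toy stand-ins for the x_*, y_* and subject_* files.
xTrain <- data.frame(V1 = c(1, 3), V2 = c(10, 30), V3 = c(0, 0), V4 = c(5, 7))
xTest  <- data.frame(V1 = c(2, 4), V2 = c(20, 40), V3 = c(0, 0), V4 = c(6, 8))
yTrain <- data.frame(V1 = c(1, 1))
yTest  <- data.frame(V1 = c(2, 2))
subjectTrain <- data.frame(V1 = c(1, 1))
subjectTest  <- data.frame(V1 = c(2, 2))

# Merge training and test sets into three tables.
x        <- rbind(xTrain, xTest)
y        <- rbind(yTrain, yTest)
subjects <- rbind(subjectTrain, subjectTest)

# Keep only mean/std features. Assumption: match "-mean()" and "-std()" only.
keep <- grep("-(mean|std)\\(\\)", features$name)
x <- x[, keep]

# Hypothetical helper: strip "()" and dashes, capitalise Mean/Std.
cleanHeading <- function(n) {
  n <- gsub("\\(\\)", "", n)
  n <- gsub("-mean", "Mean", n)
  n <- gsub("-std", "Std", n)
  gsub("-", "", n)
}
names(x) <- cleanHeading(features$name[keep])

# Hypothetical helper: "WALKING_UPSTAIRS" -> "walkingUpstairs".
toCamel <- function(s) {
  gsub("_([a-z])", "\\U\\1", tolower(s), perl = TRUE)
}
activityLabels$name <- toCamel(activityLabels$name)
activity <- factor(y$V1, levels = activityLabels$id,
                   labels = activityLabels$name)

# Combine everything into one master table.
master <- data.frame(subject = subjects$V1, activity = activity, x)

# Average every variable per subject/activity pair, then sort.
tidyData <- aggregate(. ~ subject + activity, data = master, FUN = mean)
tidyData <- tidyData[order(tidyData$subject, tidyData$activity), ]

# The notes write to the working directory; a temp folder is used here.
outFile <- file.path(tempdir(), "tidyData.txt")
write.table(tidyData, outFile, row.names = FALSE)
```

Because activity is stored as a factor whose levels follow activity_labels.txt, ordering by it keeps the label order from that file rather than alphabetical order.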