Repository created for the course project of the 'Getting and Cleaning Data' class on Coursera.
The tidy dataset is in the file tidy_dataset.txt. It is the result of merging the two original datasets (training and test), keeping only the mean and standard deviation features of the raw data set, and averaging each of those features for every subject and activity.
The raw data for this exercise were obtained from the UCI Machine Learning Repository. A link to the original data set can be found here.
More information about the original dataset can be found in the CodeBook.md file.
Files in this repository:
- Tidy dataset: tidy_dataset.txt
- Script generating the tidy dataset: run_analysis.R
- Readme file: README.md
- Codebook: CodeBook.md
The run_analysis.R script:
- Downloads the raw dataset (if not already downloaded)
- Unzips the raw dataset (if not already unzipped)
- Loads the [training](UCI%20HAR%20Dataset/train/X_train.txt) and [test](UCI%20HAR%20Dataset/test/X_test.txt) datasets into the workspace
- Loads the [feature list](UCI%20HAR%20Dataset/features.txt) into the workspace
- Creates a logical vector that is TRUE wherever the feature name contains the string 'mean()' or 'std()'
- Limits the training and test datasets to the variables whose positions match the TRUE values in that logical vector.
- Loads the subject and activity variables from the subject_train.txt, subject_test.txt, y_train.txt and y_test.txt files into the workspace.
- Loads the descriptive activity labels from the file [activity_labels.txt](UCI%20HAR%20Dataset/activity_labels.txt)
- Recodes the activity variables loaded from y_train.txt and y_test.txt to descriptive labels (using the mapvalues function from the plyr package).
- Adds subject and activity data to training and test datasets.
- Combines training and test datasets into full dataset.
- Melts the full dataset using the melt function from the reshape2 package.
- Summarizes the non-id variables by calculating their mean for each subject and activity using the dcast function from the reshape2 package.
- Writes the tidy dataset to the tidy_dataset.txt file.
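
The select, recode, melt and summarize steps above can be sketched on a small made-up dataset. This is only an illustration, not the contents of run_analysis.R: the toy values, the helper name `prepare`, the variable names, the `write.table` options and the temporary output path are all assumptions. Only `mapvalues` (plyr), `melt` and `dcast` (reshape2) come from the description above.

```r
library(plyr)      # mapvalues
library(reshape2)  # melt, dcast

# Toy stand-ins for features.txt and activity_labels.txt (made-up values)
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
              "tBodyAcc-meanFreq()-X", "tBodyAcc-max()-X")
activity_labels <- data.frame(id = 1:2, label = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)

# Toy stand-ins for X_train.txt / X_test.txt and the subject / y files
x_train <- data.frame(matrix(c(1, 2, 3, 10, 20, 30, 0, 0, 0, 0, 0, 0), nrow = 3))
x_test  <- data.frame(matrix(c(4, 5, 40, 50, 0, 0, 0, 0), nrow = 2))
subject_train <- c(1, 1, 2); y_train <- c(1, 1, 2)
subject_test  <- c(2, 3);    y_test  <- c(2, 1)

# TRUE where the feature name contains the literal string 'mean()' or 'std()'
keep <- grepl("mean()", features, fixed = TRUE) |
        grepl("std()", features, fixed = TRUE)

# Hypothetical helper: keep the selected columns, recode the activity ids
# to descriptive labels, and attach subject and activity
prepare <- function(x, subject, y) {
  x <- x[, keep, drop = FALSE]
  names(x) <- features[keep]
  activity <- mapvalues(y, from = activity_labels$id, to = activity_labels$label)
  data.frame(subject = subject, activity = activity, x, check.names = FALSE)
}

# Combine training and test data into one dataset
full <- rbind(prepare(x_train, subject_train, y_train),
              prepare(x_test, subject_test, y_test))

# Melt to long form, then average every variable per subject and activity
molten <- melt(full, id.vars = c("subject", "activity"))
tidy <- dcast(molten, subject + activity ~ variable, fun.aggregate = mean)

# Written to a temporary directory here so the sketch does not overwrite
# the real tidy_dataset.txt; row.names = FALSE is an assumption
write.table(tidy, file.path(tempdir(), "tidy_dataset.txt"), row.names = FALSE)
```

With this toy input, `tidy` has one row per subject and activity pair (three rows) and one column per retained feature. The `meanFreq()` and `max()` columns are dropped because their names do not contain the literal strings 'mean()' or 'std()'.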