teisha / datasciencecoursera

For Data Scientist Toolbox homework

Getting And Cleaning Data

Project 1

The purpose of the project was to take files of data generated by wearable devices that subjects wore during exercise, and create one tidy data set from the many files generated.

Code

The function written to process the data is: run_analysis.

The code reads in 6 data files containing the participants, the activities, and the device measurements, split into a training set and a test set of 3 files each. Two lookup files supply the feature names and the activity names. All of these are combined to create one final tidy data set.

  • The data is read into the function.
  • Labels are added to the column headings for the measurements. Label names have been cleansed to remove parentheses and dashes.
  • Subject and activity data are merged with the measurements.
  • The mean and standard deviation measurements are extracted and averaged for each subject and activity.
  • The data set of averaged mean and standard deviation values is returned to the caller of the function.
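
The label cleansing and averaging steps above can be sketched as follows. This is a minimal illustration in Python, not the repository's run_analysis code. The helper names (clean_label, is_mean_or_std, average_by_group), the rule of matching "mean()" and "std()" in the feature name, and the sample values are all assumptions made for the example.

```python
import re
from collections import defaultdict
from statistics import mean


def clean_label(name):
    """Remove parentheses and dashes from a feature name."""
    return re.sub(r"[()\-]", "", name)


def is_mean_or_std(name):
    """Assumed selection rule: keep features named with mean() or std()."""
    return "mean()" in name or "std()" in name


def average_by_group(rows, columns):
    """Average every column for each (subject, activity) pair.

    rows is an iterable of (subject, activity, [values...]) tuples.
    """
    groups = defaultdict(list)
    for subject, activity, values in rows:
        groups[(subject, activity)].append(values)
    result = {}
    for key, value_rows in sorted(groups.items()):
        # zip(*value_rows) turns the list of rows into per-column sequences
        averages = [mean(column) for column in zip(*value_rows)]
        result[key] = dict(zip(columns, averages))
    return result


# Made-up sample values, not real device data
raw_names = ["tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X"]
keep = [i for i, name in enumerate(raw_names) if is_mean_or_std(name)]
columns = [clean_label(raw_names[i]) for i in keep]

measurements = [
    (1, "WALKING", [1.0, 2.0, 9.0]),
    (1, "WALKING", [3.0, 6.0, 9.0]),
    (2, "SITTING", [5.0, 7.0, 9.0]),
]
rows = [(s, a, [v[i] for i in keep]) for s, a, v in measurements]
tidy = average_by_group(rows, columns)
```

Here tidy maps each (subject, activity) pair to one record of averaged columns, which matches the "one record per subject per activity" shape of the output.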

Source Files

From the source data given, the following files were used:

  • 'features.txt': List of all features - used to get column names.
  • 'activity_labels.txt': Links the class labels with their activity name.
  • 'train/X_train.txt': Training set, containing the device measurements.
  • 'train/y_train.txt': Training labels, containing the activity labels related to the measurements.
  • 'train/subject_train.txt': Each row identifies the subject who performed the activity for each window sample. Its range is from 1 to 30.
  • 'test/X_test.txt': Test set.
  • 'test/y_test.txt': Test labels.
  • 'test/subject_test.txt': Test subjects.
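
One way the files listed above could be read and combined is sketched below in Python. It is an illustration only, not the repository's code. It assumes the files are whitespace-separated text with one row per sample, and that 'activity_labels.txt' holds a numeric code followed by a name on each line. The helpers read_rows, load_split and load_all are hypothetical, and the demo writes tiny made-up files to a temporary directory.

```python
import os
import tempfile


def read_rows(path):
    """Read a whitespace-separated text file into a list of token lists."""
    with open(path) as handle:
        return [line.split() for line in handle if line.strip()]


def load_split(base_dir, split):
    """Load subjects, activity codes and measurements for 'train' or 'test'."""
    folder = os.path.join(base_dir, split)
    subjects = read_rows(os.path.join(folder, f"subject_{split}.txt"))
    labels = read_rows(os.path.join(folder, f"y_{split}.txt"))
    values = read_rows(os.path.join(folder, f"X_{split}.txt"))
    if not (len(subjects) == len(labels) == len(values)):
        raise ValueError(f"row counts differ in the {split} files")
    return [
        (int(s[0]), int(y[0]), [float(v) for v in x])
        for s, y, x in zip(subjects, labels, values)
    ]


def load_all(base_dir):
    """Combine training and test rows and replace codes with activity names."""
    label_path = os.path.join(base_dir, "activity_labels.txt")
    names = {int(code): name for code, name in read_rows(label_path)}
    combined = load_split(base_dir, "train") + load_split(base_dir, "test")
    return [(s, names[y], x) for s, y, x in combined]


# Demo with tiny made-up files (assumed layout, not the real data)
with tempfile.TemporaryDirectory() as base:
    sample_files = {
        "activity_labels.txt": "1 WALKING\n2 SITTING\n",
        "train/subject_train.txt": "1\n",
        "train/y_train.txt": "1\n",
        "train/X_train.txt": "0.5 1.5\n",
        "test/subject_test.txt": "2\n",
        "test/y_test.txt": "2\n",
        "test/X_test.txt": "2.5 3.5\n",
    }
    for relative, text in sample_files.items():
        path = os.path.join(base, *relative.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            handle.write(text)
    merged = load_all(base)
```

The row-count check is a design choice for the sketch: the three files of a split are aligned by row position only, so a mismatch would silently pair measurements with the wrong subject or activity.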

Output

The output file containing the tidy data set is: average_data.csv. It contains one record per subject per activity, with the average value of each mean and standard deviation measurement in the original files.
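
A file with this shape could be written as in the Python sketch below. It is illustrative only. The "subject" and "activity" header names, the write_tidy_csv helper, and the sample values are assumptions, since the actual column headings of average_data.csv are not given here.

```python
import csv
import io


def write_tidy_csv(tidy, columns, handle):
    """Write one CSV row per (subject, activity) pair to an open text handle."""
    writer = csv.writer(handle)
    # Header names for the two key columns are assumed for this sketch
    writer.writerow(["subject", "activity"] + list(columns))
    for (subject, activity), averages in sorted(tidy.items()):
        writer.writerow([subject, activity] + [averages[c] for c in columns])


# Made-up averaged values, not real results
columns = ["tBodyAccmeanX", "tBodyAccstdX"]
tidy = {
    (2, "SITTING"): {"tBodyAccmeanX": 5.0, "tBodyAccstdX": 7.0},
    (1, "WALKING"): {"tBodyAccmeanX": 2.0, "tBodyAccstdX": 4.0},
}

buffer = io.StringIO()
write_tidy_csv(tidy, columns, buffer)
lines = buffer.getvalue().splitlines()
```

To write to disk instead of an in-memory buffer, pass a handle opened with open("average_data.csv", "w", newline="").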
