StephRoark / Getting_Cleaning_Data

repo for class project

#README.md

##Tidy Dataset from the Human Activity Recognition Using Smartphones Dataset

The Tidy Dataset describes the participants of an experiment, each of whom performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) while wearing a smartphone (Samsung Galaxy S II) on the waist. The subjects were a group of 30 volunteers between the ages of 19 and 48. The data captured for each subject were 3-axial linear acceleration and 3-axial angular velocity, sampled at a constant rate of 50 Hz using the embedded accelerometer and gyroscope.

##Generate the Tidy Data file

With the original raw data file, "getdata-projectfiles-UCI HAR Dataset.zip", in your working directory, run the run_analysis.R file to generate the "tidy_data.txt" file.
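As a rough sketch of that step, the snippet below runs the script only when both inputs are present. The guard, the variable names, and the read-back call (which assumes the text file was written with a header row) are illustrative additions and are not taken from the original script.

```r
# Hypothetical driver: run the analysis only if the expected inputs exist.
zip_file <- "getdata-projectfiles-UCI HAR Dataset.zip"
script   <- "run_analysis.R"

if (file.exists(zip_file) && file.exists(script)) {
  source(script)  # expected to write tidy_data.txt to the working directory
  # Assumption: the file has a header row with the variable names.
  tidy <- read.table("tidy_data.txt", header = TRUE)
  print(dim(tidy))
} else {
  message("Place ", zip_file, " and ", script, " in ", getwd(), " first.")
}
```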

##Description of the Dataset

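For each record (one subject performing one activity), these data are included:

  • A set of averaged time and frequency domain features (the tidy data set has 68 variables in total, counting the activity label and subject identifier).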
  • Its activity label.
  • An identifier of the subject who carried out the experiment.

Features are normalized and bounded within [-1,1].

  • The units used for the accelerations (total and body) are 'g's (standard gravity, 9.80665 m/s²).
  • The gyroscope units are rad/s.

The Tidy Data text file consists of the original raw data files merged and analyzed to create a final data set which has 68 unique variables and 180 different observations of those variables.

The tidy data definition:

  1. Each variable forms a column
  2. Each observation forms a row
  3. Each data set contains information on only one observational unit of analysis

##The Dataset includes the following files:

The README.md

The CodeBook.md contains the description of each variable in the tidy data set along with any units of measurement not captured in the variable names.

The UCI HAR Dataset zip file is the original raw data from the Human Activity Recognition Using Smartphones Dataset.

The Tidy Dataset is the file generated by performing the analysis in the run_analysis.R code file.

The run_analysis.R file contains the code to perform the analysis on the raw data and generate the Tidy Data text file that meets the principles of Tidy Data.

##Data Analysis Summary

The purpose of the analysis is to create an independent tidy data set with the average of each variable for each activity and each subject.

The UCI HAR data set is unzipped, and the test and train files for X, Y, and subject, as well as the features and activity files, are read into R from the working directory. For both the test and train files, the activity labels are joined with the activities performed by the subjects. After removing the first column in the resulting table, the activities are renamed from numerals using the descriptive activity labels. The columns of the X data are renamed using the features.txt file.
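The labeling and renaming steps can be sketched on toy data as follows. The data frames and column names here are made-up stand-ins for the real files, and the lookup uses `match()` instead of a join so that row order is preserved; the original script may do this differently.

```r
# Toy stand-ins for a Y file and the activity labels file (values are made up).
y <- data.frame(ActivityID = c(5, 1, 5, 3))
activity_labels <- data.frame(
  ActivityID = 1:6,
  Activity   = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                 "SITTING", "STANDING", "LAYING")
)

# Look up the label for each row. match() keeps the original row order,
# which matters because Y must stay aligned with the rows of X.
idx <- match(y$ActivityID, activity_labels$ActivityID)
y$Activity <- activity_labels$Activity[idx]

# Drop the numeric code (the first column), keeping the descriptive label.
y <- y["Activity"]

# Rename toy measurement columns from a features-style character vector.
x <- data.frame(V1 = c(0.1, 0.2, 0.3, 0.4), V2 = c(-0.5, -0.4, -0.3, -0.2))
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X")
names(x) <- features
```

A plain `merge()` sorts its result by the key by default, which would silently misalign the labels with the measurement rows; that is the reason for the order-preserving lookup in this sketch.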

The test and train files are then column bound to create merged test and merged train tables, which include 561 variables and 2947 and 7352 observations, respectively. These test and train tables are then row bound together to create a final data set of 561 variables and 10299 observations.
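A minimal sketch of the binding step, using tiny made-up tables in place of the real files. The helper `make_part()` and its column names are hypothetical and exist only for this illustration.

```r
# Build a toy "test" or "train" table with n rows and 2 feature columns.
make_part <- function(n, first_subject) {
  subject  <- data.frame(Subject = seq(first_subject, length.out = n))
  activity <- data.frame(Activity = rep("WALKING", n))
  x        <- data.frame(f1 = seq_len(n) / 10, f2 = -seq_len(n) / 10)
  cbind(subject, activity, x)  # column bind: same rows, more columns
}

test  <- make_part(2, first_subject = 1)
train <- make_part(3, first_subject = 3)

# Row bind: same columns, rows stacked.
# In the real data this is 2947 + 7352 = 10299 observations.
merged <- rbind(test, train)
dim(merged)
```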

From this data set, the variables containing the mean and standard deviation data are selected. The data set is grouped by subject and then by activity, and the average of each of these variables is recorded. The resulting data set contains 68 variables and 180 observations, which come from the 30 subjects each performing 6 activities.
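The selection and averaging can be sketched on a toy table as shown here. The regular expression is an assumption about how the mean and standard deviation columns are picked (it keeps names containing `mean()` or `std()` and skips names such as `meanFreq()`), and base R `aggregate()` stands in for whatever grouping function the original script uses.

```r
# Toy merged data: 2 subjects x 2 activities, 2 rows each (values are made up).
merged <- data.frame(
  Subject  = rep(1:2, each = 4),
  Activity = rep(rep(c("SITTING", "WALKING"), each = 2), times = 2),
  "tBodyAcc-mean()-X"     = c(1, 3, 5, 7, 2, 4, 6, 8),
  "tBodyAcc-std()-X"      = c(0, 2, 4, 6, 1, 1, 3, 3),
  "tBodyAcc-meanFreq()-X" = 1:8,
  check.names = FALSE
)

# Keep the two identifier columns plus names containing mean() or std().
keep <- grepl("mean\\(\\)|std\\(\\)", names(merged))
keep[1:2] <- TRUE
selected <- merged[, keep]

# Average every remaining variable for each Subject/Activity pair.
tidy <- aggregate(
  selected[, -(1:2)],
  by  = list(Subject = selected$Subject, Activity = selected$Activity),
  FUN = mean
)
dim(tidy)
```

With the real data, 30 subjects and 6 activities give 30 × 6 = 180 rows.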

Finally, the variable names are expanded and simplified for the user's ease of understanding. This tidy data set is then written to a text file, read back in, and opened for viewing.
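A sketch of the renaming and writing step on a toy table. The substitution rules below are hypothetical, since the exact renaming used by the original script is not listed in this README, and the file is written to a temporary directory here instead of the working directory.

```r
# Toy tidy table with raw feature-style names (values are made up).
tidy <- data.frame(
  Subject  = 1:2,
  Activity = c("SITTING", "WALKING"),
  "tBodyAcc-mean()-X" = c(0.27, 0.28),
  "fBodyGyro-std()-Z" = c(-0.9, -0.5),
  check.names = FALSE
)

# Hypothetical renaming rules: expand abbreviations, drop punctuation.
nm <- names(tidy)
nm <- gsub("^t", "Time", nm)
nm <- gsub("^f", "Frequency", nm)
nm <- gsub("Acc", "Accelerometer", nm)
nm <- gsub("Gyro", "Gyroscope", nm)
nm <- gsub("-mean\\(\\)", "Mean", nm)
nm <- gsub("-std\\(\\)", "Std", nm)
nm <- gsub("-", "", nm)
names(tidy) <- nm

# Write the table without row names, then read it back to check it.
out <- file.path(tempdir(), "tidy_data.txt")
write.table(tidy, out, row.names = FALSE)
check <- read.table(out, header = TRUE)
```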
