Getting And Cleaning Data Project - JH - Coursera

==================================================================
Human Activity Recognition Using Smartphones Dataset
Version 1.1

The file ‘run_analysis.R’ summarizes the provided dataset, producing a smaller dataset aggregated by subject and activity. The script performs the following steps:

1- Loads the training dataset sourced from 3 files (‘subject_train.txt’, ‘y_train.txt’, ‘X_train.txt’), where ‘X_train.txt’ has 561 columns representing the features listed in the file ‘features.txt’.

2- Loads the test dataset sourced from 3 files (‘subject_test.txt’, ‘y_test.txt’, ‘X_test.txt’), where ‘X_test.txt’ has 561 columns representing the features listed in the file ‘features.txt’.

3- Loads feature descriptions from the file ‘features.txt’ in order to determine the ones that represent mean and standard deviation values by searching for strings “mean” and “std”.

4- Selects only the relevant features found in step 3 by selecting the respective columns in the training and test datasets.

5- Sets the names of the columns in both datasets to match the short feature descriptions read from the file ‘features.txt’.

6- Merges the selected subsets of the training and test datasets by binding their rows.

7- Aggregates the values of the selected subset of data over subject and activity. There are 30 subjects and 6 activities; therefore, we expect 30*6 = 180 rows.

8- Removes the unnecessary grouping columns introduced by the aggregate function.

9- Loads the activity labels lookup table from the file ‘activity_labels.txt’ in order to replace the activity_id with the descriptive name of the activity (e.g. WALKING).

10- Removes the activity_id column, since the activity_name column is more descriptive. Note: This step can be left out if the activity id still needs to be kept in the dataset.

11- Finally, writes the resulting dataset to a text file.
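
As a rough illustration of steps 3 to 10, the sketch below mirrors the same logic in plain Python on a tiny made-up table. It is not the ‘run_analysis.R’ script itself: the function names, the sample values, and the choice of the mean as the aggregate function are assumptions made for the example. It also merges the rows before subsetting the columns, which gives the same result as the order listed above.

```python
from collections import defaultdict
from statistics import mean


def select_mean_std(feature_names):
    """Step 3: keep (column index, name) for names containing 'mean' or 'std'."""
    return [(i, name) for i, name in enumerate(feature_names)
            if "mean" in name or "std" in name]


def summarize(subjects, activity_ids, rows, feature_names, activity_labels):
    """Steps 4-10 on already merged rows: subset, aggregate, label."""
    selected = select_mean_std(feature_names)
    groups = defaultdict(list)
    for subject, activity_id, row in zip(subjects, activity_ids, rows):
        # Step 4: keep only the selected columns of each record.
        groups[(subject, activity_id)].append([row[i] for i, _ in selected])

    # Step 5: column names come from the feature descriptions.
    header = ["subject", "activity_name"] + [name for _, name in selected]
    table = []
    for (subject, activity_id), members in sorted(groups.items()):
        # Step 7: one row per (subject, activity); using the mean is an assumption.
        averages = [mean(column) for column in zip(*members)]
        # Steps 9-10: replace the activity id with its descriptive name.
        table.append([subject, activity_labels[activity_id]] + averages)
    return header, table


# Hypothetical miniature data: 3 features, 2 activities, 3 subjects.
feature_names = ["tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X"]
activity_labels = {1: "WALKING", 4: "SITTING"}
train_subjects, train_activities = [1, 1, 2], [1, 1, 4]
train_rows = [[0.25, 0.5, 0.9], [0.75, 0.0, 0.8], [0.5, -0.5, 0.1]]
test_subjects, test_activities = [3], [1]
test_rows = [[-0.25, 0.125, 0.7]]

# Step 6: bind the training and test rows together.
header, table = summarize(
    train_subjects + test_subjects,
    train_activities + test_activities,
    train_rows + test_rows,
    feature_names,
    activity_labels,
)
for line in [header] + table:
    print(line)
```

On the full data, the same grouping would yield one row for each of the 30*6 subject and activity pairs.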

The following files are included in the repository:

‘Codebook.pdf’: A short description of the columns of the new summarized dataset.
‘Dataset.txt’: The tidy dataset generated by the ‘run_analysis.R’ script, written as a text file with the function write.table().
‘run_analysis.R’: The analysis script used to generate the data in the ‘dataset.txt’ file.

Note: The original version of the README.txt file is reproduced below. The preceding section describes only the modifications introduced by the ‘run_analysis.R’ script. For more information and details, please refer to the original README below.

==================================================================
Human Activity Recognition Using Smartphones Dataset
Version 1.0

Jorge L. Reyes-Ortiz, Davide Anguita, Alessandro Ghio, Luca Oneto.
Smartlab - Non Linear Complex Systems Laboratory
DITEN - Università degli Studi di Genova.
Via Opera Pia 11A, I-16145, Genoa, Italy.
activityrecognition@smartlab.ws
www.smartlab.ws

The experiments have been carried out with a group of 30 volunteers within an age bracket of 19-48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, we captured 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50Hz. The experiments have been video-recorded to label the data manually. The obtained dataset has been randomly partitioned into two sets, where 70% of the volunteers were selected for generating the training data and 30% for the test data.
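
A partition by volunteer of this kind can be sketched as follows. The seed, the function name, and the shuffling procedure are assumptions for illustration only; the text above does not state how the random partition was actually drawn.

```python
import random


def split_subjects(subject_ids, train_fraction=0.7, seed=0):
    """Randomly assign whole subjects to a training or a test group."""
    rng = random.Random(seed)  # fixed seed (hypothetical) for repeatability
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


train_ids, test_ids = split_subjects(range(1, 31))
print(len(train_ids), len(test_ids))  # 21 9
```

Splitting by subject rather than by record keeps all windows of one volunteer in the same set.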

The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity. The gravitational force is assumed to have only low frequency components, therefore a filter with 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domain. See 'features_info.txt' for more details.
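
The window arithmetic above (2.56 sec at 50 Hz gives 128 readings, and a 50% overlap gives a step of 64 readings) can be checked with a short sketch. The function name and the 320-sample example length are hypothetical; noise filtering and the Butterworth filter are not reproduced here.

```python
def window_starts(n_samples, rate_hz=50, window_sec=2.56, overlap=0.5):
    """Return window length, step, and start indices of fixed-width sliding windows."""
    window = round(rate_hz * window_sec)   # readings per window
    step = round(window * (1 - overlap))   # readings between window starts
    starts = list(range(0, n_samples - window + 1, step))
    return window, step, starts


window, step, starts = window_starts(320)
print(window, step, starts)  # 128 64 [0, 64, 128, 192]
```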

For each record, the following is provided:

Triaxial acceleration from the accelerometer (total acceleration) and the estimated body acceleration.
Triaxial Angular velocity from the gyroscope.
A 561-feature vector with time and frequency domain variables.
Its activity label.
An identifier of the subject who carried out the experiment.

The dataset includes the following files:

'README.txt'
'features_info.txt': Shows information about the variables used on the feature vector.
'features.txt': List of all features.
'activity_labels.txt': Links the class labels with their activity name.
'train/X_train.txt': Training set.
'train/y_train.txt': Training labels.
'test/X_test.txt': Test set.
'test/y_test.txt': Test labels.

The following files are available for the train and test data. Their descriptions are equivalent.

'train/subject_train.txt': Each row identifies the subject who performed the activity for each window sample. Its range is from 1 to 30.
'train/Inertial Signals/total_acc_x_train.txt': The acceleration signal from the smartphone accelerometer X axis in standard gravity units 'g'. Every row shows a 128 element vector. The same description applies for the 'total_acc_y_train.txt' and 'total_acc_z_train.txt' files for the Y and Z axes.
'train/Inertial Signals/body_acc_x_train.txt': The body acceleration signal obtained by subtracting the gravity from the total acceleration.
'train/Inertial Signals/body_gyro_x_train.txt': The angular velocity vector measured by the gyroscope for each window sample. The units are radians/second.
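
As a minimal sketch of how these files relate, the snippet below parses one text row into a vector and subtracts a gravity estimate from the total acceleration, reading by reading. It assumes whitespace-separated values and a gravity vector that is already available; the helper names and sample numbers are made up, and the row is shortened to 4 readings instead of 128.

```python
def parse_signal_row(line):
    """Turn one text row of whitespace-separated numbers into a list of floats."""
    return [float(token) for token in line.split()]


def body_acceleration(total_acc, gravity):
    """Subtract the gravity estimate from the total acceleration, element-wise."""
    if len(total_acc) != len(gravity):
        raise ValueError("both vectors must cover the same window")
    return [t - g for t, g in zip(total_acc, gravity)]


total = parse_signal_row("  1.0000000e+000  1.2500000e+000  7.5000000e-001  1.0000000e+000")
gravity = [1.0, 1.0, 1.0, 1.0]  # hypothetical constant gravity estimate, in g
body = body_acceleration(total, gravity)
print(body)  # [0.0, 0.25, -0.25, 0.0]
```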

Notes:

Features are normalized and bounded within [-1,1].
Each feature vector is a row on the text file.
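
The notes do not say how the features were normalized. One common way to bound a feature within [-1,1] is min-max scaling, sketched below purely as an assumption; the function name and sample values are hypothetical.

```python
def scale_to_unit_range(values):
    """Linearly map values so the minimum becomes -1 and the maximum becomes 1."""
    low, high = min(values), max(values)
    if high == low:
        # A constant feature has no spread; map it to the middle of the range.
        return [0.0 for _ in values]
    return [2 * (v - low) / (high - low) - 1 for v in values]


print(scale_to_unit_range([0, 5, 10]))  # [-1.0, 0.0, 1.0]
```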

For more information about this dataset contact: activityrecognition@smartlab.ws

License:

Use of this dataset in publications must be acknowledged by referencing the following publication [1]

[1] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. Human Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine. International Workshop of Ambient Assisted Living (IWAAL 2012). Vitoria-Gasteiz, Spain. Dec 2012

This dataset is distributed AS-IS and no responsibility implied or explicit can be addressed to the authors or their institutions for its use or misuse. Any commercial use is prohibited.

Jorge L. Reyes-Ortiz, Alessandro Ghio, Luca Oneto, Davide Anguita. November 2012.
