The original data comes from the "Human Activity Recognition Using Smartphones Data Set". To run the script in this repo, you first have to download this data:
http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
#Steps
Before starting, it is important to set the working directory to the folder where the data files are located.
Load "features.txt", which will be used to name the 561 data measurements.
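
As a minimal sketch of this step, the block below reads a small stand-in for "features.txt" from a temporary file, so it runs without the downloaded data. The three sample rows, the object name `features` and the column names `index` and `feature` are assumptions made for illustration.

```r
# Stand-in for "features.txt": one row per measurement (index, then name).
features_path <- tempfile(fileext = ".txt")
writeLines(c("1 tBodyAcc-mean()-X",
             "2 tBodyAcc-mean()-Y",
             "3 tBodyAcc-std()-X"), features_path)

# Read the feature list; the second column holds the measurement names.
features <- read.table(features_path,
                       col.names = c("index", "feature"),
                       stringsAsFactors = FALSE)
print(features)
```

With the real file, `features_path` would simply point at "features.txt" in the working directory.
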
Given that the complete data comes divided into 2 groups (test and train), it is necessary to load both. Here we load into R objects "X_test.txt" (data from the 561 measurements), "Y_test.txt" (activity identifier for the 6 activities recorded) and "subject_test" (subject identifier for the 30 participants in the study). With read.table, the col.names argument is used to name the measurement columns after the features. There are 2947 observations in this dataset.
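
The use of col.names can be sketched as follows, again with a tiny stand-in file instead of the real "X_test.txt". The object names `feature_names` and `x_test` and the sample values are assumptions. Note that read.table, by default, converts names into syntactically valid R names, which is where the dots in the column names come from.

```r
# Feature names as they appear in the feature list (sample of three).
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-mean()-Y", "tBodyAcc-std()-X")

# Stand-in for the measurements file: one row per observation.
x_path <- tempfile(fileext = ".txt")
writeLines(c("0.28 -0.01 -0.93",
             "0.27 -0.02 -0.95"), x_path)

# col.names assigns the feature names to the measurement columns.
# With the default check.names = TRUE, characters such as "-", "(" and ")"
# are replaced by dots.
x_test <- read.table(x_path, col.names = feature_names)
print(names(x_test))
```
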
The same as the last step, but for the train data. There are 7352 observations in this dataset.
In this step the columns of activities and subjects are added to train and test data. Then the complete data is created.
The columns of subject are added to test and train data.
The columns of activity are added to test and train data.
Train and test data (with activity and subject columns) are merged. A dataset of 10299 observations is obtained.
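
A minimal sketch of these three sub-steps, using toy data frames in place of the loaded files. All object names except `completedata` are assumptions, and stacking the rows with rbind is one plausible way to combine the two groups, not necessarily the call used in the script.

```r
# Toy stand-ins for the loaded measurement data.
test  <- data.frame(tBodyAcc.mean...X = c(0.28, 0.27))
train <- data.frame(tBodyAcc.mean...X = c(0.29, 0.26, 0.30))

# Add the subject column to each group.
test$subject  <- c(2, 2)
train$subject <- c(1, 1, 3)

# Add the activity column to each group.
test$activity  <- c(5, 4)
train$activity <- c(5, 1, 2)

# Stack train and test rows into the complete data.
completedata <- rbind(train, test)
print(dim(completedata))
```

With the real data the same stacking gives 7352 + 2947 = 10299 rows.
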
Here we find the columns which have "mean" or "std" in their names, in order to subset the measurements of mean and standard deviation.
This is done by searching with grep in the features column that holds the measurement variable names. A vector with the matching positions is then created.
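
The search can be sketched as below. The pattern "mean|std" and the names `feature_names` and `positions` are assumptions; this pattern also matches names containing "meanFreq", which is consistent with the variable count reported for the subset.

```r
# Sample of feature names; only some contain "mean" or "std".
feature_names <- c("tBodyAcc-mean()-X",
                   "tBodyAcc-std()-X",
                   "tBodyAcc-mad()-X",
                   "fBodyAcc-meanFreq()-X")

# grep returns the positions of the names that match the pattern.
positions <- grep("mean|std", feature_names)
print(positions)
```
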
3.2 Creating a data set with variables "subject", "activity" and those with mean and std in their names
A subset of the completedata is created using the vector of mean and std positions plus the "activity" and "subject" columns. This subset has 10299 observations of 81 variables.
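
A sketch of the subsetting, assuming the measurement columns of `completedata` keep the same order as the feature list, so that the positions found with grep line up with the columns. The names `feature_names`, `positions`, `keep` and `meanstd` are hypothetical.

```r
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-mad()-X", "tBodyAcc-std()-X")

# Toy complete data: measurements first, then subject and activity.
completedata <- data.frame(
  tBodyAcc.mean...X = c(0.28, 0.27),
  tBodyAcc.mad...X  = c(-0.98, -0.97),
  tBodyAcc.std...X  = c(-0.93, -0.95),
  subject           = c(1, 2),
  activity          = c(5, 4)
)

# Positions of the mean/std measurements, plus the two identifier columns.
positions <- grep("mean|std", feature_names)
keep <- c(positions, match(c("subject", "activity"), names(completedata)))

meanstd <- completedata[, keep]
print(names(meanstd))
```
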
The activity values are recoded with the corresponding activity names.
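
One way to do the recoding is with factor, sketched below. The label order follows the "activity_labels.txt" file shipped with the dataset; the use of factor and the object names are assumptions, since the text does not say how the script recodes the values.

```r
# Activity names in the order of their numeric codes (1 to 6).
activity_labels <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                     "SITTING", "STANDING", "LAYING")

# Toy activity codes as they come from the activity file.
activity <- c(5, 1, 6, 4)

# Replace each code with its name.
activity_named <- factor(activity, levels = 1:6, labels = activity_labels)
print(as.character(activity_named))
```
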
To obtain a tidy dataset, the variable names are edited to remove the dots in between and to capitalize the beginning of each word, for readability.
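
The renaming can be sketched with gsub. The exact substitutions below are an assumption about how the names might be cleaned, not the script's own rules.

```r
# Column names as produced by read.table (dots replace "-", "(" and ")").
raw_names <- c("tBodyAcc.mean...X", "tBodyAcc.std...Y", "fBodyGyro.meanFreq...Z")

# Capitalize the statistic name, then drop every remaining dot.
clean_names <- gsub("\\.mean", "Mean", raw_names)
clean_names <- gsub("\\.std", "Std", clean_names)
clean_names <- gsub("\\.", "", clean_names)
print(clean_names)
```
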
A tidy dataset with the average of the mean and standard deviation measurements for each subject and activity is created with aggregate. Since there are 6 activities and 30 subjects, a dataset of 180 observations by 81 variables is obtained.
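
A minimal sketch of the averaging with aggregate on toy data. The formula interface and the object names `meanstd` and `tidy` are assumptions; the script may call aggregate differently.

```r
# Toy data: several observations per subject/activity pair.
meanstd <- data.frame(
  subject       = c(1, 1, 1, 2, 2),
  activity      = c("WALKING", "WALKING", "SITTING", "WALKING", "WALKING"),
  tBodyAccMeanX = c(0.25, 0.75, 0.5, 0.125, 0.375)
)

# Average every measurement column within each subject/activity group.
tidy <- aggregate(. ~ subject + activity, data = meanstd, FUN = mean)
print(tidy)
```

Each subject/activity pair present in the data yields one row, which is why 30 subjects and 6 activities give 180 rows in the real dataset.
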
The tidy dataset is exported as a .txt file. It is important to set the working directory to the folder where you want the file to be saved.
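
The export can be sketched with write.table. The file name "tidydata.txt" and the use of a temporary directory are assumptions made so that the example runs anywhere; in practice the file would go to the working directory.

```r
# Toy tidy dataset.
tidy <- data.frame(subject       = c(1, 2),
                   activity      = c("WALKING", "SITTING"),
                   tBodyAccMeanX = c(0.5, 0.25))

# Write the data as plain text, without row names.
out_path <- file.path(tempdir(), "tidydata.txt")
write.table(tidy, out_path, row.names = FALSE)

# The file can be read back with header = TRUE.
check <- read.table(out_path, header = TRUE)
print(check)
```
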