GitHub - khoanguyen123/gacd: Getting and Cleaning Data in R

Getting and Cleaning Data in R

Input data

The experiments used the accelerometer and gyroscope embedded in a Samsung Galaxy smartphone to measure acceleration and angular velocity while a participant performed various activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING). There were 30 volunteers (i.e. subjects) in the experiments.

The obtained dataset was randomly partitioned into two sets: 70% of the volunteers were selected to generate the training data and 30% to generate the test data.
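
As an illustration of what a 70/30 split by volunteer means for 30 subjects, here is a minimal sketch. It is not the procedure the dataset authors used (that is not described here); the seed and the variable names are assumptions made only for this example.

```r
# Illustrative only: randomly assign 30 volunteers to training and test groups.
set.seed(42)                                  # arbitrary seed so the example is repeatable
subjects <- 1:30                              # subject ids range from 1 to 30
n_train <- round(0.7 * length(subjects))      # 70% of 30 volunteers = 21

train_subjects <- sort(sample(subjects, n_train))
test_subjects <- setdiff(subjects, train_subjects)  # the remaining 9 volunteers

length(train_subjects)
length(test_subjects)
```

Because the split is made per volunteer rather than per observation, no subject contributes rows to both sets.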

The dataset was downloaded and extracted. The following files are used; paths are relative to this file's directory:

'README.txt'
'features_info.txt': Shows information about the variables used in the feature vector.
'features.txt': List of all features.
'activity_labels.txt': Links the class labels with their activity name.
'train/X_train.txt': Training set.
'train/y_train.txt': Training labels.
'train/subject_train.txt': Each row identifies the subject who performed the activity for each window sample. Its range is from 1 to 30.
'test/X_test.txt': Test set.
'test/y_test.txt': Test labels.
'test/subject_test.txt': Each row identifies the subject who performed the activity for each window sample. Its range is from 1 to 30.

There are more files, but we don't use them in this assignment.

Script description

The "run_analysis.R" script performs the following tasks:

1. Read the TEST dataset
   - Read "X_test.txt". Each row is an observation, but the subject and activity columns are missing.
   - Read "subject_test.txt". Each row identifies the subject whose observation is in the matching row of "X_test.txt".
   - Similarly, "y_test.txt" identifies an activity by number (e.g. 1 = WALKING, 2 = WALKING_UPSTAIRS, ...), so map the numbers to more descriptive names using "activity_labels.txt".
   - Combine them all (via cbind) into a single TEST dataset.
2. Do the same with the TRAINING dataset.
3. Merge the TEST and TRAINING datasets created in steps 1 and 2 into one data frame via rbind.
4. Extract only the measurements on the mean and standard deviation, using the select() function from dplyr on all columns whose names match "std" or "mean".
5. Create a new dataset (hopefully tidy) with the average of each variable (i.e. each of the means and stds above) per activity per participant.
6. Write the new dataset to a file called "result.txt" in the current directory.
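
The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of "run_analysis.R". To run without the real dataset it first writes a tiny made-up copy of the files to a temporary directory. It also uses base R (grepl and aggregate) in place of dplyr so that it has no package dependencies. The helper name read_split and all toy values are assumptions.

```r
# --- Toy stand-in for the real files (all values are made up) ---------------
root <- tempfile("har_")
for (split in c("test", "train")) dir.create(file.path(root, split), recursive = TRUE)
writeLines(c("1 tBodyAcc-mean()-X", "2 tBodyAcc-std()-X", "3 tBodyAcc-max()-X"),
           file.path(root, "features.txt"))
writeLines(c("1 WALKING", "2 SITTING"), file.path(root, "activity_labels.txt"))
writeLines(c("0.1 0.2 0.3", "0.3 0.4 0.5"), file.path(root, "test", "X_test.txt"))
writeLines(c("1", "1"), file.path(root, "test", "y_test.txt"))
writeLines(c("2", "2"), file.path(root, "test", "subject_test.txt"))
writeLines(c("0.5 0.6 0.7", "0.7 0.8 0.9"), file.path(root, "train", "X_train.txt"))
writeLines(c("2", "2"), file.path(root, "train", "y_train.txt"))
writeLines(c("1", "1"), file.path(root, "train", "subject_train.txt"))

# --- Lookup tables: feature names and activity id -> activity name ----------
features <- read.table(file.path(root, "features.txt"), stringsAsFactors = FALSE)[[2]]
activities <- read.table(file.path(root, "activity_labels.txt"), stringsAsFactors = FALSE)

# Hypothetical helper: read one split ("test" or "train") and attach the
# subject and activity columns that the X_*.txt file lacks.
read_split <- function(split) {
  path <- function(prefix) file.path(root, split, paste0(prefix, "_", split, ".txt"))
  x <- read.table(path("X"), col.names = features, check.names = FALSE)
  subject <- read.table(path("subject"))[[1]]
  activity_id <- read.table(path("y"))[[1]]
  # Map numeric activity ids to descriptive names
  activity <- activities[[2]][match(activity_id, activities[[1]])]
  cbind(subject = subject, activity = activity, x)
}

# Steps 1-3: build both splits and stack them
merged <- rbind(read_split("test"), read_split("train"))

# Step 4: keep only columns whose names contain "mean" or "std"
measure_cols <- names(merged)[grepl("mean|std", names(merged))]

# Step 5: average every kept measurement per subject and activity
result <- aggregate(merged[measure_cols],
                    by = list(subject = merged$subject, activity = merged$activity),
                    FUN = mean)

# Step 6: write the result (to the temporary directory in this sketch)
result_file <- file.path(root, "result.txt")
write.table(result, result_file, row.names = FALSE)
```

With the real data, root would point at the directory containing "features.txt", and the output path would be "result.txt" in the current directory.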
