Codebook for Getting and Cleaning Data

Overview

This codebook accompanies the data file tidydata.txt that was created in support of the requirements for the Johns Hopkins University online course Getting and Cleaning Data, offered on Coursera in August 2015.

One of the requirements for the course was to create a tidy data file (for additional background, see the README.md file that is also posted in this GitHub repository).

The tidy data file contains 180 observations, one for each combination of 30 research subjects and 6 activities. Each measured value is the mean across multiple repetitions of an experiment within one category of physical activity. An observation (or row) in the tidy data set is a unique combination of personId and activityName, plus the means of 66 variables: the features from the original Human Activity Recognition (HAR) data set that were means or standard deviations of the 33 base variables analyzed by the HAR research team, per the following illustration.

Observations in Tidy Data Set

| personId | activityName | meanOfTimeBodyAccMeanX | . . . | meanOfFreqBodyGyroJerkMagStdev |
|---|---|---|---|---|
| 7 | walking | 0.275592961754386 | . . . | -0.0841663774087719 |
| . . . | . . . | . . . | . . . | . . . |
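
The summarization described above can be sketched in base R. This is a minimal sketch on synthetic data, not the script that produced tidydata.txt: the `har` data frame, its values, and the choice of `aggregate()` are assumptions made for illustration.

```r
# Synthetic stand-in for the merged HAR measurements (assumption, not real data):
# 2 subjects x 2 activities, 3 repetitions each.
set.seed(1)
har <- data.frame(
  personId = rep(c(1, 2), each = 6),
  activityName = rep(rep(c("walking", "sitting"), each = 3), times = 2),
  timeBodyAccMeanX = runif(12, -1, 1),
  timeBodyAccStdevX = runif(12, -1, 1)
)

# Average every measurement column within each personId/activityName pair.
tidy <- aggregate(. ~ personId + activityName, data = har, FUN = mean)

# Prefix the measurement columns with "meanOf", upper-casing the first letter.
isMeasure <- !(names(tidy) %in% c("personId", "activityName"))
names(tidy)[isMeasure] <- paste0(
  "meanOf",
  toupper(substring(names(tidy)[isMeasure], 1, 1)),
  substring(names(tidy)[isMeasure], 2)
)

tidy <- tidy[order(tidy$personId, tidy$activityName), ]
print(dim(tidy))
```

With the real data, the same grouping would give 30 x 6 = 180 rows and 2 + 66 = 68 columns, matching the counts stated in this codebook.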

Although statisticians could assert that it would be more appropriate to summarize the standard deviations into standard errors (see https://class.coursera.org/getdata-031/forum/thread?thread_id=28#post-1251 [and following] for details), for the purposes of the data cleaning activity, the standard deviations have been summarized with the mean() function in R. This codebook describes each variable (column) in the tidy data file.

In all of the measurement variables, the text tokens have the following meanings. We have consolidated the term definitions here to avoid repetition of the definitions in the table of variables.

| Token | Description |
|---|---|
| Body | Signal based on the body of an experiment participant, one of two components derived from the time-based signals on the phone's accelerometer |
| Freq | Measurement based on the "frequency" domain, taken as a Fast Fourier Transform of the time-based signals |
| Gravity | Signal based on gravity, the force that attracts a body towards the center of the earth. Gravity is the second of the two measurement components derived from the phone's accelerometer |
| Gyro | Measurement taken from the gyroscope on the phone |
| Jerk | Measurement of sudden movement, obtained as the time derivative of body linear acceleration or angular velocity |
| Mag | Magnitude of a three-dimensional signal, calculated as the Euclidean norm (i.e. length of a vector from the origin) |
| Mean | Indicates that the measurement is a mean within the original Human Activity Recognition data set |
| meanOf | Indicates that the measurement is a mean in the tidy data set, taken over all experiments for a particular activity for a person for a given feature (variable) from the original Human Activity Recognition data set |
| Stdev | Indicates that the measurement is a standard deviation within the original Human Activity Recognition data set |
| Time | Measurement based on the "time" domain. Signals from the phone were sampled at 50 Hz, meaning 50 discrete measurements per second |
| X | Measurement taken along the "X" dimension of the phone, as in a three-dimensional Cartesian coordinate system of X, Y and Z |
| Y | Measurement taken along the "Y" dimension of the phone, as in a three-dimensional Cartesian coordinate system of X, Y and Z |
| Z | Measurement taken along the "Z" dimension of the phone, as in a three-dimensional Cartesian coordinate system of X, Y and Z |

Reference: features_info.txt and features.txt files from A Public Domain Dataset for Human Activity Recognition Using Smartphones.
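
Because every measurement column is a concatenation of the tokens in the table above, a column name can be decoded mechanically. The following is a hedged sketch: the helper names `splitTokens` and `describeColumn` are hypothetical, the short descriptions paraphrase the table, and the `Acc` entry is an assumption (the table does not define it, though it appears in every acceleration column).

```r
# Short glosses for each token; "Acc" is assumed to mean accelerometer.
tokenMeaning <- c(
  meanOf = "mean taken in the tidy data set",
  Time = "time domain",
  Freq = "frequency domain",
  Body = "body signal",
  Gravity = "gravity signal",
  Acc = "accelerometer (assumed)",
  Gyro = "gyroscope",
  Jerk = "jerk",
  Mag = "magnitude (Euclidean norm)",
  Mean = "mean in the original data",
  Stdev = "standard deviation in the original data",
  X = "X axis", Y = "Y axis", Z = "Z axis"
)

# Split a tidy column name into its tokens.
splitTokens <- function(columnName) {
  stopifnot(startsWith(columnName, "meanOf"))
  rest <- sub("^meanOf", "", columnName)
  # Every remaining token starts with one capital letter.
  c("meanOf", regmatches(rest, gregexpr("[A-Z][a-z]*", rest))[[1]])
}

# Look up the gloss for each token, failing loudly on anything unexpected.
describeColumn <- function(columnName) {
  tokens <- splitTokens(columnName)
  unknown <- setdiff(tokens, names(tokenMeaning))
  if (length(unknown) > 0) {
    stop("Unknown token(s): ", paste(unknown, collapse = ", "))
  }
  unname(tokenMeaning[tokens])
}

print(splitTokens("meanOfTimeBodyAccMeanX"))
print(describeColumn("meanOfFreqBodyGyroJerkMagStdev"))
```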

Per the Human Activity Recognition research team, the original data was organized according to the following process.

The features selected for this database come from the accelerometer and gyroscope 3-axial raw signals tAcc-XYZ and tGyro-XYZ. These time domain signals (prefix 't' to denote time) were captured at a constant rate of 50 Hz. Then they were filtered using a median filter and a 3rd order low pass Butterworth filter with a corner frequency of 20 Hz to remove noise. Similarly, the acceleration signal was then separated into body and gravity acceleration signals (tBodyAcc-XYZ and tGravityAcc-XYZ) using another low pass Butterworth filter with a corner frequency of 0.3 Hz.

Subsequently, the body linear acceleration and angular velocity were derived in time to obtain Jerk signals (tBodyAccJerk-XYZ and tBodyGyroJerk-XYZ). Also the magnitude of these three-dimensional signals were calculated using the Euclidean norm (tBodyAccMag, tGravityAccMag, tBodyAccJerkMag, tBodyGyroMag, tBodyGyroJerkMag).

Finally a Fast Fourier Transform (FFT) was applied to some of these signals producing fBodyAcc-XYZ, fBodyAccJerk-XYZ, fBodyGyro-XYZ, fBodyAccJerkMag, fBodyGyroMag, fBodyGyroJerkMag. (Note the 'f' to indicate frequency domain signals).

These signals were used to estimate variables of the feature vector for each pattern:
'-XYZ' is used to denote 3-axial signals in the X, Y and Z directions.

Reference: features_info.txt file from A Public Domain Dataset for Human Activity Recognition Using Smartphones.
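
To make the derived signals above concrete, the sketch below computes a magnitude, a jerk signal, and a frequency-domain view for a synthetic 50 Hz signal in base R. It is illustrative only: the sine-wave input is made up, the jerk is a simple finite-difference approximation of the time derivative, and the filtering steps are omitted. None of this is the HAR team's actual code.

```r
# Synthetic 3-axis body acceleration sampled at 50 Hz for 2 seconds.
# The 5 Hz sine on X is an arbitrary test signal, not HAR data.
samplingRate <- 50
n <- 100
timeSec <- (0:(n - 1)) / samplingRate
acc <- cbind(
  X = sin(2 * pi * 5 * timeSec),
  Y = 0.5 * cos(2 * pi * 5 * timeSec),
  Z = rep(0.1, n)
)

# Magnitude: Euclidean norm of each (X, Y, Z) sample.
accMag <- sqrt(rowSums(acc^2))

# Jerk: finite-difference approximation of the time derivative, per axis.
accJerk <- apply(acc, 2, diff) * samplingRate

# Frequency domain: FFT of the X axis, keeping the non-redundant half.
spectrum <- Mod(fft(acc[, "X"]))[1:(n / 2 + 1)]
frequencies <- (0:(n / 2)) * samplingRate / n
peakFrequency <- frequencies[which.max(spectrum)]
print(peakFrequency)
```

The jerk matrix has one row fewer than the input because each value is the difference between two consecutive samples.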

Finally, in the original data, all features were normalized to a range from -1 to 1, per the README.txt file from A Public Domain Dataset for Human Activity Recognition Using Smartphones. Therefore, each feature (measurement) varies from -1 to 1 across the 10,299 rows of the combined X_test.txt and X_train.txt files.
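
The source files do not spell out how the normalization was performed. As one plausible illustration (an assumption, not the HAR team's documented method), a linear min-max rescaling maps any numeric vector onto the range -1 to 1. The function name `rescaleToUnitRange` and the sample values are hypothetical.

```r
# Rescale a numeric vector linearly so its minimum maps to -1 and its maximum to 1.
rescaleToUnitRange <- function(x) {
  span <- max(x) - min(x)
  if (span == 0) stop("Cannot rescale a constant vector")
  2 * (x - min(x)) / span - 1
}

raw <- c(3.2, 9.8, 5.0, 7.1)  # made-up raw readings
scaled <- rescaleToUnitRange(raw)
print(range(scaled))

# A mean of values bounded by -1 and 1 is itself bounded by -1 and 1.
withinBounds <- abs(mean(scaled)) <= 1
print(withinBounds)
```

Since each value in the tidy data set is a mean of features bounded by -1 and 1, every measurement column in tidydata.txt should also lie in that range, which makes a simple sanity check when reading the file.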

Variables in the Course Project Tidy Data Set

The following table describes all of the columns in the tidydata.txt file that was created to fulfill the requirements for the Getting and Cleaning Data course on Coursera offered during August 2015. Variable names in the data set are written in camelCase so that they are easy to read within R code.
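
The camelCase names can be derived from the feature names in the original features.txt file with a few string substitutions. The sketch below is an assumption about how such a mapping could be written, not the author's actual script; `toTidyName` is a hypothetical helper. It also collapses the `BodyBody` spelling that appears in some original feature names.

```r
# A few feature names in the style of the original features.txt file.
originalNames <- c(
  "tBodyAcc-mean()-X",
  "tGravityAcc-std()-Z",
  "fBodyBodyGyroJerkMag-std()"
)

# Convert an original feature name to the camelCase name used in tidydata.txt.
toTidyName <- function(featureName) {
  name <- sub("^t", "Time", featureName)                # 't' prefix: time domain
  name <- sub("^f", "Freq", name)                       # 'f' prefix: frequency domain
  name <- gsub("BodyBody", "Body", name, fixed = TRUE)  # collapse duplicated token
  name <- sub("-mean()", "Mean", name, fixed = TRUE)
  name <- sub("-std()", "Stdev", name, fixed = TRUE)
  name <- gsub("-", "", name, fixed = TRUE)             # drop the axis separator
  paste0("meanOf", name)
}

print(toTidyName(originalNames))
```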

| Position | Column Name | Description |
|---|---|---|
| 1 | personId | Numeric identifier (a unique sequential number) that indicates the participant or subject of the experiment. The original research study included 30 participants, so this variable has a range of numeric values from 1 to 30. No further information beyond an id number was provided by the original research team. |
| 2 | activityName | Character string describing one of six different activities that were performed by participants in the experiment: Laying, Sitting, Standing, Walking, Walking downstairs, Walking upstairs |
| 3 | meanOfTimeBodyAccMeanX | Numeric variable measuring the mean of time domain body acceleration means in the X dimension of the phone |
| 4 | meanOfTimeBodyAccMeanY | Numeric variable measuring the mean of time domain body acceleration means in the Y dimension of the phone |
| 5 | meanOfTimeBodyAccMeanZ | Numeric variable measuring the mean of time domain body acceleration means in the Z dimension of the phone |
| 6 | meanOfTimeGravityAccMeanX | Numeric variable measuring the mean of time domain gravity acceleration means in the X dimension of the phone |
| 7 | meanOfTimeGravityAccMeanY | Numeric variable measuring the mean of time domain gravity acceleration means in the Y dimension of the phone |
| 8 | meanOfTimeGravityAccMeanZ | Numeric variable measuring the mean of time domain gravity acceleration means in the Z dimension of the phone |
| 9 | meanOfTimeBodyAccJerkMeanX | Numeric variable measuring the mean of time domain body acceleration jerk means in the X dimension of the phone |
| 10 | meanOfTimeBodyAccJerkMeanY | Numeric variable measuring the mean of time domain body acceleration jerk means in the Y dimension of the phone |
| 11 | meanOfTimeBodyAccJerkMeanZ | Numeric variable measuring the mean of time domain body acceleration jerk means in the Z dimension of the phone |
| 12 | meanOfTimeBodyGyroMeanX | Numeric variable measuring the mean of time domain body gyroscope means in the X dimension of the phone |
| 13 | meanOfTimeBodyGyroMeanY | Numeric variable measuring the mean of time domain body gyroscope means in the Y dimension of the phone |
| 14 | meanOfTimeBodyGyroMeanZ | Numeric variable measuring the mean of time domain body gyroscope means in the Z dimension of the phone |
| 15 | meanOfTimeBodyGyroJerkMeanX | Numeric variable measuring the mean of time domain body gyroscope jerk means in the X dimension of the phone |
| 16 | meanOfTimeBodyGyroJerkMeanY | Numeric variable measuring the mean of time domain body gyroscope jerk means in the Y dimension of the phone |
| 17 | meanOfTimeBodyGyroJerkMeanZ | Numeric variable measuring the mean of time domain body gyroscope jerk means in the Z dimension of the phone |
| 18 | meanOfTimeBodyAccMagMean | Numeric variable measuring the mean of time domain body acceleration magnitude means |
| 19 | meanOfTimeGravityAccMagMean | Numeric variable measuring the mean of time domain gravity acceleration magnitude means |
| 20 | meanOfTimeBodyAccJerkMagMean | Numeric variable measuring the mean of time domain body acceleration jerk magnitude means |
| 21 | meanOfTimeBodyGyroMagMean | Numeric variable measuring the mean of time domain body gyroscope magnitude means |
| 22 | meanOfTimeBodyGyroJerkMagMean | Numeric variable measuring the mean of time domain body gyroscope jerk magnitude means |
| 23 | meanOfFreqBodyAccMeanX | Numeric variable measuring the mean of frequency domain body acceleration means in the X dimension of the phone |
| 24 | meanOfFreqBodyAccMeanY | Numeric variable measuring the mean of frequency domain body acceleration means in the Y dimension of the phone |
| 25 | meanOfFreqBodyAccMeanZ | Numeric variable measuring the mean of frequency domain body acceleration means in the Z dimension of the phone |
| 26 | meanOfFreqBodyAccJerkMeanX | Numeric variable measuring the mean of frequency domain body acceleration jerk means in the X dimension of the phone |
| 27 | meanOfFreqBodyAccJerkMeanY | Numeric variable measuring the mean of frequency domain body acceleration jerk means in the Y dimension of the phone |
| 28 | meanOfFreqBodyAccJerkMeanZ | Numeric variable measuring the mean of frequency domain body acceleration jerk means in the Z dimension of the phone |
| 29 | meanOfFreqBodyGyroMeanX | Numeric variable measuring the mean of frequency domain body gyroscope means in the X dimension of the phone |
| 30 | meanOfFreqBodyGyroMeanY | Numeric variable measuring the mean of frequency domain body gyroscope means in the Y dimension of the phone |
| 31 | meanOfFreqBodyGyroMeanZ | Numeric variable measuring the mean of frequency domain body gyroscope means in the Z dimension of the phone |
| 32 | meanOfFreqBodyAccMagMean | Numeric variable measuring the mean of frequency domain body acceleration magnitude means |
| 33 | meanOfFreqBodyAccJerkMagMean | Numeric variable measuring the mean of frequency domain body acceleration jerk magnitude means |
| 34 | meanOfFreqBodyGyroMagMean | Numeric variable measuring the mean of frequency domain body gyroscope magnitude means |
| 35 | meanOfFreqBodyGyroJerkMagMean | Numeric variable measuring the mean of frequency domain body gyroscope jerk magnitude means |
| 36 | meanOfTimeBodyAccStdevX | Numeric variable measuring the mean of time domain body acceleration standard deviations in the X dimension of the phone |
| 37 | meanOfTimeBodyAccStdevY | Numeric variable measuring the mean of time domain body acceleration standard deviations in the Y dimension of the phone |
| 38 | meanOfTimeBodyAccStdevZ | Numeric variable measuring the mean of time domain body acceleration standard deviations in the Z dimension of the phone |
| 39 | meanOfTimeGravityAccStdevX | Numeric variable measuring the mean of time domain gravity acceleration standard deviations in the X dimension of the phone |
| 40 | meanOfTimeGravityAccStdevY | Numeric variable measuring the mean of time domain gravity acceleration standard deviations in the Y dimension of the phone |
| 41 | meanOfTimeGravityAccStdevZ | Numeric variable measuring the mean of time domain gravity acceleration standard deviations in the Z dimension of the phone |
| 42 | meanOfTimeBodyAccJerkStdevX | Numeric variable measuring the mean of time domain body acceleration jerk standard deviations in the X dimension of the phone |
| 43 | meanOfTimeBodyAccJerkStdevY | Numeric variable measuring the mean of time domain body acceleration jerk standard deviations in the Y dimension of the phone |
| 44 | meanOfTimeBodyAccJerkStdevZ | Numeric variable measuring the mean of time domain body acceleration jerk standard deviations in the Z dimension of the phone |
| 45 | meanOfTimeBodyGyroStdevX | Numeric variable measuring the mean of time domain body gyroscope standard deviations in the X dimension of the phone |
| 46 | meanOfTimeBodyGyroStdevY | Numeric variable measuring the mean of time domain body gyroscope standard deviations in the Y dimension of the phone |
| 47 | meanOfTimeBodyGyroStdevZ | Numeric variable measuring the mean of time domain body gyroscope standard deviations in the Z dimension of the phone |
| 48 | meanOfTimeBodyGyroJerkStdevX | Numeric variable measuring the mean of time domain body gyroscope jerk standard deviations in the X dimension of the phone |
| 49 | meanOfTimeBodyGyroJerkStdevY | Numeric variable measuring the mean of time domain body gyroscope jerk standard deviations in the Y dimension of the phone |
| 50 | meanOfTimeBodyGyroJerkStdevZ | Numeric variable measuring the mean of time domain body gyroscope jerk standard deviations in the Z dimension of the phone |
| 51 | meanOfTimeBodyAccMagStdev | Numeric variable measuring the mean of time domain body acceleration magnitude standard deviations |
| 52 | meanOfTimeGravityAccMagStdev | Numeric variable measuring the mean of time domain gravity acceleration magnitude standard deviations |
| 53 | meanOfTimeBodyAccJerkMagStdev | Numeric variable measuring the mean of time domain body acceleration jerk magnitude standard deviations |
| 54 | meanOfTimeBodyGyroMagStdev | Numeric variable measuring the mean of time domain body gyroscope magnitude standard deviations |
| 55 | meanOfTimeBodyGyroJerkMagStdev | Numeric variable measuring the mean of time domain body gyroscope jerk magnitude standard deviations |
| 56 | meanOfFreqBodyAccStdevX | Numeric variable measuring the mean of frequency domain body acceleration standard deviations in the X dimension of the phone |
| 57 | meanOfFreqBodyAccStdevY | Numeric variable measuring the mean of frequency domain body acceleration standard deviations in the Y dimension of the phone |
| 58 | meanOfFreqBodyAccStdevZ | Numeric variable measuring the mean of frequency domain body acceleration standard deviations in the Z dimension of the phone |
| 59 | meanOfFreqBodyAccJerkStdevX | Numeric variable measuring the mean of frequency domain body acceleration jerk standard deviations in the X dimension of the phone |
| 60 | meanOfFreqBodyAccJerkStdevY | Numeric variable measuring the mean of frequency domain body acceleration jerk standard deviations in the Y dimension of the phone |
| 61 | meanOfFreqBodyAccJerkStdevZ | Numeric variable measuring the mean of frequency domain body acceleration jerk standard deviations in the Z dimension of the phone |
| 62 | meanOfFreqBodyGyroStdevX | Numeric variable measuring the mean of frequency domain body gyroscope standard deviations in the X dimension of the phone |
| 63 | meanOfFreqBodyGyroStdevY | Numeric variable measuring the mean of frequency domain body gyroscope standard deviations in the Y dimension of the phone |
| 64 | meanOfFreqBodyGyroStdevZ | Numeric variable measuring the mean of frequency domain body gyroscope standard deviations in the Z dimension of the phone |
| 65 | meanOfFreqBodyAccMagStdev | Numeric variable measuring the mean of frequency domain body acceleration magnitude standard deviations |
| 66 | meanOfFreqBodyAccJerkMagStdev | Numeric variable measuring the mean of frequency domain body acceleration jerk magnitude standard deviations |
| 67 | meanOfFreqBodyGyroMagStdev | Numeric variable measuring the mean of frequency domain body gyroscope magnitude standard deviations |
| 68 | meanOfFreqBodyGyroJerkMagStdev | Numeric variable measuring the mean of frequency domain body gyroscope jerk magnitude standard deviations |