This codebook accompanies the data file tidydata.txt that was created in support of the requirements for the Johns Hopkins University online course Getting and Cleaning Data, offered on Coursera in August 2015.
One of the requirements for the course was to create a tidy data file (for additional background, review the README.md file that is also posted in this GitHub repository).
The tidy data file contains 180 observations, one for each combination of 30 research subjects and 6 activities. Each measured value is the mean across the repeated measurements recorded for that subject and activity. An observation (or row) in the tidy data set consists of a unique combination of personId and activityName, plus the means of 66 variables: the variables from the original Human Activity Recognition data set that were means or standard deviations of the 33 base variables analyzed by the HAR research team, per the following illustration.
personId | activityName | meanOfTimeBodyAccMeanX | . . . | meanOfFreqBodyGyroJerkMagStdev |
---|---|---|---|---|
7 | walking | 0.275592961754386 | . . . | -0.0841663774087719 |
. . . | . . . | . . . | . . . | . . . |
Although statisticians could assert that it would be more appropriate to summarize the standard deviations into standard errors (see https://class.coursera.org/getdata-031/forum/thread?thread_id=28#post-1251 [and following] for details), for the purposes of the data cleaning activity, the standard deviations have been summarized with the mean() function in R. This codebook describes each variable (column) in the tidy data file.
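
The summarization described above can be sketched in R roughly as follows. This is a minimal illustration on invented numbers, not the script that produced tidydata.txt. The data frame `har` and its values are hypothetical, and only one measurement column is shown.

```r
# Hypothetical miniature of the combined HAR data: repeated measurements
# per person and activity (all values are invented for illustration).
har <- data.frame(
  personId         = c(1, 1, 1, 1, 2, 2),
  activityName     = c("walking", "walking", "sitting", "sitting",
                       "walking", "walking"),
  TimeBodyAccMeanX = c(0.20, 0.30, 0.10, 0.30, 0.25, 0.35)
)

# One row per (personId, activityName); the measurement is replaced by its mean.
tidy <- aggregate(TimeBodyAccMeanX ~ personId + activityName,
                  data = har, FUN = mean)

# Prefix the column name with "meanOf", as in the tidy data file.
names(tidy)[3] <- "meanOfTimeBodyAccMeanX"
tidy <- tidy[order(tidy$personId, tidy$activityName), ]
print(tidy)
```

Applied to the full data, the same grouping over 30 subjects and 6 activities yields the 180 rows described in this codebook.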
In all of the measurement variables, the text tokens have the following meanings. We have consolidated the term definitions here to avoid repetition of the definitions in the table of variables.
Token | Description |
---|---|
Acc | Measurement taken from the accelerometer on the phone |
Body | Signal based on the body of an experiment participant, one of two components derived from the time-based signals from the phone's accelerometer |
Freq | Measurement based on the "frequency" domain, taken as a Fast Fourier Transform of the time-based signals |
Gravity | Signal based on gravity, the force that attracts a body towards the center of the earth. Gravity is the second of the two measurement components derived from the phone's accelerometer |
Gyro | Measurement taken from the gyroscope on the phone |
Jerk | Measurement of sudden movement, obtained by deriving body linear acceleration and angular velocity in time (i.e. their rate of change) |
Mag | Magnitude of a three-dimensional signal, calculated as its Euclidean norm (i.e. the length of the vector from the origin) |
Mean | Indicates that the measurement is a mean within the original Human Activity Recognition data set |
meanOf | Indicates that the measurement is a mean in the tidy dataset taken over all experiments for a particular activity for a person for a given feature (variable) from the original Human Activity Recognition data set |
Stdev | Indicates that the measurement is a standard deviation within the original Human Activity Recognition data set |
Time | Measurement based on the "time" domain. Signals from the phone were sampled at a rate of 50 Hz, meaning 50 discrete measurements per second |
X | Measurement taken along the "X" dimension of the phone, as in a three-dimensional Cartesian coordinate system of X, Y and Z |
Y | Measurement taken along the "Y" dimension of the phone, as in a three-dimensional Cartesian coordinate system of X, Y and Z |
Z | Measurement taken along the "Z" dimension of the phone, as in a three-dimensional Cartesian coordinate system of X, Y and Z |
Reference: features_info.txt and features.txt files from A Public Domain Dataset for Human Activity Recognition Using Smartphones.
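
Because every measurement column name is a concatenation of the tokens above, a name can be split back into its tokens mechanically. The following is a small sketch in base R. The helper name `split_tokens` is hypothetical and not part of the original analysis.

```r
# Split a measurement column name into its tokens.
# The "meanOf" prefix is kept as one token; the rest is split at each
# upper-case letter (so a trailing "X", "Y" or "Z" becomes its own token).
split_tokens <- function(column_name) {
  stopifnot(grepl("^meanOf", column_name))
  rest <- sub("^meanOf", "", column_name)
  tokens <- regmatches(rest, gregexpr("[A-Z][a-z]*", rest))[[1]]
  c("meanOf", tokens)
}

split_tokens("meanOfFreqBodyGyroJerkMagStdev")
split_tokens("meanOfTimeBodyAccMeanX")
```

Each returned token can then be looked up in the table of tokens to read off what the column measures.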
Per the Human Activity Recognition research team, the original data was organized according to the following process.
The features selected for this database come from the accelerometer and gyroscope 3-axial raw signals tAcc-XYZ and tGyro-XYZ. These time domain signals (prefix 't' to denote time) were captured at a constant rate of 50 Hz. Then they were filtered using a median filter and a 3rd order low pass Butterworth filter with a corner frequency of 20 Hz to remove noise. Similarly, the acceleration signal was then separated into body and gravity acceleration signals (tBodyAcc-XYZ and tGravityAcc-XYZ) using another low pass Butterworth filter with a corner frequency of 0.3 Hz.
Subsequently, the body linear acceleration and angular velocity were derived in time to obtain Jerk signals (tBodyAccJerk-XYZ and tBodyGyroJerk-XYZ). Also the magnitude of these three-dimensional signals were calculated using the Euclidean norm (tBodyAccMag, tGravityAccMag, tBodyAccJerkMag, tBodyGyroMag, tBodyGyroJerkMag).
Finally a Fast Fourier Transform (FFT) was applied to some of these signals producing fBodyAcc-XYZ, fBodyAccJerk-XYZ, fBodyGyro-XYZ, fBodyAccJerkMag, fBodyGyroMag, fBodyGyroJerkMag. (Note the 'f' to indicate frequency domain signals).
These signals were used to estimate variables of the feature vector for each pattern:
'-XYZ' is used to denote 3-axial signals in the X, Y and Z directions.
Reference: features_info.txt file from A Public Domain Dataset for Human Activity Recognition Using Smartphones.
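
To make the Jerk, Mag and Freq derivations quoted above more concrete, here is a rough sketch in base R on an invented signal. It is not the HAR team's processing code: the variable names and signal values are hypothetical, the derivative is approximated by a simple finite difference, and the median and Butterworth filtering steps are left out.

```r
# Hypothetical 3-axis signal sampled at 50 Hz for 2 seconds (invented values).
sampling_rate <- 50                                   # samples per second
t <- seq(0, by = 1 / sampling_rate, length.out = 100)
acc_x <- sin(2 * pi * 2 * t)                          # 2 Hz oscillation
acc_y <- 0.5 * cos(2 * pi * 2 * t)
acc_z <- rep(0.1, length(t))

# Jerk: finite-difference approximation of the derivative in time.
jerk_x <- diff(acc_x) * sampling_rate

# Mag: Euclidean norm of the three axes at each sample.
acc_mag <- sqrt(acc_x^2 + acc_y^2 + acc_z^2)

# Freq: amplitude spectrum from the Fast Fourier Transform.
spectrum_x <- Mod(fft(acc_x))
freqs <- (seq_along(spectrum_x) - 1) * sampling_rate / length(spectrum_x)

# Dominant frequency of acc_x, searched in the first half of the spectrum.
half <- seq_len(length(t) / 2)
peak_hz <- freqs[half][which.max(spectrum_x[half])]
```

Note that `diff()` returns one value fewer than its input, so `jerk_x` is one sample shorter than `acc_x`.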
Finally, in the original data, all features were normalized to a range from -1 to 1, per the README.txt file from A Public Domain Dataset for Human Activity Recognition Using Smartphones. Therefore, each feature (measurement) varies from -1 to 1 across the 10,299 rows of the combined X_test.txt and X_train.txt files.
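
This codebook does not say which normalization method the original researchers used. One common way to bring a feature into the range -1 to 1 is min-max scaling, sketched here on invented values as an assumption for illustration only. The function name `rescale_to_unit_range` is hypothetical.

```r
# Min-max scaling of a numeric vector to the interval [-1, 1].
# Assumes x contains at least two distinct values.
rescale_to_unit_range <- function(x) {
  2 * (x - min(x)) / (max(x) - min(x)) - 1
}

raw <- c(3.2, 4.8, 4.0, 6.4)        # invented raw feature values
scaled <- rescale_to_unit_range(raw)
range(scaled)
```

Since the mean of values that lie between -1 and 1 also lies between -1 and 1, every measurement column in tidydata.txt should fall in that range as well.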
The following table describes all of the columns in the tidydata.txt file. Variable names in the data set are written in camelCase notation to make them easier to read within R code.
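
As an illustration of the naming scheme, the sketch below converts feature names in the style of the original features.txt file (for example `tBodyAcc-mean()-X`) into the camelCase names used in this codebook. The helper `to_tidy_name` is hypothetical. Its mapping is inferred from the names in this codebook and is not necessarily how the original script performed the renaming.

```r
# Hypothetical helper: original HAR feature name -> tidy camelCase column name.
to_tidy_name <- function(feature) {
  name <- sub("^t", "Time", feature)                    # 't' prefix: time domain
  name <- sub("^f", "Freq", name)                       # 'f' prefix: frequency domain
  name <- sub("BodyBody", "Body", name, fixed = TRUE)   # some original names repeat "Body"
  name <- sub("-mean()", "Mean", name, fixed = TRUE)
  name <- sub("-std()", "Stdev", name, fixed = TRUE)
  name <- gsub("-", "", name, fixed = TRUE)             # drop the dash before X, Y or Z
  paste0("meanOf", name)
}

to_tidy_name(c("tBodyAcc-mean()-X", "fBodyBodyGyroJerkMag-std()"))
```
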
Position | Column Name | Description |
---|---|---|
1 | personId | Numeric identifier (a unique sequential number) that indicates the participant or subject of the experiment. The original research study included 30 participants, so this variable takes values from 1 to 30. No further information beyond an id number was provided by the original research team. |
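2 | activityName | Character string describing one of the six activities performed by participants in the experiment (for example, walking). The original data set labels the six activities WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING and LAYING. |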
3 | meanOfTimeBodyAccMeanX | Numeric variable measuring the mean of time domain body acceleration means in X dimension of the phone |
4 | meanOfTimeBodyAccMeanY | Numeric variable measuring the mean of time domain body acceleration means in Y dimension of the phone |
5 | meanOfTimeBodyAccMeanZ | Numeric variable measuring the mean of time domain body acceleration means in Z dimension of the phone |
6 | meanOfTimeGravityAccMeanX | Numeric variable measuring the mean of time domain gravity acceleration means in X dimension of the phone |
7 | meanOfTimeGravityAccMeanY | Numeric variable measuring the mean of time domain gravity acceleration means in Y dimension of the phone |
8 | meanOfTimeGravityAccMeanZ | Numeric variable measuring the mean of time domain gravity acceleration means in Z dimension of the phone |
9 | meanOfTimeBodyAccJerkMeanX | Numeric variable measuring the mean of time domain body acceleration jerk means in X dimension of the phone |
10 | meanOfTimeBodyAccJerkMeanY | Numeric variable measuring the mean of time domain body acceleration jerk means in Y dimension of the phone |
11 | meanOfTimeBodyAccJerkMeanZ | Numeric variable measuring the mean of time domain body acceleration jerk means in Z dimension of the phone |
12 | meanOfTimeBodyGyroMeanX | Numeric variable measuring the mean of time domain body gyroscope means in X dimension of the phone |
13 | meanOfTimeBodyGyroMeanY | Numeric variable measuring the mean of time domain body gyroscope means in Y dimension of the phone |
14 | meanOfTimeBodyGyroMeanZ | Numeric variable measuring the mean of time domain body gyroscope means in Z dimension of the phone |
15 | meanOfTimeBodyGyroJerkMeanX | Numeric variable measuring the mean of time domain body gyroscope jerk means in X dimension of the phone |
16 | meanOfTimeBodyGyroJerkMeanY | Numeric variable measuring the mean of time domain body gyroscope jerk means in Y dimension of the phone |
17 | meanOfTimeBodyGyroJerkMeanZ | Numeric variable measuring the mean of time domain body gyroscope jerk means in Z dimension of the phone |
18 | meanOfTimeBodyAccMagMean | Numeric variable measuring the mean of time domain body acceleration magnitude means |
19 | meanOfTimeGravityAccMagMean | Numeric variable measuring the mean of time domain gravity acceleration magnitude means |
20 | meanOfTimeBodyAccJerkMagMean | Numeric variable measuring the mean of time domain body acceleration jerk magnitude means |
21 | meanOfTimeBodyGyroMagMean | Numeric variable measuring the mean of time domain body gyroscope magnitude means |
22 | meanOfTimeBodyGyroJerkMagMean | Numeric variable measuring the mean of time domain body gyroscope jerk magnitude means |
23 | meanOfFreqBodyAccMeanX | Numeric variable measuring the mean of frequency domain body acceleration means in X dimension of the phone |
24 | meanOfFreqBodyAccMeanY | Numeric variable measuring the mean of frequency domain body acceleration means in Y dimension of the phone |
25 | meanOfFreqBodyAccMeanZ | Numeric variable measuring the mean of frequency domain body acceleration means in Z dimension of the phone |
26 | meanOfFreqBodyAccJerkMeanX | Numeric variable measuring the mean of frequency domain body acceleration jerk means in X dimension of the phone |
27 | meanOfFreqBodyAccJerkMeanY | Numeric variable measuring the mean of frequency domain body acceleration jerk means in Y dimension of the phone |
28 | meanOfFreqBodyAccJerkMeanZ | Numeric variable measuring the mean of frequency domain body acceleration jerk means in Z dimension of the phone |
29 | meanOfFreqBodyGyroMeanX | Numeric variable measuring the mean of frequency domain body gyroscope means in X dimension of the phone |
30 | meanOfFreqBodyGyroMeanY | Numeric variable measuring the mean of frequency domain body gyroscope means in Y dimension of the phone |
31 | meanOfFreqBodyGyroMeanZ | Numeric variable measuring the mean of frequency domain body gyroscope means in Z dimension of the phone |
32 | meanOfFreqBodyAccMagMean | Numeric variable measuring the mean of frequency domain body acceleration magnitude means |
33 | meanOfFreqBodyAccJerkMagMean | Numeric variable measuring the mean of frequency domain body acceleration jerk magnitude means |
34 | meanOfFreqBodyGyroMagMean | Numeric variable measuring the mean of frequency domain body gyroscope magnitude means |
35 | meanOfFreqBodyGyroJerkMagMean | Numeric variable measuring the mean of frequency domain body gyroscope jerk magnitude means |
36 | meanOfTimeBodyAccStdevX | Numeric variable measuring the mean of time domain body acceleration standard deviations in X dimension of the phone |
37 | meanOfTimeBodyAccStdevY | Numeric variable measuring the mean of time domain body acceleration standard deviations in Y dimension of the phone |
38 | meanOfTimeBodyAccStdevZ | Numeric variable measuring the mean of time domain body acceleration standard deviations in Z dimension of the phone |
39 | meanOfTimeGravityAccStdevX | Numeric variable measuring the mean of time domain gravity acceleration standard deviations in X dimension of the phone |
40 | meanOfTimeGravityAccStdevY | Numeric variable measuring the mean of time domain gravity acceleration standard deviations in Y dimension of the phone |
41 | meanOfTimeGravityAccStdevZ | Numeric variable measuring the mean of time domain gravity acceleration standard deviations in Z dimension of the phone |
42 | meanOfTimeBodyAccJerkStdevX | Numeric variable measuring the mean of time domain body acceleration jerk standard deviations in X dimension of the phone |
43 | meanOfTimeBodyAccJerkStdevY | Numeric variable measuring the mean of time domain body acceleration jerk standard deviations in Y dimension of the phone |
44 | meanOfTimeBodyAccJerkStdevZ | Numeric variable measuring the mean of time domain body acceleration jerk standard deviations in Z dimension of the phone |
45 | meanOfTimeBodyGyroStdevX | Numeric variable measuring the mean of time domain body gyroscope standard deviations in X dimension of the phone |
46 | meanOfTimeBodyGyroStdevY | Numeric variable measuring the mean of time domain body gyroscope standard deviations in Y dimension of the phone |
47 | meanOfTimeBodyGyroStdevZ | Numeric variable measuring the mean of time domain body gyroscope standard deviations in Z dimension of the phone |
48 | meanOfTimeBodyGyroJerkStdevX | Numeric variable measuring the mean of time domain body gyroscope jerk standard deviations in X dimension of the phone |
49 | meanOfTimeBodyGyroJerkStdevY | Numeric variable measuring the mean of time domain body gyroscope jerk standard deviations in Y dimension of the phone |
50 | meanOfTimeBodyGyroJerkStdevZ | Numeric variable measuring the mean of time domain body gyroscope jerk standard deviations in Z dimension of the phone |
51 | meanOfTimeBodyAccMagStdev | Numeric variable measuring the mean of time domain body acceleration magnitude standard deviations |
52 | meanOfTimeGravityAccMagStdev | Numeric variable measuring the mean of time domain gravity acceleration magnitude standard deviations |
53 | meanOfTimeBodyAccJerkMagStdev | Numeric variable measuring the mean of time domain body acceleration jerk magnitude standard deviations |
54 | meanOfTimeBodyGyroMagStdev | Numeric variable measuring the mean of time domain body gyroscope magnitude standard deviations |
55 | meanOfTimeBodyGyroJerkMagStdev | Numeric variable measuring the mean of time domain body gyroscope jerk magnitude standard deviations |
56 | meanOfFreqBodyAccStdevX | Numeric variable measuring the mean of frequency domain body acceleration standard deviations in X dimension of the phone |
57 | meanOfFreqBodyAccStdevY | Numeric variable measuring the mean of frequency domain body acceleration standard deviations in Y dimension of the phone |
58 | meanOfFreqBodyAccStdevZ | Numeric variable measuring the mean of frequency domain body acceleration standard deviations in Z dimension of the phone |
59 | meanOfFreqBodyAccJerkStdevX | Numeric variable measuring the mean of frequency domain body acceleration jerk standard deviations in X dimension of the phone |
60 | meanOfFreqBodyAccJerkStdevY | Numeric variable measuring the mean of frequency domain body acceleration jerk standard deviations in Y dimension of the phone |
61 | meanOfFreqBodyAccJerkStdevZ | Numeric variable measuring the mean of frequency domain body acceleration jerk standard deviations in Z dimension of the phone |
62 | meanOfFreqBodyGyroStdevX | Numeric variable measuring the mean of frequency domain body gyroscope standard deviations in X dimension of the phone |
63 | meanOfFreqBodyGyroStdevY | Numeric variable measuring the mean of frequency domain body gyroscope standard deviations in Y dimension of the phone |
64 | meanOfFreqBodyGyroStdevZ | Numeric variable measuring the mean of frequency domain body gyroscope standard deviations in Z dimension of the phone |
65 | meanOfFreqBodyAccMagStdev | Numeric variable measuring the mean of frequency domain body acceleration magnitude standard deviations |
66 | meanOfFreqBodyAccJerkMagStdev | Numeric variable measuring the mean of frequency domain body acceleration jerk magnitude standard deviations |
67 | meanOfFreqBodyGyroMagStdev | Numeric variable measuring the mean of frequency domain body gyroscope magnitude standard deviations |
68 | meanOfFreqBodyGyroJerkMagStdev | Numeric variable measuring the mean of frequency domain body gyroscope jerk magnitude standard deviations |
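
The 68 column names in the table follow a regular pattern, so the list can be regenerated and counted programmatically. The sketch below rebuilds the names in table order using base R. The helper `names_for` and the variable names are hypothetical.

```r
axes <- c("X", "Y", "Z")

# The 33 base variables, grouped as they appear in the table.
time_axial <- c("TimeBodyAcc", "TimeGravityAcc", "TimeBodyAccJerk",
                "TimeBodyGyro", "TimeBodyGyroJerk")
time_mag   <- c("TimeBodyAccMag", "TimeGravityAccMag", "TimeBodyAccJerkMag",
                "TimeBodyGyroMag", "TimeBodyGyroJerkMag")
freq_axial <- c("FreqBodyAcc", "FreqBodyAccJerk", "FreqBodyGyro")
freq_mag   <- c("FreqBodyAccMag", "FreqBodyAccJerkMag",
                "FreqBodyGyroMag", "FreqBodyGyroJerkMag")

# Column names for one statistic ("Mean" or "Stdev"), in table order.
names_for <- function(stat) {
  axial <- function(signals) {
    paste0("meanOf", rep(signals, each = length(axes)), stat,
           rep(axes, times = length(signals)))
  }
  c(axial(time_axial), paste0("meanOf", time_mag, stat),
    axial(freq_axial), paste0("meanOf", freq_mag, stat))
}

expected_columns <- c("personId", "activityName",
                      names_for("Mean"), names_for("Stdev"))
length(expected_columns)
```

If tidydata.txt was written with a header row (an assumption about how the file was saved), `expected_columns` could be compared against `names(read.table("tidydata.txt", header = TRUE))`.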