Showing 4 changed files with 519 additions and 47 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,276 @@ | ||
--- | ||
title: "Statistics in R with the `tidyverse`" | ||
author: "Dr. Chester Ismay" | ||
subtitle: "Day 2 Exercises" | ||
format: html | ||
--- | ||
|
||
```{r} | ||
#| include: false | ||
options(width = 120) | ||
``` | ||
|
||
|
||
# Day 2: Linear Regression in R | ||
|
||
## Session 1: Simple Linear Regression Analysis | ||
|
||
### 1. Load Necessary Libraries | ||
|
||
*We did this during the course in walkthroughs, but if you haven't done it yet:* | ||
|
||
```{r} | ||
# Load the required libraries | ||
library(dplyr) | ||
library(ggplot2) | ||
library(moderndive) | ||
library(readr) | ||
``` | ||
|
||
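- `dplyr` handles data wrangling, `ggplot2` handles visualization, `readr` reads the CSV file, and `moderndive` supplies the modeling helpers used below, such as `get_correlation()`, `tidy_summary()`, and `geom_parallel_slopes()`. | ||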
|
||
--- | ||
|
||
### 2. Prepare Dataset | ||
|
||
```{r} | ||
# Read the data_dev_survey CSV file into a data frame again, | ||
# keeping the columns of interest and only the rows with no missing values | ||
data_dev_survey <- read_csv("data_dev_survey.csv") |> | ||
select(response_id, converted_comp_yearly, work_exp, | ||
years_code, ai_trust) |> | ||
# Set the levels of ai_trust in order from least to most trusting | ||
# Note: the first level listed becomes the baseline level of the factor! | ||
mutate(ai_trust = factor(ai_trust, levels = c("Highly distrust", | ||
"Somewhat distrust", | ||
"Neither trust nor distrust", | ||
"Somewhat trust", | ||
"Highly trust"))) |> | ||
na.omit() | ||
``` | ||
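
The level-ordering step above can be tried on a small scale. This is a minimal sketch using made-up responses (the vector `toy_trust` and the value "No opinion" are hypothetical, not taken from the survey file). It shows that `factor()` sorts levels alphabetically unless `levels` is supplied, and that any value missing from `levels` becomes `NA`, which `na.omit()` then removes.

```r
# Hypothetical responses, made up for illustration (not from the survey file)
toy_trust <- c("Somewhat trust", "Highly distrust", "Neither trust nor distrust",
               "Highly trust", "Somewhat distrust", "No opinion")

trust_levels <- c("Highly distrust", "Somewhat distrust",
                  "Neither trust nor distrust", "Somewhat trust", "Highly trust")

# Without `levels`, factor() sorts the distinct values alphabetically
default_fct <- factor(toy_trust)
levels(default_fct)

# With `levels`, the order is the one you supply
ordered_fct <- factor(toy_trust, levels = trust_levels)
levels(ordered_fct)

# "No opinion" is not among the supplied levels, so it is converted to NA
is.na(ordered_fct)

# na.omit() on a data frame then drops the row holding that NA
toy_df <- data.frame(id = seq_along(toy_trust), ai_trust = ordered_fct)
nrow(na.omit(toy_df))
```

Under R's default treatment contrasts, `lm()` uses the first level of a factor as the reference group, which is why the order matters for the models later in these exercises.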
|
||
- We're still exploring the `data_dev_survey` data from Day 1 here. | ||
|
||
--- | ||
|
||
### 3. Exploring Dataset Structure | ||
|
||
```{r} | ||
# Use glimpse to remind yourself of the structure of the data | ||
glimpse(data_dev_survey) | ||
``` | ||
|
||
--- | ||
|
||
### 4. Summary Statistics | ||
|
||
```{r} | ||
# Summarize mean and median values for converted_comp_yearly and work_exp | ||
data_dev_survey |> | ||
summarize(mean_comp = mean(converted_comp_yearly), | ||
median_comp = median(converted_comp_yearly), | ||
mean_exp = mean(work_exp), | ||
median_exp = median(work_exp)) | ||
``` | ||
|
||
--- | ||
|
||
### 5. Using `tidy_summary()` Function | ||
|
||
```{r} | ||
# Use tidy_summary to get a clean summary of the two columns | ||
data_dev_survey |> | ||
select(converted_comp_yearly, work_exp) |> | ||
tidy_summary() | ||
``` | ||
|
||
--- | ||
|
||
### 6. Correlation Between Salary and Work Experience | ||
|
||
```{r} | ||
# Compute the correlation between converted_comp_yearly and work_exp | ||
data_dev_survey |> | ||
get_correlation(formula = converted_comp_yearly ~ work_exp) | ||
``` | ||
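
To see what the correlation measures, here is a minimal base R sketch on simulated data (the vectors `x` and `y` are hypothetical stand-ins for work experience and compensation). It assumes `get_correlation()` reports the Pearson correlation coefficient, which is what `cor()` computes by default, and checks `cor()` against the textbook formula.

```r
# Simulated stand-ins for work experience and yearly compensation
set.seed(1)
x <- runif(50, min = 0, max = 30)
y <- 50000 + 2000 * x + rnorm(50, sd = 20000)

# Pearson correlation from base R
r_builtin <- cor(x, y)

# The same quantity from its definition:
# sum of cross-deviations over the square root of the product of
# the sums of squared deviations
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_builtin
r_manual
all.equal(r_builtin, r_manual)
```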
|
||
--- | ||
|
||
### 7. Scatterplot of Salary and Work Experience | ||
|
||
```{r} | ||
# Plot the relationship between work experience and yearly compensation | ||
ggplot(data_dev_survey, aes(x = work_exp, y = converted_comp_yearly)) + | ||
geom_point(alpha = 0.2) + | ||
labs(x = "Work Experience (years)", y = "Yearly Compensation (USD)", | ||
title = "Scatterplot of Work Experience and Yearly Compensation") | ||
``` | ||
|
||
--- | ||
|
||
### 8. Regression Line | ||
|
||
```{r} | ||
# The same scatterplot with a regression line added | ||
ggplot(data_dev_survey, aes(x = work_exp, y = converted_comp_yearly)) + | ||
geom_point(alpha = 0.2) + | ||
geom_smooth(method = "lm", se = FALSE) + | ||
labs(x = "Work Experience (years)", y = "Yearly Compensation (USD)", | ||
title = "Work Experience vs. Yearly Compensation with Regression Line") | ||
``` | ||
|
||
--- | ||
|
||
### 9. Fit a Regression Model | ||
|
||
```{r} | ||
# Fit a linear regression model with compensation as the outcome | ||
salary_slr_model <- lm(converted_comp_yearly ~ work_exp, data = data_dev_survey) | ||
# Get the regression coefficients | ||
coef(salary_slr_model) | ||
``` | ||
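
The two coefficients returned by `coef()` are the intercept and the slope of the fitted line. As a rough sketch on simulated data (the data frame `toy` and its column `comp` are hypothetical), the slope of a simple linear regression equals the correlation times the ratio of standard deviations, and the intercept makes the line pass through the point of means.

```r
# Simulated data standing in for the survey columns
set.seed(42)
toy <- data.frame(work_exp = runif(100, min = 0, max = 30))
toy$comp <- 40000 + 2500 * toy$work_exp + rnorm(100, sd = 15000)

# Fit the simple linear regression
fit <- lm(comp ~ work_exp, data = toy)

# Recover the same coefficients from summary statistics
slope <- cor(toy$work_exp, toy$comp) * sd(toy$comp) / sd(toy$work_exp)
intercept <- mean(toy$comp) - slope * mean(toy$work_exp)

coef(fit)
c(intercept = intercept, slope = slope)

# Predicted compensation for a respondent with 10 years of experience
predict(fit, newdata = data.frame(work_exp = 10))
```

The slope reads as the estimated change in the outcome for one additional unit of the regressor; the same reading applies to `salary_slr_model`.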
|
||
--- | ||
|
||
## Session 2: Multiple Linear Regression Analysis (Part 1) | ||
|
||
### 10. Fitting a Simple Linear Regression with a Categorical Regressor | ||
|
||
```{r} | ||
# Fit a simple linear regression model with ai_trust as the predictor of salary | ||
trust_model <- lm(converted_comp_yearly ~ ai_trust, data = data_dev_survey) | ||
# Get regression coefficients | ||
coef(trust_model) | ||
``` | ||
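
With a single categorical regressor, the coefficients have a direct reading in terms of group means. The following sketch uses a small made-up data set (the group labels and values in `toy` are hypothetical): the intercept is the mean of the baseline group, and each remaining coefficient is that group's mean minus the baseline mean.

```r
# Made-up data with three groups; "Low" is listed first, so it is the baseline
toy <- data.frame(
  trust = factor(rep(c("Low", "Medium", "High"), each = 4),
                 levels = c("Low", "Medium", "High")),
  comp = c(50, 55, 60, 65, 70, 72, 78, 80, 90, 95, 85, 100)
)

fit <- lm(comp ~ trust, data = toy)

# Group means computed directly
group_means <- tapply(toy$comp, toy$trust, mean)
group_means

# Intercept = baseline mean; other coefficients = offsets from the baseline
coef(fit)
group_means - group_means["Low"]
```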
|
||
--- | ||
|
||
### 11. Visualizing Data with Categorical Variable | ||
|
||
```{r} | ||
# Scatterplot of converted_comp_yearly against work_exp, | ||
# with points colored by ai_trust | ||
ggplot(data_dev_survey, | ||
aes(x = work_exp, y = converted_comp_yearly, color = ai_trust)) + | ||
geom_point(alpha = 0.5) + | ||
labs(x = "Work Experience (years)", y = "Yearly Compensation (USD)", | ||
color = "Trust in AI", | ||
title = "Scatterplot of Work Experience and Yearly Compensation by Trust in AI") | ||
``` | ||
|
||
--- | ||
|
||
### 12. Summarizing the Target and Regressors (One Numerical and One Categorical) | ||
|
||
```{r} | ||
# Summarize the target and regressors | ||
data_dev_survey |> | ||
select(converted_comp_yearly, work_exp, ai_trust) |> | ||
tidy_summary() | ||
``` | ||
|
||
--- | ||
|
||
### 13. Fitting a Multiple Regression Model with Interaction Terms | ||
|
||
```{r} | ||
# Fit a multiple regression model with interaction | ||
multi_model_interaction <- lm(converted_comp_yearly ~ work_exp * ai_trust, | ||
data = data_dev_survey) | ||
# Get regression coefficients | ||
coef(multi_model_interaction) | ||
``` | ||
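
In the interaction model, each group gets its own slope: the baseline group's slope is the `work_exp` coefficient, and every other group's slope is that coefficient plus its interaction term. A minimal sketch on simulated data with two hypothetical groups, `A` and `B`, shows that this matches fitting a separate line to each group.

```r
# Simulated data: two groups with different intercepts and slopes
set.seed(7)
toy <- data.frame(
  work_exp = rep(1:10, times = 2),
  group = factor(rep(c("A", "B"), each = 10))
)
toy$comp <- ifelse(toy$group == "A",
                   40 + 2 * toy$work_exp,
                   50 + 5 * toy$work_exp) + rnorm(20)

# Interaction model: work_exp * group expands to
# work_exp + group + work_exp:group
fit_int <- lm(comp ~ work_exp * group, data = toy)
b <- coef(fit_int)

# Group-specific slopes assembled from the coefficients
slope_a <- b["work_exp"]
slope_b <- b["work_exp"] + b["work_exp:groupB"]

# The same slope for group B from a separate regression on group B only
fit_b <- lm(comp ~ work_exp, data = subset(toy, group == "B"))

unname(slope_b)
unname(coef(fit_b)["work_exp"])
```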
|
||
--- | ||
|
||
### 14. Adding Interaction Terms to the Regression Plot | ||
|
||
```{r} | ||
# Scatterplot with regression lines by trust in AI | ||
ggplot(data_dev_survey, aes(x = work_exp, y = converted_comp_yearly, color = ai_trust)) + | ||
geom_point(alpha = 0.5) + | ||
geom_smooth(method = "lm", se = FALSE) + | ||
labs(x = "Work Experience (years)", y = "Yearly Compensation (USD)", | ||
color = "Trust in AI", | ||
title = "Work Experience vs. Yearly Compensation by Trust in AI") | ||
``` | ||
|
||
--- | ||
|
||
## Session 3: Multiple Linear Regression Analysis (Part 2) | ||
|
||
### 15. Fitting a Multiple Regression Model Without Interactions | ||
|
||
```{r} | ||
# Fit a multiple regression model without interaction terms | ||
multi_model_no_interaction <- lm(converted_comp_yearly ~ work_exp + ai_trust, | ||
data = data_dev_survey) | ||
# Get regression coefficients | ||
coef(multi_model_no_interaction) | ||
``` | ||
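
Dropping the interaction forces every group to share one slope, so the groups differ only by a vertical shift. As a sketch on simulated data (the groups `A` and `B` are hypothetical), the predicted gap between the two groups is the same at every value of `work_exp` and equals the group coefficient.

```r
# Simulated data: two groups
set.seed(7)
toy <- data.frame(
  work_exp = rep(1:10, times = 2),
  group = factor(rep(c("A", "B"), each = 10))
)
toy$comp <- ifelse(toy$group == "A",
                   40 + 2 * toy$work_exp,
                   50 + 5 * toy$work_exp) + rnorm(20)

# Parallel slopes model: one shared slope, one intercept shift per group
fit_par <- lm(comp ~ work_exp + group, data = toy)
coef(fit_par)

# Predictions for both groups at two experience levels
# Row order: (2, A), (8, A), (2, B), (8, B)
new_points <- expand.grid(work_exp = c(2, 8), group = c("A", "B"))
preds <- predict(fit_par, newdata = new_points)

# Gap between group B and group A at work_exp = 2 and work_exp = 8
gap <- preds[3:4] - preds[1:2]
unname(gap)
```

Comparing this fit with the interaction model on the same data is one way to judge whether the extra slopes are worth the added complexity.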
|
||
--- | ||
|
||
### 16. Visualizing the Parallel Slopes Model | ||
|
||
```{r} | ||
# Scatterplot with regression lines by trust in AI (parallel slopes) | ||
ggplot(data_dev_survey, aes(x = work_exp, y = converted_comp_yearly, color = ai_trust)) + | ||
geom_point(alpha = 0.5) + | ||
geom_parallel_slopes(se = FALSE) + | ||
labs(x = "Work Experience (years)", y = "Yearly Compensation (USD)", | ||
color = "Trust in AI", | ||
title = "Work Experience vs. Yearly Compensation by Trust in AI") | ||
``` | ||
|
||
--- | ||
|
||
### 17. Summarize the Numerical Regressors and Target Variable | ||
|
||
```{r} | ||
# Use tidy_summary to get a clean summary of the data for a multiple regression | ||
# with work experience and years coding as regressors and yearly compensation as | ||
# the outcome | ||
data_dev_survey |> | ||
select(converted_comp_yearly, work_exp, years_code) |> | ||
tidy_summary() | ||
``` | ||
|
||
--- | ||
|
||
### 18. Fitting a Multiple Regression Model with Two Numerical Predictors | ||
|
||
```{r} | ||
# Fit a multiple regression model | ||
multi_model <- lm(converted_comp_yearly ~ work_exp + years_code, | ||
data = data_dev_survey) | ||
# Get regression coefficients | ||
coef(multi_model) | ||
``` | ||
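
When two numerical regressors are related to each other, the coefficient on one of them changes once the other enters the model, because each coefficient is then read as holding the other regressor fixed. The sketch below uses simulated data in which the two regressors are strongly correlated by construction (all values are hypothetical and chosen only to make the effect visible).

```r
# Simulated regressors: years_code tracks work_exp closely by construction
set.seed(3)
work_exp <- runif(200, min = 0, max = 30)
years_code <- work_exp + runif(200, min = 0, max = 10)
comp <- 30000 + 1000 * work_exp + 1500 * years_code + rnorm(200, sd = 10000)
toy <- data.frame(comp, work_exp, years_code)

# How strongly are the two regressors related?
cor(toy$work_exp, toy$years_code)

# Simple regression: the work_exp slope also absorbs the effect of years_code
slr <- lm(comp ~ work_exp, data = toy)

# Multiple regression: the work_exp slope is adjusted for years_code
multi <- lm(comp ~ work_exp + years_code, data = toy)

coef(slr)
coef(multi)
```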
|
||
--- | ||
|
||
### 19. Visualizing the Relationship between Variables (with SLR) | ||
|
||
```{r} | ||
# Create two scatterplots, each with a simple linear regression line: | ||
# converted_comp_yearly against work_exp, then against years_code | ||
ggplot(data_dev_survey, aes(x = work_exp, y = converted_comp_yearly)) + | ||
geom_point(alpha = 0.5) + | ||
geom_smooth(method = "lm", se = FALSE) + | ||
labs(x = "Work Experience (years)", y = "Yearly Compensation (USD)", | ||
title = "Yearly Compensation vs Work Experience") | ||
ggplot(data_dev_survey, aes(x = years_code, y = converted_comp_yearly)) + | ||
geom_point(alpha = 0.5) + | ||
geom_smooth(method = "lm", se = FALSE) + | ||
labs(x = "Years Coding", y = "Yearly Compensation (USD)", | ||
title = "Yearly Compensation vs Years Coding") | ||
``` | ||
|