Skip to content

Commit

Permalink
Draft of Week 1
Browse files Browse the repository at this point in the history
  • Loading branch information
ismayc committed Oct 14, 2024
1 parent c05a036 commit cd185df
Show file tree
Hide file tree
Showing 4 changed files with 519 additions and 47 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/deploy-quarto.yml
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ jobs:
quarto render day1_walkthrough_answers.qmd
quarto render day1_exercise_answers.qmd
quarto render day2_walkthrough_answers.qmd
# quarto render day2_exercise_answers.qmd
quarto render day2_exercise_answers.qmd
# Deploy HTML to GitHub Pages if on main or master branch
- name: Deploy to GitHub pages from main/master 🚀
Expand Down
276 changes: 276 additions & 0 deletions day2_exercise_answers.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,276 @@
---
title: "Statistics in R with the `tidyverse`"
author: "Dr. Chester Ismay"
subtitle: "Day 2 Exercises"
format: html
---

```{r}
#| include: false
options(width = 120)
```


# Day 2: Linear Regression in R

## Session 1: Simple Linear Regression Analysis

### 1. Load Necessary Libraries

*We did this during the course in walkthroughs, but if you haven't done it yet:*

```{r}
# Load the required libraries
library(dplyr)
library(ggplot2)
library(moderndive)
library(readr)
```

- These packages provide tools for data wrangling, visualization, and modeling.

---

### 2. Prepare Dataset

```{r}
# Load in the data_dev_survey CSV file as a data frame again
# selecting the columns of interest keeping only complete rows of information
data_dev_survey <- read_csv("data_dev_survey.csv") |>
select(response_id, converted_comp_yearly, work_exp,
years_code, ai_trust) |>
# This code sets the levels of the ai_trust column to be more intuitive
# Note this will change the baseline level for the factor!
mutate(ai_trust = factor(ai_trust, levels = c("Highly distrust",
"Somewhat distrust",
"Neither trust nor distrust",
"Somewhat trust",
"Highly trust"))) |>
na.omit()
```

- We're still exploring the `data_dev_survey` data from Day 1 here.

---

### 3. Exploring Dataset Structure

```{r}
# Use glimpse to remind yourself of the structure of the data
glimpse(data_dev_survey)
```

---

### 4. Summary Statistics

```{r}
# Summarize mean and median values for converted_comp_yearly and work_exp
data_dev_survey |>
summarize(mean_comp = mean(converted_comp_yearly),
median_comp = median(converted_comp_yearly),
mean_exp = mean(work_exp),
median_exp = median(work_exp))
```

---

### 5. Using `tidy_summary()` Function

```{r}
# Use tidy_summary to get a clean summary of the two columns
data_dev_survey |>
select(converted_comp_yearly, work_exp) |>
tidy_summary()
```

---

### 6. Correlation Between Salary and Work Experience

```{r}
# Compute the correlation between converted_comp_yearly and work_exp
data_dev_survey |>
get_correlation(formula = converted_comp_yearly ~ work_exp)
```

---

### 7. Scatterplot of Salary and Work Experience

```{r}
# Plot the relationship between work experience and yearly compensation
ggplot(data_dev_survey, aes(x = work_exp, y = converted_comp_yearly)) +
geom_point(alpha = 0.2) +
labs(x = "Work Experience (years)", y = "Yearly Compensation (USD)",
title = "Scatterplot of Work Experience and Yearly Compensation")
```

---

### 8. Regression Line

```{r}
# The same scatterplot with a regression line added
ggplot(data_dev_survey, aes(x = work_exp, y = converted_comp_yearly)) +
geom_point(alpha = 0.2) +
geom_smooth(method = "lm", se = FALSE) +
labs(x = "Work Experience (years)", y = "Yearly Compensation (USD)",
title = "Work Experience vs. Yearly Compensation with Regression Line")
```

---

### 9. Fit a Regression Model

```{r}
# Fit a linear regression model with compensation as the outcome
salary_slr_model <- lm(converted_comp_yearly ~ work_exp, data = data_dev_survey)
# Get the regression coefficients
coef(salary_slr_model)
```

---

## Session 2: Multiple Linear Regression Analysis (Part 1)

### 10: Fitting a Simple Linear Regression with a Categorical Regressor

```{r}
# Fit a simple linear regression model with ai_trust as a predictor on salary
trust_model <- lm(converted_comp_yearly ~ ai_trust, data = data_dev_survey)
# Get regression coefficients
coef(trust_model)
```

---

### 11: Visualizing Data with Categorical Variable

```{r}
# Scatterplot with ai_trust as color for grouping on work_exp predicting
# converted_comp_yearly
ggplot(data_dev_survey,
aes(x = work_exp, y = converted_comp_yearly, color = ai_trust)) +
geom_point(alpha = 0.5) +
labs(x = "Work Experience (years)", y = "Yearly Compensation (USD)",
color = "Trust in AI",
title = "Scatterplot of Work Experience and Yearly Compensation by Trust in AI")
```

---

### 12: Summarizing the Target and Regressors (One numerical and one categorical)

```{r}
# Summarize the target and regressors
data_dev_survey |>
select(converted_comp_yearly, work_exp, ai_trust) |>
tidy_summary()
```

---

### 13: Fitting a Multiple Regression Model with Interaction Terms

```{r}
# Fit a multiple regression model with interaction
multi_model_interaction <- lm(converted_comp_yearly ~ work_exp * ai_trust,
data = data_dev_survey)
# Get regression coefficients
coef(multi_model_interaction)
```

---

### 14: Adding Interaction Terms to the Regression Plot

```{r}
# Scatterplot with regression lines by trust in AI
ggplot(data_dev_survey, aes(x = work_exp, y = converted_comp_yearly, color = ai_trust)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", se = FALSE) +
labs(x = "Work Experience (years)", y = "Yearly Compensation (USD)",
color = "Trust in AI",
title = "Work Experience vs. Yearly Compensation by Trust in AI")
```

---

## Session 3: Multiple Linear Regression Analysis (Part 2)

### 15. Fitting a Multiple Regression Model Without Interactions

```{r}
# Fit a multiple regression model without interaction terms
multi_model_no_interaction <- lm(converted_comp_yearly ~ work_exp + ai_trust,
data = data_dev_survey)
# Get regression coefficients
coef(multi_model_no_interaction)
```

---

### 16. Visualizing the Parallel Slopes Model

```{r}
# Scatterplot with regression lines by trust in AI (parallel slopes)
ggplot(data_dev_survey, aes(x = work_exp, y = converted_comp_yearly, color = ai_trust)) +
geom_point(alpha = 0.5) +
geom_parallel_slopes(se = FALSE) +
labs(x = "Work Experience (years)", y = "Yearly Compensation (USD)",
color = "Trust in AI",
title = "Work Experience vs. Yearly Compensation by Trust in AI")
```

---

### 17. Summarize the Numerical Regressors and Target Variable

```{r}
# Use tidy_summary to get a clean summary of the data for a multiple regression
# with work experience and years coding as regressors and yearly compensation as
# the outcome
data_dev_survey |>
select(converted_comp_yearly, work_exp, years_code) |>
tidy_summary()
```

---

### 18. Fitting a Multiple Regression Model with Two Numerical Predictors

```{r}
# Fit a multiple regression model
multi_model <- lm(converted_comp_yearly ~ work_exp + years_code,
data = data_dev_survey)
# Get regression coefficients
coef(multi_model)
```

---

### 19. Visualizing the Relationship between Variables (with SLR)

```{r}
# Create scatterplots with regression lines for both variables with
# converted_comp_yearly as the response variable in simple linear regression
ggplot(data_dev_survey, aes(x = work_exp, y = converted_comp_yearly)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", se = FALSE) +
labs(x = "Work Experience (years)", y = "Yearly Compensation (USD)",
title = "Yearly Compensation vs Work Experience")
ggplot(data_dev_survey, aes(x = years_code, y = converted_comp_yearly)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", se = FALSE) +
labs(x = "Years Coding", y = "Yearly Compensation (USD)",
title = "Yearly Compensation vs Years Coding")
```

Loading

0 comments on commit cd185df

Please sign in to comment.