Commit: fix note 10 and 11

nsreddy16 committed Oct 1, 2024
1 parent dd78345 commit 8561e62
Showing 2 changed files with 5 additions and 25 deletions.
22 changes: 2 additions & 20 deletions constant_model_loss_transformations/loss_transformations.qmd
@@ -90,7 +90,6 @@ Let's take a look at four different datasets.

```{python}
#| code-fold: true
#| vscode: {languageId: python}
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
@@ -102,7 +101,6 @@ from mpl_toolkits.mplot3d import Axes3D

```{python}
#| code-fold: true
#| vscode: {languageId: python}
# Big font helper
def adjust_fontsize(size=None):
SMALL_SIZE = 8
@@ -155,7 +153,6 @@ plt.style.use("default") # Revert style to default mpl
```

```{python}
#| vscode: {languageId: python}
plt.style.use("default") # Revert style to default mpl
NO_VIZ, RESID, RESID_SCATTER = range(3)
@@ -194,7 +191,6 @@ def least_squares_evaluation(x, y, visualize=NO_VIZ):

```{python}
#| code-fold: true
#| vscode: {languageId: python}
# Load in four different datasets: I, II, III, IV
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
@@ -231,7 +227,6 @@ While these four sets of datapoints look very different, they actually all have

```{python}
#| code-fold: true
#| vscode: {languageId: python}
for dataset in ["I", "II", "III", "IV"]:
print(f">>> Dataset {dataset}:")
ans = anscombe[dataset]
@@ -246,7 +241,6 @@ We may also wish to visualize the model's **residuals**, defined as the differen

```{python}
#| code-fold: true
#| vscode: {languageId: python}
# Residual visualization
fig, axs = plt.subplots(2, 2, figsize=(10, 10))
@@ -366,15 +360,13 @@ The code for generating the graphs and models is included below, but we won't go

```{python}
#| code-fold: true
#| vscode: {languageId: python}
dugongs = pd.read_csv("data/dugongs.csv")
data_constant = dugongs["Age"]
data_linear = dugongs[["Length", "Age"]]
```

```{python}
#| code-fold: true
#| vscode: {languageId: python}
# Constant Model + MSE
plt.style.use('default') # Revert style to default mpl
adjust_fontsize(size=16)
@@ -400,7 +392,6 @@ plt.legend();

```{python}
#| code-fold: true
#| vscode: {languageId: python}
# SLR + MSE
def mse_linear(theta_0, theta_1, data_linear):
data_x, data_y = data_linear.iloc[:, 0], data_linear.iloc[:, 1]
@@ -449,14 +440,13 @@ cbar.set_label("Cost Value")
ax.set_title("MSE for different $\\theta_0, \\theta_1$")
ax.set_xlabel("$\\theta_0$")
ax.set_ylabel("$\\theta_1$")
-ax.set_zlabel("MSE")
+ax.set_zlabel("MSE");
# plt.show()
```

```{python}
#| code-fold: true
#| vscode: {languageId: python}
# Predictions
yobs = data_linear["Age"] # The true observations y
xs = data_linear["Length"] # Needed for linear predictions
@@ -468,7 +458,6 @@ yhats_linear = [theta_0_hat + theta_1_hat * x for x in xs]

```{python}
#| code-fold: true
#| vscode: {languageId: python}
# Constant Model Rug Plot
# In case we're in a weird style state
sns.set_theme()
@@ -485,7 +474,6 @@ plt.yticks([]);

```{python}
#| code-fold: true
#| vscode: {languageId: python}
# SLR model scatter plot
# In case we're in a weird style state
sns.set_theme()
@@ -599,7 +587,6 @@ Let's consider a dataset where each entry represents the number of drinks sold a

```{python}
#| code-fold: false
#| vscode: {languageId: python}
drinks = np.array([20, 21, 22, 29, 33])
drinks
```
@@ -608,7 +595,6 @@ From our derivations above, we know that the optimal model parameter under MSE c

```{python}
#| code-fold: false
#| vscode: {languageId: python}
np.mean(drinks), np.median(drinks)
```
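
(Editorial aside, not part of the committed file: a minimal sketch of the point this hunk makes. Sweeping a grid of candidate constants $\theta$ and evaluating both costs on the same `drinks` array recovers the mean as the MSE minimizer and the median as the MAE minimizer; the grid bounds and step size are arbitrary choices.)

```{python}
# Sketch (not from the commit): confirm numerically that the mean minimizes MSE
# and the median minimizes MAE for the drinks data defined above.
import numpy as np

drinks = np.array([20, 21, 22, 29, 33])
thetas = np.linspace(15, 40, 2501)  # candidate constant predictions, step 0.01

mse = [np.mean((drinks - t) ** 2) for t in thetas]
mae = [np.mean(np.abs(drinks - t)) for t in thetas]

print("theta minimizing MSE:", thetas[np.argmin(mse)], "| mean:  ", drinks.mean())
print("theta minimizing MAE:", thetas[np.argmin(mae)], "| median:", np.median(drinks))
```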

@@ -622,7 +608,6 @@ How do outliers affect each cost function? Imagine we replace the largest value

```{python}
#| code-fold: false
#| vscode: {languageId: python}
drinks_with_outlier = np.append(drinks, 1033)
display(drinks_with_outlier)
np.mean(drinks_with_outlier), np.median(drinks_with_outlier)
@@ -636,7 +621,6 @@ Let's try another experiment. This time, we'll add an additional, non-outlying d

```{python}
#| code-fold: false
#| vscode: {languageId: python}
drinks_with_additional_observation = np.append(drinks, 35)
drinks_with_additional_observation
```
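
(Editorial aside, not part of the committed file: a self-contained sketch comparing how the two summaries react to the outlier versus one more ordinary observation; the hidden portion of this hunk presumably prints similar numbers.)

```{python}
# Sketch (not from the commit): the mean moves with every value, while the
# median barely moves unless the middle of the sorted data changes.
import numpy as np

drinks = np.array([20, 21, 22, 29, 33])
with_outlier = np.append(drinks, 1033)
with_extra = np.append(drinks, 35)

for name, arr in [("original", drinks),
                  ("with outlier 1033", with_outlier),
                  ("with extra obs 35", with_extra)]:
    print(f"{name:>18}: mean = {np.mean(arr):7.2f}, median = {np.median(arr):5.1f}")
```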
@@ -680,7 +664,6 @@ Let's revisit our dugongs example. The lengths and ages are plotted below:

```{python}
#| code-fold: true
#| vscode: {languageId: python}
# `corrcoef` computes the correlation coefficient between two variables
# `std` finds the standard deviation
x = dugongs["Length"]
@@ -708,7 +691,6 @@ An important word on $\log$: in Data 100 (and most upper-division STEM courses),

```{python}
#| code-fold: true
#| vscode: {languageId: python}
z = np.log(y)
r = np.corrcoef(x, z)[0, 1]
@@ -746,7 +728,6 @@ $y$ is an *exponential* function of $x$. Applying an exponential fit to the untr
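
(Editorial aside, not part of the committed file: a hedged sketch of the idea in this hunk, assuming the same `data/dugongs.csv` file loaded earlier in the document. Fitting a line to $(x, \log y)$ and exponentiating the fit gives an exponential curve on the original scale.)

```{python}
# Sketch (not from the commit): fit log(Age) ~ Length, then map the fitted line
# back to the original scale as an exponential curve.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dugongs = pd.read_csv("data/dugongs.csv")  # same file the notebook loads above
x, y = dugongs["Length"], dugongs["Age"]

a_hat, b_hat = np.polyfit(x, np.log(y), deg=1)  # log(y) is roughly a*x + b
xs = np.linspace(x.min(), x.max(), 200)

plt.scatter(x, y, label="data")
plt.plot(xs, np.exp(b_hat + a_hat * xs), color="tab:red", label="exponential fit")
plt.xlabel("Length")
plt.ylabel("Age")
plt.legend();
```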

```{python}
#| code-fold: true
#| vscode: {languageId: python}
plt.figure(dpi=120, figsize=(4, 3))
plt.scatter(x, y)
Expand Down Expand Up @@ -815,3 +796,4 @@ In the derivation above, we decompose the expected loss, $R(\theta)$, into two k
- **Variance, $\sigma_y^2$**: This term represents the spread of the data points around their mean, $\bar{y}$, and is a measure of the data's inherent variability. Importantly, it does not depend on the choice of $\theta$, meaning it's a fixed property of the data. Variance serves as an indicator of the data's dispersion and is crucial in understanding the dataset's structure, but it remains constant regardless of how we adjust our model parameter $\theta$.

- **Bias Squared, $(\bar{y} - \theta)^2$**: This term captures the bias of the estimator, defined as the square of the difference between the mean of the data points, $\bar{y}$, and the parameter $\theta$. The bias quantifies the systematic error introduced when estimating $\theta$. Minimizing this term is essential for improving the accuracy of the estimator. When $\theta = \bar{y}$, the bias is $0$, indicating that the estimator is unbiased for the parameter it estimates. This highlights a critical principle in statistical estimation: choosing $\theta$ to be the sample mean, $\bar{y}$, minimizes the average loss, rendering the estimator both efficient and unbiased for the population mean.
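
(Editorial aside, not part of the committed file: a small numeric check of the decomposition described above, using synthetic data; the equality holds for any choice of $\theta$.)

```{python}
# Sketch (not from the commit): verify that mean((y - theta)^2) equals
# var(y) + (mean(y) - theta)^2, i.e. variance plus squared bias.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=10, scale=3, size=1000)
theta = 7.5  # any constant prediction

lhs = np.mean((y - theta) ** 2)                # average squared loss R(theta)
rhs = np.var(y) + (np.mean(y) - theta) ** 2    # variance + bias^2

print(lhs, rhs, np.isclose(lhs, rhs))
```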

8 changes: 3 additions & 5 deletions intro_to_modeling/intro_to_modeling.qmd
@@ -96,7 +96,6 @@ The **regression line** is the unique straight line that minimizes the **mean sq
- $\text{residual} =\text{observed }y - \text{regression estimate}$
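
(Editorial aside, not part of the committed file: a minimal sketch, using synthetic data shaped like the plot below, of what it means for the regression line to minimize MSE: the least-squares line beats any other line we try, and its residuals are the observed $y$ minus the line's estimates.)

```{python}
# Sketch (not from the commit): compare the least-squares line's MSE to a
# perturbed line, and compute residuals as observed y minus fitted values.
import numpy as np

rng = np.random.default_rng(43)
x = np.linspace(-3, 3, 100)
y = x * 0.5 - 1 + rng.normal(size=100) * 0.3

slope, intercept = np.polyfit(x, y, deg=1)   # least-squares (regression) line
residuals = y - (slope * x + intercept)      # observed y - regression estimate

def mse(a, b):
    return np.mean((y - (a * x + b)) ** 2)

print("MSE of fitted line:  ", mse(slope, intercept))
print("MSE of a nearby line:", mse(slope + 0.2, intercept))
print("mean residual (about 0):", residuals.mean())
```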

```{python}
#| vscode: {languageId: python}
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
@@ -105,11 +104,11 @@ import seaborn as sns
np.random.seed(43)
plt.style.use('default')
-#Generate random noise for plotting
+# Generate random noise for plotting
x = np.linspace(-3, 3, 100)
y = x * 0.5 - 1 + np.random.randn(100) * 0.3
-#plot regression line
+# Plot regression line
sns.regplot(x=x,y=y);
```

@@ -132,11 +131,10 @@ The correlation ($r$) is the average of the product of $x$ and $y$, both measure
$$r = \frac{1}{n} \sum_{i=1}^n (\frac{x_i - \bar{x}}{\sigma_x})(\frac{y_i - \bar{y}}{\sigma_y})$$

1. Correlation measures the strength of a **linear association** between two variables.
-2. Correlations range between -1 and 1: $|r| \leq 1$, with $r=1$ indicating perfect linear association, and $r=-1$ indicating perfect negative association. The closer $r$ is to $0$, the weaker the linear association is.
+2. Correlations range between -1 and 1: $|r| \leq 1$, with $r=1$ indicating perfect positive linear association, and $r=-1$ indicating perfect negative association. The closer $r$ is to $0$, the weaker the linear association is.
3. Correlation says nothing about causation and non-linear association. Correlation does **not** imply causation. When $r = 0$, the two variables are uncorrelated. However, they could still be related through some non-linear relationship.
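
(Editorial aside, not part of the committed file: a quick check of the formula above on synthetic data; computing $r$ as the mean product of the standardized variables matches `np.corrcoef`.)

```{python}
# Sketch (not from the commit): r from the standardized-product formula vs. np.corrcoef.
import numpy as np

rng = np.random.default_rng(43)
x = np.linspace(-3, 3, 100)
y = x * 0.5 - 1 + rng.normal(size=100) * 0.3

x_std = (x - x.mean()) / x.std()
y_std = (y - y.mean()) / y.std()
r_formula = np.mean(x_std * y_std)

print(r_formula, np.corrcoef(x, y)[0, 1])
```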

```{python}
#| vscode: {languageId: python}
def plot_and_get_corr(ax, x, y, title):
ax.set_xlim(-3, 3)
ax.set_ylim(-3, 3)

