From 2785d775121aa41f673cfbfcb870f7dc9f7376d0 Mon Sep 17 00:00:00 2001
From: ishani07
Date: Mon, 29 Apr 2024 15:12:31 -0700
Subject: [PATCH] note 16 fix

---
 cv_regularization/cv_reg.qmd       | 10 +++++-----
 docs/cv_regularization/cv_reg.html |  6 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/cv_regularization/cv_reg.qmd b/cv_regularization/cv_reg.qmd
index eb52cc3c..c7e03a40 100644
--- a/cv_regularization/cv_reg.qmd
+++ b/cv_regularization/cv_reg.qmd
@@ -279,8 +279,8 @@ $$\frac{1}{n} \sum_{i=1}^n (y_i - (\theta_0 + \theta_1 \phi_{i, 1} + \theta_2 \p
 Unfortunately, we can't directly use this formulation as our objective function – it's not easy to mathematically optimize over a constraint. Instead, we will apply the magic of the [Lagrangian Duality](https://en.wikipedia.org/wiki/Duality_(optimization)). The details of this are out of scope (take EECS 127 if you're interested in learning more), but the end result is very useful. It turns out that minimizing the following *augmented* objective function is *equivalent* to our minimization goal above.
 
 $$\frac{1}{n} \sum_{i=1}^n (y_i - (\theta_0 + \theta_1 \phi_{i, 1} + \theta_2 \phi_{i, 2} + \ldots + \theta_p \phi_{i, p}))^2 + \lambda \sum_{i=1}^p \vert \theta_i \vert$$
-$$ = ||\mathbb{Y} - \mathbb{X}\theta||_2^2 + \lambda \sum_{i=1}^p |\theta_i|$$
-$$ = ||\mathbb{Y} - \mathbb{X}\theta||_2^2 + \lambda || \theta ||_1$$
+$$ = \frac{1}{n}||\mathbb{Y} - \mathbb{X}\theta||_2^2 + \lambda \sum_{i=1}^p |\theta_i|$$
+$$ = \frac{1}{n}||\mathbb{Y} - \mathbb{X}\theta||_2^2 + \lambda || \theta ||_1$$
 
 The last two expressions include the MSE expressed using vector notation, and the last expression writes $\sum_{i=1}^p |\theta_i|$ as its **L1 norm** equivalent form, $|| \theta ||_1$.
 
@@ -290,7 +290,7 @@ Notice that we've replaced the constraint with a second term in our objective fu
 1. Keeping the model's error on the training data low, represented by the term $\frac{1}{n} \sum_{i=1}^n (y_i - (\theta_0 + \theta_1 x_{i, 1} + \theta_2 x_{i, 2} + \ldots + \theta_p x_{i, p}))^2$
 2. Keeping the magnitudes of model parameters low, represented by the term $\lambda \sum_{i=1}^p |\theta_i|$
 
-The $\lambda$ factor controls the degree of regularization. Roughly speaking, $\lambda$ is related to our $Q$ constraint from before by the rule $\lambda \approx \frac{1}{Q}$. To understand why, let's consider two extreme examples. Recall that our goal is to minimize the cost function: $||\mathbb{Y} - \mathbb{X}\theta||_2^2 + \lambda || \theta ||_1$.
+The $\lambda$ factor controls the degree of regularization. Roughly speaking, $\lambda$ is related to our $Q$ constraint from before by the rule $\lambda \approx \frac{1}{Q}$. To understand why, let's consider two extreme examples. Recall that our goal is to minimize the cost function: $\frac{1}{n}||\mathbb{Y} - \mathbb{X}\theta||_2^2 + \lambda || \theta ||_1$.
 
 - Assume $\lambda \rightarrow \infty$. Then, $\lambda || \theta ||_1$ dominates the cost function. In order to neutralize the $\infty$ and minimize this term, we set $\theta_j = 0$ for all $j \ge 1$. This is a very constrained model that is mathematically equivalent to the constant model
 
@@ -360,8 +360,8 @@ Notice that all we have done is change the constraint on the model parameters. T
 Using Lagrangian Duality (again, out of scope for Data 100), we can re-express our objective function as:
 
 $$\frac{1}{n} \sum_{i=1}^n (y_i - (\theta_0 + \theta_1 \phi_{i, 1} + \theta_2 \phi_{i, 2} + \ldots + \theta_p \phi_{i, p}))^2 + \lambda \sum_{i=1}^p \theta_i^2$$
-$$= ||\mathbb{Y} - \mathbb{X}\theta||_2^2 + \lambda \sum_{i=1}^p \theta_i^2$$
-$$= ||\mathbb{Y} - \mathbb{X}\theta||_2^2 + \lambda || \theta ||_2^2$$
+$$= \frac{1}{n}||\mathbb{Y} - \mathbb{X}\theta||_2^2 + \lambda \sum_{i=1}^p \theta_i^2$$
+$$= \frac{1}{n}||\mathbb{Y} - \mathbb{X}\theta||_2^2 + \lambda || \theta ||_2^2$$
 
 The last two expressions include the MSE expressed using vector notation, and the last expression writes $\sum_{i=1}^p \theta_i^2$ as its **L2 norm** equivalent form, $|| \theta ||_2^2$.
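For a concrete reading of the corrected objectives, here is a minimal NumPy sketch (the helper names `lasso_objective` and `ridge_objective` are illustrative and do not come from the notes). Both include the $\frac{1}{n}$ factor on the squared error and leave the intercept $\theta_0$ unpenalized, matching the sums from $i = 1$ to $p$:

```python
import numpy as np

def lasso_objective(theta, X, Y, lam):
    """(1/n) * ||Y - X theta||_2^2 + lam * sum_{i=1}^p |theta_i| (intercept theta_0 not penalized)."""
    n = len(Y)
    mse = np.sum((Y - X @ theta) ** 2) / n
    return mse + lam * np.sum(np.abs(theta[1:]))

def ridge_objective(theta, X, Y, lam):
    """(1/n) * ||Y - X theta||_2^2 + lam * sum_{i=1}^p theta_i^2 (intercept theta_0 not penalized)."""
    n = len(Y)
    mse = np.sum((Y - X @ theta) ** 2) / n
    return mse + lam * np.sum(theta[1:] ** 2)

# Tiny example: a bias column plus two features.
X = np.hstack([np.ones((4, 1)), np.arange(8.0).reshape(4, 2)])
Y = np.array([1.0, 2.0, 3.0, 4.0])
theta = np.array([0.5, 0.1, -0.2])
print(lasso_objective(theta, X, Y, lam=0.1))
print(ridge_objective(theta, X, Y, lam=0.1))
```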

diff --git a/docs/cv_regularization/cv_reg.html b/docs/cv_regularization/cv_reg.html
index fd8c8de7..dbd6a44b 100644
--- a/docs/cv_regularization/cv_reg.html
+++ b/docs/cv_regularization/cv_reg.html
@@ -613,14 +613,14 @@
 
 \[\frac{1}{n} \sum_{i=1}^n (y_i - (\theta_0 + \theta_1 \phi_{i, 1} + \theta_2 \phi_{i, 2} + \ldots + \theta_p \phi_{i, p}))^2\:\text{such that} \sum_{i=1}^p |\theta_i| \leq Q\]
 
 Unfortunately, we can’t directly use this formulation as our objective function – it’s not easy to mathematically optimize over a constraint. Instead, we will apply the magic of the Lagrangian Duality. The details of this are out of scope (take EECS 127 if you’re interested in learning more), but the end result is very useful. It turns out that minimizing the following augmented objective function is equivalent to our minimization goal above.
 
-\[\frac{1}{n} \sum_{i=1}^n (y_i - (\theta_0 + \theta_1 \phi_{i, 1} + \theta_2 \phi_{i, 2} + \ldots + \theta_p \phi_{i, p}))^2 + \lambda \sum_{i=1}^p \vert \theta_i \vert\] \[ = ||\mathbb{Y} - \mathbb{X}\theta||_2^2 + \lambda \sum_{i=1}^p |\theta_i|\] \[ = ||\mathbb{Y} - \mathbb{X}\theta||_2^2 + \lambda || \theta ||_1\]
+\[\frac{1}{n} \sum_{i=1}^n (y_i - (\theta_0 + \theta_1 \phi_{i, 1} + \theta_2 \phi_{i, 2} + \ldots + \theta_p \phi_{i, p}))^2 + \lambda \sum_{i=1}^p \vert \theta_i \vert\] \[ = \frac{1}{n}||\mathbb{Y} - \mathbb{X}\theta||_2^2 + \lambda \sum_{i=1}^p |\theta_i|\] \[ = \frac{1}{n}||\mathbb{Y} - \mathbb{X}\theta||_2^2 + \lambda || \theta ||_1\]
 
 The last two expressions include the MSE expressed using vector notation, and the last expression writes \(\sum_{i=1}^p |\theta_i|\) as its L1 norm equivalent form, \(|| \theta ||_1\).
 
 Notice that we’ve replaced the constraint with a second term in our objective function. We’re now minimizing a function with an additional regularization term that penalizes large coefficients. In order to minimize this new objective function, we’ll end up balancing two components:
 
   1. Keeping the model’s error on the training data low, represented by the term \(\frac{1}{n} \sum_{i=1}^n (y_i - (\theta_0 + \theta_1 x_{i, 1} + \theta_2 x_{i, 2} + \ldots + \theta_p x_{i, p}))^2\)
   2. Keeping the magnitudes of model parameters low, represented by the term \(\lambda \sum_{i=1}^p |\theta_i|\)
-The \(\lambda\) factor controls the degree of regularization. Roughly speaking, \(\lambda\) is related to our \(Q\) constraint from before by the rule \(\lambda \approx \frac{1}{Q}\). To understand why, let’s consider two extreme examples. Recall that our goal is to minimize the cost function: \(||\mathbb{Y} - \mathbb{X}\theta||_2^2 + \lambda || \theta ||_1\).
+The \(\lambda\) factor controls the degree of regularization. Roughly speaking, \(\lambda\) is related to our \(Q\) constraint from before by the rule \(\lambda \approx \frac{1}{Q}\). To understand why, let’s consider two extreme examples. Recall that our goal is to minimize the cost function: \(\frac{1}{n}||\mathbb{Y} - \mathbb{X}\theta||_2^2 + \lambda || \theta ||_1\).
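In practice, these regularized objectives are usually fit with scikit-learn rather than minimized by hand. Below is a small illustrative sketch on synthetic data; the `alpha` value is arbitrary, and scikit-learn scales the squared-error term differently from the \(\frac{1}{n}\) convention above, so `alpha` is not numerically identical to \(\lambda\):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))                     # synthetic design matrix
true_theta = np.array([3.0, 0.0, -2.0, 0.0, 1.0])
Y = X @ true_theta + rng.normal(scale=0.5, size=100)

# alpha plays the role of lambda: larger alpha means stronger regularization,
# which corresponds to a smaller Q in the constrained formulation.
lasso = Lasso(alpha=0.1).fit(X, Y)
ridge = Ridge(alpha=0.1).fit(X, Y)

print(lasso.coef_)   # L1 regularization tends to set some coefficients exactly to 0
print(ridge.coef_)   # L2 regularization shrinks coefficients toward 0 without zeroing them
```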