From 2ba406738442a3cbcc2994f0a1eeb6f9aecce333 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tesla=20Zhang=E2=80=AE?=
Date: Tue, 3 Sep 2024 14:55:41 -0400
Subject: [PATCH 1/2] Change the date to a more persistent pointer

---
 hw0.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw0.ipynb b/hw0.ipynb
index e15d948..211ec05 100644
--- a/hw0.ipynb
+++ b/hw0.ipynb
@@ -328,7 +328,7 @@
    "source": [
     "## Question 3: Softmax loss\n",
     "\n",
-    "Implement the softmax (a.k.a. cross-entropy) loss as defined in `softmax_loss()` function in `src/simple_ml.py`. Recall (hopefully this is review, but we'll also cover it in lecture on 9/1), that for a multi-class output that can take on values $y \\in \\{1,\\ldots,k\\}$, the softmax loss takes as input a vector of logits $z \\in \\mathbb{R}^k$, the true class $y \\in \\{1,\\ldots,k\\}$ returns a loss defined by\n",
+    "Implement the softmax (a.k.a. cross-entropy) loss as defined in the `softmax_loss()` function in `src/simple_ml.py`. Recall (hopefully this is review, but we'll also cover it in the second lecture of week 1) that for a multi-class output that can take on values $y \\in \\{1,\\ldots,k\\}$, the softmax loss takes as input a vector of logits $z \\in \\mathbb{R}^k$ and the true class $y \\in \\{1,\\ldots,k\\}$, and returns a loss defined by\n",
     "\\begin{equation}\n",
     "\\ell_{\\mathrm{softmax}}(z, y) = \\log\\sum_{i=1}^k \\exp z_i - z_y.\n",
     "\\end{equation}\n",

From ffbc3cbf8c313d7846196040f1e664cdd44d8bf0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tesla=20Zhang=E2=80=AE?=
Date: Tue, 3 Sep 2024 20:35:32 -0400
Subject: [PATCH 2/2] There seems to be more of these dates

---
 hw0.ipynb | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw0.ipynb b/hw0.ipynb
index 211ec05..988a350 100644
--- a/hw0.ipynb
+++ b/hw0.ipynb
@@ -369,7 +369,7 @@
    "source": [
     "## Question 4: Stochastic gradient descent for softmax regression\n",
     "\n",
-    "In this question you will implement stochastic gradient descent (SGD) for (linear) softmax regression. In other words, as discussed in lecture on 9/1, we will consider a hypothesis function that makes $n$-dimensional inputs to $k$-dimensional logits via the function\n",
+    "In this question you will implement stochastic gradient descent (SGD) for (linear) softmax regression. In other words, as discussed in lecture 2 of week 1, we will consider a hypothesis function that maps $n$-dimensional inputs to $k$-dimensional logits via the function\n",
     "\\begin{equation}\n",
     "h(x) = \\Theta^T x\n",
     "\\end{equation}\n",
@@ -494,7 +494,7 @@
     "\\minimize_{W_1, W_2} \\;\\; \\ell_{\\mathrm{softmax}}(\\mathrm{ReLU}(X W_1) W_2, y).\n",
     "\\end{equation}\n",
     "\n",
-    "Using the chain rule, we can derive the backpropagation updates for this network (we'll briefly cover these in class, on 9/8, but also provide the final form here for ease of implementation). Specifically, let\n",
+    "Using the chain rule, we can derive the backpropagation updates for this network (we'll briefly cover these in lecture 2 of week 2, but also provide the final form here for ease of implementation). Specifically, let\n",
     "\\begin{equation}\n",
     "\\begin{split}\n",
     "Z_1 \\in \\mathbb{R}^{m \\times d} & = \\mathrm{ReLU}(X W_1) \\\\\n",
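
For reference, the loss touched by the first hunk is the softmax (cross-entropy) loss \ell_{\mathrm{softmax}}(z, y) = \log\sum_{i=1}^k \exp z_i - z_y. A minimal NumPy sketch of a batched version is below; it assumes Z is an (m, k) array of logits and y an (m,) array of integer labels, and the function name and signature are illustrative only, not necessarily the exact softmax_loss() API expected by src/simple_ml.py.

import numpy as np

def softmax_loss_sketch(Z, y):
    # Average softmax loss over a batch (illustrative sketch only;
    # the exact signature required in src/simple_ml.py may differ).
    # Z: (m, k) logits; y: (m,) integer labels in {0, ..., k-1}.
    # Subtracting the row-wise max leaves the loss unchanged but avoids overflow in exp.
    Zs = Z - Z.max(axis=1, keepdims=True)
    log_sum_exp = np.log(np.exp(Zs).sum(axis=1))
    # log sum_i exp z_i - z_y, averaged over the m examples
    return float(np.mean(log_sum_exp - Zs[np.arange(Z.shape[0]), y]))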
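The second hunk's question uses the linear hypothesis h(x) = \Theta^T x together with the same softmax loss; differentiating the average loss over a minibatch gives the gradient X^T (softmax(X \Theta) - I_y) / b, where I_y is the one-hot encoding of the labels and b the batch size. A sketch of one SGD epoch built on that expression follows; the function name, argument order, and defaults are assumptions, not the interface required by src/simple_ml.py.

import numpy as np

def softmax_regression_epoch_sketch(X, y, Theta, lr=0.1, batch=100):
    # One pass of minibatch SGD for linear softmax regression (illustrative only).
    # X: (m, n) examples; y: (m,) integer labels; Theta: (n, k) floats, updated in place.
    m, k = X.shape[0], Theta.shape[1]
    for start in range(0, m, batch):
        Xb, yb = X[start:start + batch], y[start:start + batch]
        b = Xb.shape[0]
        logits = Xb @ Theta                                   # h(x) = Theta^T x, batched as X Theta
        logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
        S = np.exp(logits)
        S /= S.sum(axis=1, keepdims=True)                     # row-wise softmax probabilities
        Iy = np.zeros((b, k))
        Iy[np.arange(b), yb] = 1.0                            # one-hot labels
        Theta -= lr * (Xb.T @ (S - Iy)) / b                   # gradient step on the average loss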
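The last hunk belongs to the two-layer-network objective \minimize_{W_1, W_2} \ell_{\mathrm{softmax}}(\mathrm{ReLU}(X W_1) W_2, y). One standard way to write the chain-rule gradients of the average loss, with Z_1 = \mathrm{ReLU}(X W_1), S the row-wise softmax of Z_1 W_2, G_2 = S - I_y, and G_1 = 1{Z_1 > 0} \circ (G_2 W_2^T), is X^T G_1 / b for W_1 and Z_1^T G_2 / b for W_2. The sketch below simply transcribes those expressions as a single full-batch gradient step; the name and update style are assumptions rather than the interface required by src/simple_ml.py.

import numpy as np

def two_layer_sgd_step_sketch(X, y, W1, W2, lr=0.1):
    # One full-batch gradient step for the two-layer ReLU network (illustrative only).
    # X: (m, n); y: (m,) integer labels; W1: (n, d); W2: (d, k); W1 and W2 updated in place.
    m, k = X.shape[0], W2.shape[1]
    Z1 = np.maximum(X @ W1, 0)                    # Z_1 = ReLU(X W_1)
    logits = Z1 @ W2
    logits = logits - logits.max(axis=1, keepdims=True)
    S = np.exp(logits)
    S /= S.sum(axis=1, keepdims=True)             # row-wise softmax
    Iy = np.zeros((m, k))
    Iy[np.arange(m), y] = 1.0                     # one-hot labels
    G2 = S - Iy                                   # gradient at the logits
    G1 = (Z1 > 0) * (G2 @ W2.T)                   # backprop through the ReLU
    W2 -= lr * (Z1.T @ G2) / m
    W1 -= lr * (X.T @ G1) / m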