Change the date to a more persistent pointer #14

Open · wants to merge 2 commits into base: main
6 changes: 3 additions & 3 deletions hw0.ipynb
@@ -328,7 +328,7 @@
"source": [
"## Question 3: Softmax loss\n",
"\n",
"Implement the softmax (a.k.a. cross-entropy) loss as defined in `softmax_loss()` function in `src/simple_ml.py`. Recall (hopefully this is review, but we'll also cover it in lecture on 9/1), that for a multi-class output that can take on values $y \\in \\{1,\\ldots,k\\}$, the softmax loss takes as input a vector of logits $z \\in \\mathbb{R}^k$, the true class $y \\in \\{1,\\ldots,k\\}$ returns a loss defined by\n",
"Implement the softmax (a.k.a. cross-entropy) loss as defined in `softmax_loss()` function in `src/simple_ml.py`. Recall (hopefully this is review, but we'll also cover it in the second lecture of weeek 1), that for a multi-class output that can take on values $y \\in \\{1,\\ldots,k\\}$, the softmax loss takes as input a vector of logits $z \\in \\mathbb{R}^k$, the true class $y \\in \\{1,\\ldots,k\\}$ returns a loss defined by\n",
"\\begin{equation}\n",
"\\ell_{\\mathrm{softmax}}(z, y) = \\log\\sum_{i=1}^k \\exp z_i - z_y.\n",
"\\end{equation}\n",
@@ -369,7 +369,7 @@
"source": [
"## Question 4: Stochastic gradient descent for softmax regression\n",
"\n",
"In this question you will implement stochastic gradient descent (SGD) for (linear) softmax regression. In other words, as discussed in lecture on 9/1, we will consider a hypothesis function that makes $n$-dimensional inputs to $k$-dimensional logits via the function\n",
"In this question you will implement stochastic gradient descent (SGD) for (linear) softmax regression. In other words, as discussed in lecture 2 in week 1, we will consider a hypothesis function that makes $n$-dimensional inputs to $k$-dimensional logits via the function\n",
"\\begin{equation}\n",
"h(x) = \\Theta^T x\n",
"\\end{equation}\n",
@@ -494,7 +494,7 @@
"\\minimize_{W_1, W_2} \\;\\; \\ell_{\\mathrm{softmax}}(\\mathrm{ReLU}(X W_1) W_2, y).\n",
"\\end{equation}\n",
"\n",
"Using the chain rule, we can derive the backpropagation updates for this network (we'll briefly cover these in class, on 9/8, but also provide the final form here for ease of implementation). Specifically, let\n",
"Using the chain rule, we can derive the backpropagation updates for this network (we'll briefly cover these in the lecture 2 in week 2, but also provide the final form here for ease of implementation). Specifically, let\n",
"\\begin{equation}\n",
"\\begin{split}\n",
"Z_1 \\in \\mathbb{R}^{m \\times d} & = \\mathrm{ReLU}(X W_1) \\\\\n",
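The diff is truncated above before the remaining backpropagation quantities are defined. Purely for orientation, here is a hedged numpy sketch of one minibatch step for this two-layer network; the gradient expressions follow the standard chain-rule derivation for ReLU followed by the softmax loss and are my assumptions, not a claim about the exact form the notebook provides:

```python
import numpy as np

def two_layer_sgd_step_sketch(Xb, yb, W1, W2, lr=0.1):
    """One minibatch gradient step for the network ReLU(X W1) W2 under softmax loss,
    updating W1 and W2 in place."""
    m = Xb.shape[0]
    Z1 = np.maximum(Xb @ W1, 0)        # Z1 = ReLU(X W1)
    P = np.exp(Z1 @ W2)
    P /= P.sum(axis=1, keepdims=True)  # softmax over the output logits
    Iy = np.zeros_like(P)
    Iy[np.arange(m), yb] = 1           # one-hot true labels
    G2 = P - Iy                        # gradient of the loss w.r.t. the output logits
    G1 = (Z1 > 0) * (G2 @ W2.T)        # backpropagate through W2 and the ReLU
    W1 -= lr * (Xb.T @ G1) / m
    W2 -= lr * (Z1.T @ G2) / m
```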