diff --git a/hw0.ipynb b/hw0.ipynb
index e15d948..988a350 100644
--- a/hw0.ipynb
+++ b/hw0.ipynb
@@ -328,7 +328,7 @@
    "source": [
     "## Question 3: Softmax loss\n",
     "\n",
-    "Implement the softmax (a.k.a. cross-entropy) loss as defined in `softmax_loss()` function in `src/simple_ml.py`. Recall (hopefully this is review, but we'll also cover it in lecture on 9/1), that for a multi-class output that can take on values $y \in \{1,\ldots,k\}$, the softmax loss takes as input a vector of logits $z \in \mathbb{R}^k$, the true class $y \in \{1,\ldots,k\}$ returns a loss defined by\n",
+    "Implement the softmax (a.k.a. cross-entropy) loss as defined in `softmax_loss()` function in `src/simple_ml.py`. Recall (hopefully this is review, but we'll also cover it in the second lecture of week 1), that for a multi-class output that can take on values $y \in \{1,\ldots,k\}$, the softmax loss takes as input a vector of logits $z \in \mathbb{R}^k$, the true class $y \in \{1,\ldots,k\}$ returns a loss defined by\n",
     "\begin{equation}\n",
     "\ell_{\mathrm{softmax}}(z, y) = \log\sum_{i=1}^k \exp z_i - z_y.\n",
     "\end{equation}\n",
@@ -369,7 +369,7 @@
    "source": [
     "## Question 4: Stochastic gradient descent for softmax regression\n",
     "\n",
-    "In this question you will implement stochastic gradient descent (SGD) for (linear) softmax regression. In other words, as discussed in lecture on 9/1, we will consider a hypothesis function that makes $n$-dimensional inputs to $k$-dimensional logits via the function\n",
+    "In this question you will implement stochastic gradient descent (SGD) for (linear) softmax regression. In other words, as discussed in lecture 2 of week 1, we will consider a hypothesis function that maps $n$-dimensional inputs to $k$-dimensional logits via the function\n",
     "\begin{equation}\n",
     "h(x) = \Theta^T x\n",
     "\end{equation}\n",
@@ -494,7 +494,7 @@
     "\minimize_{W_1, W_2} \;\; \ell_{\mathrm{softmax}}(\mathrm{ReLU}(X W_1) W_2, y).\n",
     "\end{equation}\n",
     "\n",
-    "Using the chain rule, we can derive the backpropagation updates for this network (we'll briefly cover these in class, on 9/8, but also provide the final form here for ease of implementation). Specifically, let\n",
+    "Using the chain rule, we can derive the backpropagation updates for this network (we'll briefly cover these in lecture 2 of week 2, but also provide the final form here for ease of implementation). Specifically, let\n",
     "\begin{equation}\n",
     "\begin{split}\n",
     "Z_1 \in \mathbb{R}^{m \times d} & = \mathrm{ReLU}(X W_1) \\\n",
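
For reference alongside the first hunk: the loss it describes is log sum_i exp(z_i) - z_y. Below is a minimal numpy sketch of that formula, assuming logits arrive as a `(batch, k)` array `Z`, labels as an integer vector `y`, and that the result is averaged over the batch; the name and signature here are illustrative and may not match the required `softmax_loss()` in `src/simple_ml.py`.

```python
import numpy as np

def softmax_loss_sketch(Z, y):
    """Average softmax (cross-entropy) loss over a batch.

    Z : (batch, k) float array of logits.
    y : (batch,) int array of true classes in {0, ..., k-1}.
    Illustrative only; the homework's softmax_loss() may use a
    different signature or reduction.
    """
    # Row-wise log-sum-exp, shifted by the row max for numerical stability.
    Z_max = Z.max(axis=1)
    log_sum_exp = np.log(np.exp(Z - Z_max[:, None]).sum(axis=1)) + Z_max
    # log sum_i exp(z_i) - z_y, averaged over the batch.
    return float(np.mean(log_sum_exp - Z[np.arange(Z.shape[0]), y]))
```

Shifting each row by its maximum before exponentiating is the usual log-sum-exp stabilization and does not change the value of the loss.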
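
For the second hunk (linear softmax regression with h(x) = Theta^T x): the standard minibatch SGD update uses the batch-averaged gradient (1/B) X_b^T (softmax(X_b Theta) - I_y), where I_y is the one-hot encoding of the batch labels. A sketch of one epoch under that gradient; the function name, argument order, and defaults are assumptions, not the homework's required interface.

```python
import numpy as np

def softmax_regression_epoch_sketch(X, y, theta, lr=0.1, batch=100):
    """One epoch of minibatch SGD for linear softmax regression.

    X     : (m, n) examples        y : (m,) int labels
    theta : (n, k) weights, updated in place.
    Hypothetical signature; the homework's own epoch function may differ.
    """
    for start in range(0, X.shape[0], batch):
        Xb, yb = X[start:start + batch], y[start:start + batch]
        # Row-wise softmax of the batch logits X_b @ theta (shift-stabilized).
        logits = Xb @ theta
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient (1/B) X_b^T (softmax(X_b theta) - I_y): subtract 1 at the true class.
        probs[np.arange(yb.size), yb] -= 1.0
        theta -= lr * (Xb.T @ probs) / yb.size
```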
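
For the last hunk (the two-layer network ReLU(X W1) W2): applying the chain rule as the text describes gives gradients of the usual form G2 = softmax(Z1 W2) - I_y, G1 = 1{Z1 > 0} elementwise-times (G2 W2^T), with grad_W1 = (1/m) X^T G1 and grad_W2 = (1/m) Z1^T G2. A sketch of a single full-batch SGD step under those formulas, with assumed names; the homework presumably applies this over minibatches inside an epoch function.

```python
import numpy as np

def nn_sgd_step_sketch(X, y, W1, W2, lr=0.1):
    """One full-batch SGD step for the network softmax(ReLU(X W1) W2).

    X : (m, n) examples, y : (m,) int labels,
    W1 : (n, d), W2 : (d, k); both weight matrices are updated in place.
    Names and the full-batch setting are illustrative assumptions.
    """
    m = X.shape[0]
    Z1 = np.maximum(X @ W1, 0.0)                   # Z1 = ReLU(X W1)
    logits = Z1 @ W2
    # G2 = softmax(Z1 W2) - I_y (shift-stabilized softmax, then subtract one-hot).
    G2 = np.exp(logits - logits.max(axis=1, keepdims=True))
    G2 /= G2.sum(axis=1, keepdims=True)
    G2[np.arange(m), y] -= 1.0
    # G1 = 1{Z1 > 0} elementwise-times (G2 W2^T), the ReLU backward pass.
    G1 = (Z1 > 0) * (G2 @ W2.T)
    # Gradient steps: (1/m) X^T G1 and (1/m) Z1^T G2.
    W1 -= lr * (X.T @ G1) / m
    W2 -= lr * (Z1.T @ G2) / m
```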