Forward pass
Let's take the following neural network:
We'll look specifically into how the neuron with activation a_4 (the first one from the top in the second layer) gets its activation calculated.
The input of a neuron is the sum of all the activations of the neurons connected to it, each multiplied by its respective weight. Since this is a fully connected layer, the input for neuron a_4 contains the activations of all the neurons of the previous layer:
a_4 = w_1*a_1 + w_2*a_2 + w_3*a_3
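As a minimal sketch of that sum in plain Python (the activation and weight values here are made up purely for illustration):

```python
# Hypothetical values for the three input activations and their weights.
a = [0.5, -1.2, 0.3]   # a_1, a_2, a_3
w = [0.8, 0.1, -0.4]   # w_1, w_2, w_3

# The input of a_4 is the weighted sum of the previous layer's activations.
a_4 = sum(w_i * a_i for w_i, a_i in zip(w, a))
print(a_4)  # 0.8*0.5 + 0.1*(-1.2) + (-0.4)*0.3 ≈ 0.16
```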
You can see the relevant weights in red.
Remember, in the simplest case the network receives a single vector as its input. But since we need a lot of training data to achieve good results, it's important to be as time-efficient as possible. The training process can be sped up using batches, i.e. processing several data points in one forward pass. In that case, the neural network receives a matrix as its input, with each row representing the input vector of one data point in the batch.
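For example (with made-up numbers), a batch of two 3-dimensional input vectors stacks into a (2, 3) matrix:

```python
import numpy as np

# Two hypothetical data points, each a 3-dimensional input vector.
x1 = [0.5, -1.2, 0.3]
x2 = [1.0, 0.2, -0.7]

# Stacked as rows, the batch is a matrix of shape (2, 3).
X = np.array([x1, x2])
print(X.shape)  # (2, 3)
```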
So since we're dealing with sums and matrices, matrix multiplication is a very natural choice. By multiplying the input matrix with the transpose of the weight matrix, we create a new matrix which holds exactly the correct sum for each neuron. Here's a demonstration of how the specific activation a_4 is formed in this case. Most weights and activations are omitted from the picture, since it would be unreadable otherwise:
Let's assume the batch size is 1. In this case, the network takes a matrix of shape (1, 3) as its input, since the input layer has 3 neurons. Now we have to multiply each neuron's activation by the correct weight to get the activation of the first neuron in the second layer. We start with a weight matrix of shape (4, 3), since the next layer has 4 neurons and this layer has 3. By transposing, we move the three weights we want to multiply into a column, creating a matrix of shape (3, 4). And voilà, now we can do a matrix multiplication, since the inner dimensions match. The resulting matrix will be of shape (1, 4), which is exactly right, since it's (batch size, layer size), and we can continue the multiplications with the next layer.
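Putting the whole step together, here is a sketch of that multiplication in NumPy. The weights and inputs are random placeholders, and a real network would also add a bias and apply an activation function, both of which this demonstration omits:

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.standard_normal((1, 3))  # batch of 1, input layer has 3 neurons
W = rng.standard_normal((4, 3))  # one row of 3 weights per neuron in the next layer

# Transposing W gives shape (3, 4), so the inner dimensions match:
# (1, 3) @ (3, 4) -> (1, 4), i.e. (batch size, layer size).
A = X @ W.T
print(A.shape)  # (1, 4)

# The first entry of A is exactly the weighted sum for a_4:
# a_4 = w_1*a_1 + w_2*a_2 + w_3*a_3
```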