
About forward pass


How a forward pass is implemented with matrices

Let's take the following neural network:

(Figure: a simple neural network)

We'll look specifically at how the neuron with activation a_4 (the first one from the top in the second layer) gets its activation calculated.

The input of a neuron is the sum of the activations of all neurons connected to it, each multiplied by its respective weight. Since the layer is fully connected, the input for neuron a_4 contains the activations of all the neurons in the previous layer (for simplicity, we leave out the bias term and the activation function here):

a_4 = w_1*a_1 + w_2*a_2 + w_3*a_3
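
As a minimal sketch, this weighted sum is just a dot product in NumPy (the activation and weight values below are made up for illustration):

```python
import numpy as np

# Activations of the three neurons in the previous layer.
a = np.array([0.5, -1.0, 2.0])
# Weights connecting those neurons to a_4 (the red weights in the figure).
w = np.array([0.1, 0.2, 0.3])

# The input of a_4 is the weighted sum of the previous activations.
a_4 = np.dot(w, a)  # 0.1*0.5 + 0.2*(-1.0) + 0.3*2.0 = 0.45
print(a_4)
```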

You can see the relevant weights in red.

Remember, in the simplest case the network receives a single vector as input. But since we need a lot of training data to achieve good results, it's important to be as time-efficient as possible. The training process can be optimized using batches, i.e. processing several data points in one forward pass. In that case, the neural network receives a matrix as input, with each row representing the input vector of one data point in the batch.
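
For example, stacking input vectors as rows gives the batch matrix (the values here are again invented for illustration):

```python
import numpy as np

# Two data points, each an input vector of length 3.
x1 = np.array([0.5, -1.0, 2.0])
x2 = np.array([1.0, 0.0, -0.5])

# One forward pass can process both at once:
# each row of the batch is one input vector.
batch = np.stack([x1, x2])
print(batch.shape)  # (2, 3) -> (batch size, input layer size)
```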

So since we're dealing with sums and matrices, matrix multiplication is a very natural choice. By multiplying the input matrix with the transpose of the weight matrix, we create a new matrix which contains exactly the correct sum for each neuron. Here's a demonstration of how the specific activation a_4 is formed in this case. Most weights and activations are omitted from the picture, since it would be unreadable otherwise:

(Figure: matrix multiplication example)

Let's assume the batch size is 1. In this case, the network takes a matrix of shape (1, 3) as input, since the input layer has 3 neurons. To get the activation of the first neuron in the second layer, we have to multiply each input neuron's activation by the correct weight. We start with a weight matrix of shape (4, 3), since the next layer has 4 neurons and this layer has 3. By transposing, we move the three weights we want to multiply into the columns, creating a matrix of shape (3, 4). And voilà, now we can do a matrix multiplication, since the inner dimensions are the same. The resulting matrix will be of shape (1, 4), which is exactly what we want, since it's (batch size, layer size), and we will be able to continue the multiplications with the next layer.
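
Putting the shapes together, here is a minimal sketch of this forward pass in NumPy (the weights and inputs are random placeholders, and the bias and activation function are left out, as above):

```python
import numpy as np

rng = np.random.default_rng(0)

batch_size, input_size, layer_size = 1, 3, 4

# Input batch: one row per data point -> shape (1, 3).
x = rng.standard_normal((batch_size, input_size))

# Weight matrix: one row per neuron in the next layer -> shape (4, 3).
W = rng.standard_normal((layer_size, input_size))

# (1, 3) @ (3, 4) -> (1, 4): one weighted sum per neuron in the next layer.
z = x @ W.T
print(z.shape)  # (1, 4) -> (batch size, layer size)
```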
