Backpropagation
===============

When performing gradient descent, backpropagation is used to determine how to change the parameters to move toward the optimal point. Backpropagation is the process of finding derivatives of the cost function using the chain rule until you reach the derivatives with respect to w and b. The sign of these derivatives determines the direction in which each parameter is updated.
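
As a concrete instance of this chain rule, for a single sigmoid unit a = σ(z) with z = wᵀx + b (defined below), the derivative of the per-example loss 𝓛 unrolls as:

.. math::

   \frac{\partial \mathcal{L}}{\partial w} = \frac{\partial \mathcal{L}}{\partial a} \cdot \frac{da}{dz} \cdot \frac{\partial z}{\partial w},
   \qquad
   \frac{\partial \mathcal{L}}{\partial b} = \frac{\partial \mathcal{L}}{\partial a} \cdot \frac{da}{dz} \cdot \frac{\partial z}{\partial b}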

.. image:: ../_img/sigmoidPic.jpg

As discussed in the Linear Regression tutorial, the sigmoid function is a = σ(z), where z = wᵀx + b. The following shows the derivation of its derivative:

.. image:: ../_img/backpropagation1.JPG

Here we can see that da/dz = a (1 – a).
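
The derivation is short: starting from a = σ(z) = 1/(1 + e⁻ᶻ),

.. math::

   \frac{da}{dz}
   = \frac{e^{-z}}{\left(1 + e^{-z}\right)^{2}}
   = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}}
   = a\,(1 - a)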

Consider a neural network with one hidden layer. The parameters are the weight matrices W[1] and W[2] and the bias vectors b[1] and b[2]. The cost function will be the following equation:

.. image:: ../_img/backpropagation2.JPG
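
Written out (assuming the standard cross-entropy loss, which is consistent with the gradient dZ[2] = A[2] - Y used below), the cost is the average of the per-example loss over the m training examples:

.. math::

   J\left(W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}\right)
   = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right),
   \qquad
   \mathcal{L}(\hat{y}, y) = -\left( y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \right)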

In this equation, ŷ is interchangeable with a[2] because it is our last a value. The values you are trying to find through backpropagation are dW[1], db[1], dW[2], and db[2], so that you can update the W and b parameters accordingly for gradient descent. First you need to compute ŷ for the training examples through the following steps, which are discussed in the previous section:

Z[1] = W[1]X + b[1]

A[1] = σ(Z[1])

Z[2] = W[2]A[1] + b[2]

A[2] = σ(Z[2]), which is the set of every training example’s ŷ value
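
As a concrete illustration, here is a minimal NumPy sketch of this forward pass; the layer sizes, variable names, and the ``sigmoid`` helper are assumptions made for this example, not something prescribed by the tutorial:

.. code-block:: python

   import numpy as np

   def sigmoid(Z):
       # Element-wise sigmoid: 1 / (1 + e^(-Z))
       return 1.0 / (1.0 + np.exp(-Z))

   def forward_propagation(X, W1, b1, W2, b2):
       # Assumed shapes: X is (n_x, m) with the m training examples as columns,
       # W1 is (n_h, n_x), b1 is (n_h, 1), W2 is (1, n_h), b2 is (1, 1).
       Z1 = W1 @ X + b1      # Z[1] = W[1]X + b[1]
       A1 = sigmoid(Z1)      # A[1] = sigma(Z[1])
       Z2 = W2 @ A1 + b2     # Z[2] = W[2]A[1] + b[2]
       A2 = sigmoid(Z2)      # A[2] holds the y-hat value of every training example
       return Z1, A1, Z2, A2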

The following equations compute the gradients by backpropagation for the network above:

dZ[2] = A[2] - Y, where Y is the matrix containing the y value of each training example

dW[2] = (1/m) dZ[2]A[1]ᵀ

db[2] = (1/m) Σ dZ[2], summing across the training examples

dZ[1] = W[2]ᵀdZ[2] * g'(Z[1]), where * denotes an element-wise product, g(Z[1]) = σ(Z[1]), and g' is the derivative of g

dW[1] = (1/m) dZ[1]Xᵀ

db[1] = (1/m) Σ dZ[1], summing across the training examples
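
Translating these equations directly into code, a minimal NumPy sketch of the backward pass might look like the following; the function and variable names are assumptions for illustration, and ``A1``, ``A2`` are assumed to come from the forward-pass sketch above:

.. code-block:: python

   import numpy as np

   def backward_propagation(X, Y, A1, A2, W2):
       m = X.shape[1]                        # number of training examples
       dZ2 = A2 - Y                          # dZ[2] = A[2] - Y
       dW2 = (1.0 / m) * (dZ2 @ A1.T)        # dW[2] = (1/m) dZ[2] A[1]^T
       db2 = (1.0 / m) * np.sum(dZ2, axis=1, keepdims=True)
       dZ1 = (W2.T @ dZ2) * (A1 * (1 - A1))  # sigma'(Z[1]) = A[1](1 - A[1])
       dW1 = (1.0 / m) * (dZ1 @ X.T)         # dW[1] = (1/m) dZ[1] X^T
       db1 = (1.0 / m) * np.sum(dZ1, axis=1, keepdims=True)
       return dW1, db1, dW2, db2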

Now that you have taken the cost function and computed dW[1], db[1], dW[2], and db[2] from it through backpropagation, you can update the parameters and continue with gradient descent.
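
The update itself is the usual gradient-descent step, where α is the learning rate (a hyperparameter you choose):

.. math::

   W^{[l]} := W^{[l]} - \alpha \, dW^{[l]}, \qquad
   b^{[l]} := b^{[l]} - \alpha \, db^{[l]}, \qquad l \in \{1, 2\}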

Reference: https://www.coursera.org/