Weekly report 5
This was a lousy week. The little time I had I spent debugging why my network still doesn't work, without finding out the reason.
Apparently the old example problem was not a good fit for the architecture (XOR with ReLU as the activation and MSE as the loss function), so I changed it to linear regression. I read some PyTorch source code, as I suspected the problem might lie in my forward pass. It turned out I had implemented the forward pass a bit differently than they had, but even after simulating the differences I still hadn't found what was wrong with my code. Their way certainly requires less transposing.
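As a minimal sketch of the two layer conventions I was comparing (the shapes and names here are made up for illustration, not my actual code): PyTorch's `nn.Linear` keeps inputs as row vectors and stores the weight as `(out_features, in_features)`, while the column-vector convention I had used transposes the batch in and out.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # (out_features, in_features), PyTorch-style layout
b = rng.normal(size=3)
x = rng.normal(size=(5, 4))   # batch of 5 row vectors

# PyTorch-style: keep inputs as rows, transpose the weight once
y_rows = x @ W.T + b                  # shape (5, 3)

# Column-vector convention: transpose the batch on the way in and out instead
y_cols = (W @ x.T + b[:, None]).T     # shape (5, 3)

assert np.allclose(y_rows, y_cols)
```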
I also toyed with testing the forward and backward passes against PyTorch, but didn't finish it.
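Roughly what that test would look like for a single dense layer (a sketch with made-up shapes and an MSE loss; my actual test code isn't finished): compute the gradients by hand in NumPy and compare them to what autograd produces for the same numbers.

```python
import numpy as np
import torch

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)
x = rng.normal(size=(5, 4))
target = rng.normal(size=(5, 3))

# NumPy forward pass and MSE gradients for one linear layer
y = x @ W.T + b
grad_y = 2 * (y - target) / y.size   # d(mean squared error)/dy
grad_W = grad_y.T @ x                # gradient w.r.t. the weights
grad_b = grad_y.sum(axis=0)          # gradient w.r.t. the bias

# The same computation through PyTorch autograd as a reference
W_t = torch.tensor(W, requires_grad=True)
b_t = torch.tensor(b, requires_grad=True)
loss = torch.mean((torch.tensor(x) @ W_t.T + b_t - torch.tensor(target)) ** 2)
loss.backward()

assert np.allclose(grad_W, W_t.grad.numpy())
assert np.allclose(grad_b, b_t.grad.numpy())
```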
What do you know, the neural network seems to be working now. It can overfit to a simple linear regression problem. There is still a weird bug where it sometimes starts heading in the wrong direction right from the beginning and ends up spitting out `nan`s and `inf`s, because the network's output and the desired output drift too far apart and the derivative of the loss function ends up as infinity. I think this is due to the weight initialization with `np.random.normal` (i.e. drawing from a normal distribution). I have been advised to use the normal distribution, but with plain `np.random.random` the awkward `nan` error doesn't appear. However, with the normal distribution, once the network does learn something, it does so in fewer epochs and overfits to a more accurate result.
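For comparison, a quick sketch of the initialization options I have in mind (the layer shape and the `1/sqrt(fan_in)` scaling are assumptions; I haven't verified that scaling actually fixes my particular `nan`s):

```python
import numpy as np

shape = (64, 64)   # hypothetical layer shape, just for illustration

# Normal-distribution init: zero-centred, standard deviation 1 by default,
# so individual weights can easily be larger than 1 in magnitude.
w_normal = np.random.normal(size=shape)

# np.random.random: uniform on [0, 1), so all weights are small and positive.
w_uniform = np.random.random(size=shape)

# A common compromise: keep the normal distribution but shrink its scale
# (e.g. by 1/sqrt(fan_in)) so early outputs don't blow up into inf/nan.
w_scaled = np.random.normal(size=shape) / np.sqrt(shape[1])

for name, w in [("normal", w_normal), ("uniform", w_uniform), ("scaled normal", w_scaled)]:
    print(f"{name:14s} std={w.std():.3f} max|w|={np.abs(w).max():.3f}")
```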
The problem was actually with both the forward pass and the backpropagation. Things were the right shape, but didn't do what they were supposed to. I transposed too much and tested too little. Managing to use PyTorch for testing got me on the right track. I corrected the backpropagation notes and wrote a bit about my bug there.
The next step is backpropagating biases and implementing cross entropy loss. If I have time I'll implement some of the matrix operations used.
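As a note to myself, the cross entropy loss I am planning to implement would look roughly like this (a sketch assuming softmax outputs and one-hot targets; the function names and interface are made up):

```python
import numpy as np

def softmax(logits):
    # Shift by the row-wise max for numerical stability before exponentiating
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

def cross_entropy_loss(logits, targets):
    """Mean cross entropy for one-hot targets, both shaped (batch, classes)."""
    probs = softmax(logits)
    # Clip to avoid log(0) -> -inf, the same kind of blow-up I hit with MSE
    probs = np.clip(probs, 1e-12, 1.0)
    return -np.mean(np.sum(targets * np.log(probs), axis=1))

# Tiny usage example with made-up numbers
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0]])
print(cross_entropy_loss(logits, targets))
```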