Switch to a standard Random class #51
Btw, shouldn't the weights be initialized with random numbers close to 0, in the range [-1, 1]? When I run it, I could fix it by initializing the fully connected layer with the following snippet. But how is it supposed to be initialized?
They are actually (edit: within the [-1, 1] range):

```haskell
randomFullyConnected = do
  s1 <- getRandom
  s2 <- getRandom
  let wB = randomVector s1 Uniform * 2 - 1
      wN = uniformSample s2 (-1) 1
      bm = konst 0
      mm = konst 0
  return $ FullyConnected (FullyConnected' wB wN) (FullyConnected' bm mm)
```
But yes, they could be scaled down a bit to not be uniform, and there could also be some more normalisation done. This is one of the reasons I am interested in being a bit smarter about the initialisation.
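For context, a minimal sketch (not code from this thread) of one common way to scale the initial weights down, Xavier/Glorot uniform, assuming the same hmatrix static API used in the snippet above; the function name is hypothetical:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Monad.Random (MonadRandom, getRandom)
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, natVal)
import Numeric.LinearAlgebra.Static (L, uniformSample)

-- Xavier/Glorot uniform: draw an o x i weight matrix from [-limit, limit],
-- where limit = sqrt (6 / (fanIn + fanOut)), so the spread of the initial
-- weights shrinks as the layer gets wider instead of staying fixed at [-1, 1].
xavierWeights :: forall m i o. (MonadRandom m, KnownNat i, KnownNat o) => m (L o i)
xavierWeights = do
  s <- getRandom
  let fanIn  = fromIntegral (natVal (Proxy :: Proxy i))
      fanOut = fromIntegral (natVal (Proxy :: Proxy o))
      limit  = sqrt (6 / (fanIn + fanOut))
  return $ uniformSample s (-limit) limit
```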
True, you're right. I investigated the observed problem a little more. It happens to me when I use multiple
So, I'm quite busy for another 2-3 weeks. After that I could do the ticket if that's ok for everyone. But I will have to look into the literature on how to properly initialize the weights first.
I think that should be fine.
Hi @HuwCampbell

You can find the current version here: https://github.com/schnecki/grenade

Before going into initialization: I implemented a

Coming back to weight initialization, see Grenade.Core.WeightInitialization and Grenade.Core.Layer. All layers now ask the Grenade.Core.WeightInitialization module for a random vector/matrix when initializing. The actual generation of random data is therefore encapsulated in that module, so adding a new weight initialization method only requires changes in that module.

Btw, a simple experiment showed that weight initialization makes a huge difference; see the feedforwardweightinit.hs example and test different settings.

My goal so far was to keep backward compatibility, which worked out quite nicely. I moved the

The class

P.S.: Btw, what training algorithm is implemented, Adam?
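To make the design concrete, here is a rough sketch of the pattern described above, where every layer asks one central module for its random matrices; the names (WeightInit, getRandomMatrix, the constructors) are assumptions for illustration, not the actual Grenade.Core.WeightInitialization API:

```haskell
import Control.Monad.Random (MonadRandom, getRandom)
import GHC.TypeLits (KnownNat)
import Numeric.LinearAlgebra.Static (L, uniformSample)

-- The initialisation scheme lives in one place; layers stay agnostic of it.
data WeightInit
  = UniformInit           -- uniform in [-1, 1], as in the snippet above
  | ScaledUniform Double  -- uniform in [-r, r] for a caller-chosen r

-- Layers call this instead of sampling on their own, so adding a new
-- scheme only means adding a constructor and a case here.
getRandomMatrix :: (MonadRandom m, KnownNat i, KnownNat o)
                => WeightInit -> m (L o i)
getRandomMatrix initScheme = do
  s <- getRandom
  return $ case initScheme of
    UniformInit     -> uniformSample s (-1) 1
    ScaledUniform r -> uniformSample s (-r) r
```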
At the moment the random initialisation is baked into the UpdateLayer class. We could replace this with either Variate from System.Random.MWC, or Random.
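For illustration, a hedged sketch of what pulling the randomness out of the layer class and handing it an explicit System.Random.MWC generator could look like; the class, method, and example layer here are illustrative, not grenade's actual UpdateLayer API:

```haskell
import Control.Monad.Primitive (PrimMonad, PrimState)
import System.Random.MWC (Gen, createSystemRandom, uniformR)

-- Hypothetical replacement for a baked-in createRandom: the caller supplies
-- the generator, so the layer no longer chooses its own source of randomness.
class RandomLayer x where
  createRandomWith :: PrimMonad m => Gen (PrimState m) -> m x

-- Toy layer holding a single weight drawn uniformly from [-1, 1].
newtype Bias = Bias Double

instance RandomLayer Bias where
  createRandomWith gen = Bias <$> uniformR (-1, 1) gen

-- Usage: create a system-seeded generator once and thread it through.
main :: IO ()
main = do
  gen    <- createSystemRandom
  Bias w <- createRandomWith gen
  print w
```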