Multidimensional learning #89
Conversation
@tcstewar You removed your assignment, but the PR is still marked as WIP. Is it ready for review? Or collaboration?
Another thing TODO:
Force-pushed from 52b9c46 to 0811e2a.
I'll rebase it to master now that the manual decoders are merged in, then take a pass through the #22 comments, and then it should be ready for review. :)
Force-pushed from bcb2a3d to c1b46f0.
The last commit adds a test with multiple learning connections at once. It seems to mostly work, but it doesn't converge to the target values and I'm not sure why. If someone else could take a look at it before the workshop, that'd be great!
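For concreteness, here is a minimal sketch of the kind of network such a test exercises: two PES-learned connections running at once, each with its own error population. This uses the standard Nengo API; the ensemble sizes, learning rates, and input signal are illustrative, not taken from the actual test in this PR.

```python
import numpy as np
import nengo

with nengo.Network(seed=0) as net:
    stim = nengo.Node(lambda t: np.sin(2 * np.pi * t))
    pre = nengo.Ensemble(100, dimensions=1)
    post_a = nengo.Ensemble(100, dimensions=1)
    post_b = nengo.Ensemble(100, dimensions=1)
    nengo.Connection(stim, pre)

    # Two learned connections running at once, both starting from a function of zero
    conn_a = nengo.Connection(pre, post_a, function=lambda x: 0,
                              learning_rule_type=nengo.PES(learning_rate=1e-4))
    conn_b = nengo.Connection(pre, post_b, function=lambda x: 0,
                              learning_rule_type=nengo.PES(learning_rate=1e-4))

    # Each connection gets its own error signal (error = actual - target)
    for post, conn in [(post_a, conn_a), (post_b, conn_b)]:
        error = nengo.Ensemble(100, dimensions=1)
        nengo.Connection(post, error)
        nengo.Connection(stim, error, transform=-1)
        nengo.Connection(error, conn.learning_rule)

    probe_a = nengo.Probe(post_a, synapse=0.02)
    probe_b = nengo.Probe(post_b, synapse=0.02)
```

Running the same network on the emulator or hardware backend would just swap in that backend's Simulator in place of reference Nengo's.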
So, adding some plotting to that test, I get the following in the emulator (test_learning.test_multiple_pes_emu.pdf) and this on hardware (test_learning.test_multiple_pes.pdf).

One difference is that they change at different rates. That is likely down to a mismatch in learning rates that we probably need to do a better job of matching. It may also be partly due to the fact that error values are clipped to the [-1, 1] range.

The more salient difference is that after a steady, predictable climb toward the target value, the line starts going haywire. This becomes more apparent when you run the network for longer, or with a higher learning rate. The fact that the lines all start moving in a predictable manner tells me that it is not likely a problem with the PES implementation (i.e., the delta is being calculated correctly and is being applied to the right synapses, etc.). I think the problem is most likely some value over- or underflowing its discretized range; this has happened before (and is why we introduced the checks discussed in #88). I do not get any warnings raised in the emulator, so the over/underflowed quantity is not U or V; most likely it's the weights themselves. In the emulator, we currently use [...].

In the long term, we should definitely think about ways in which we can mitigate this problem. I'll make an issue to start thinking about that. In the short term, I'm thinking that if the initial function that the learned connection represents results in larger decoders, then we should do a much better job of discretizing. I'll try doing that now.
Modifying the function (actually in this case just removing the [...]) gives the following.

Emu: [plot attached]

Hardware: [plot attached]

Also note that they both converge significantly faster. I suspect this might happen on reference Nengo too, because the weights might have less distance to travel since they don't start near zero. But it may also be the case that the range discretization results in the same weight updates having a larger effect than they would have in reference Nengo, because weights are being pushed farther as a result of the same delta.

In any case, I'll add some asserts and push the now-working test. I'll also make a separate issue to figure out the best way to inform users about this issue.
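To make the weight-scaling argument concrete, here is a rough numeric illustration. This is not the actual nengo-loihi discretization code; the bit width and weight values are made up. The point is that if the discretization range is derived from near-zero initial weights, the values that learning later pushes the weights toward saturate the representable range.

```python
import numpy as np

def discretize(weights, w_max, bits=8):
    """Map weights onto signed integers, with [-w_max, w_max] spanning the range."""
    scale = (2 ** (bits - 1) - 1) / w_max
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(weights * scale), lo, hi).astype(int)

initial = np.array([0.001, -0.002])  # near-zero decoders from an initial function of 0
learned = np.array([0.08, -0.05])    # roughly where learning needs the weights to end up

# Range taken from the initial weights: the learned values saturate (over/underflow).
print(discretize(learned, w_max=np.abs(initial).max()))  # [ 127 -128]

# Range taken from realistically sized initial weights: the learned values fit.
print(discretize(learned, w_max=np.abs(learned).max()))  # [127 -79]
```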
Force-pushed from 3f7bbd4 to 6eab5e3.
Awesome! I completely didn't think of that weight scaling issue... it makes a lot of sense in hindsight, but I was not thinking in that direction at all. Very good thing to know for the learning tutorials too. :) The commit looks good to me. :) Thank you!
It would be nice if we could replicate that chip learning behaviour in the emulator, before we got rid of [...].
Yeah, I'll make an issue.
Pushed a commit with mostly style fixes. With that, LGTM!
One thing that I tried was to switch the initial function in test_pes_comm_channel from returning 0 to returning -x, but when I did that, all the parametrizations failed with U overflow errors. So, I think the initial function is a super critical thing to play around with in learning networks.
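One quick way to see why that switch matters (illustrative only, not code from this PR or the test) is to compare the magnitude of the initial decoders that the two functions produce, since that magnitude is what sets the range the weights get discretized into.

```python
import numpy as np
import nengo

with nengo.Network(seed=0) as net:
    pre = nengo.Ensemble(100, dimensions=1)
    post = nengo.Ensemble(100, dimensions=1)
    conn_zero = nengo.Connection(pre, post, function=lambda x: 0)
    conn_negx = nengo.Connection(pre, post, function=lambda x: -x)

with nengo.Simulator(net) as sim:
    w_zero = sim.data[conn_zero].weights  # decoders solved for the zero function
    w_negx = sim.data[conn_negx].weights  # decoders solved for -x

# The -x decoders are orders of magnitude larger than the near-zero decoders
# for the zero function, so the learned weights have to travel much farther.
print(np.abs(w_zero).max(), np.abs(w_negx).max())
```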
I'll make a few issues out of the things discovered in this PR, then squash all the commits down to one and merge. Will probably take a little while, so if anyone has objections feel free to raise them in the next hour or so!
The actual communication protocol is not as efficient as it could be, but this works properly. The choice of the function computed across the learned connection prior to learning turns out to have a huge effect on the behavior of the network. If the weights are initially much smaller than they will be post-learning, the weights can easily over/underflow. Improving the weight discretization, however, is left for future work. This commit also removes some writes that used to be needed to close the snip successfully, but they appear to be unnecessary now.
Force-pushed from 3c5b7e5 to 19746b1.
This PR fixes multidimensional learning. It is based off of move-manual-decoders-onchip, but only because that branch had a nice test that fit this PR. All this does is fix the snip code and the communication code so that all the error data gets sent across and applied. It doesn't change the emulator at all, since it was already working in the emulator.
The communication protocol is not wonderful. The message format is [core_id, n_vals, val_0, val_1, ..., val_n-1], so the first two values are the same every time a message is sent (this isn't much worse than the old format, which was [core_id, val_0]). Optimizing that will be a different PR, and will interact in interesting ways with #26. But this is a definite improvement in the meantime.

This is heavily based on, but replaces, #22 (the rebasing got too complicated for me, so I started this one fresh).
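For illustration, here is a small sketch of packing and unpacking a message in that format. It is plain Python using struct; the 32-bit little-endian encoding and the function names are assumptions for the sake of the example, not the actual host/snip code in this PR.

```python
import struct

def pack_error_message(core_id, error_vals):
    """Pack [core_id, n_vals, val_0, ..., val_n-1] as 32-bit integers."""
    vals = list(error_vals)
    return struct.pack("<%di" % (2 + len(vals)), core_id, len(vals), *vals)

def unpack_error_message(data):
    """Recover (core_id, [val_0, ..., val_n-1]) from a packed message."""
    ints = struct.unpack("<%di" % (len(data) // 4), data)
    core_id, n_vals = ints[0], ints[1]
    return core_id, list(ints[2:2 + n_vals])

# Example: a 3-D error signal destined for core 7
msg = pack_error_message(7, [12, -5, 30])
assert unpack_error_message(msg) == (7, [12, -5, 30])
```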
TODO