Speeding up snips by communicating less often #26

tcstewar · 2018-08-24T03:37:27Z

This changes the snip I/O system so that we don't communicate every timestep. This adds a snip_io_steps parameter to Simulator to let you control how often (in units of steps) we should communicate. The reason for this is to drastically speed up the simulation! If this is set to 1 (which is the same as what it used to be), then we're at around 8ms per time step. If it's set to 10 (which is the new default I just created), then we get ~1.5ms per time step! Yay faster loihi!
One big change that was used to implement this is that we are no longer sending spikes from host to chip. Instead, we send decoded values, and those values are then turned into spikes on-chip. Right now, this is done in the snip itself on the x86 processor, just by implementing a bunch of little accumulators. However, in a future PR we could also reorganize this so that we actually use a Loihi core to do this, since simulating spiking neurons is what it's for. In that case, whenever we receive a value from the host, we'd set the bias of a Loihi Cx. This would mean that we wouldn't be doing anything in the snip itself on time steps where we're not communicating.

TODO

Check if there are too many inputs (userdata is only 1024 bytes)

The overall structure of the documentation is copied from other Nengo projects. Much of the content is adapted from the docs currently in the wiki, with slight changes to make it more generic to any Loihi setup and not necessarily ours. The overview and API docs are new. Some classes that made sense to include in the API docs had sparse docstrings, so those were added where appropriate. Examples were moved to the docs folder for easier inclusion and fewer folders in the root directory.

The input stream has shape `(n_steps, input_dimensions)`, so before we were always running for `input_dimensions` (390) timesteps.

And add a docstring for loihi_api.VthProfile.

Co-authored-by: Eric Hunsberger <[email protected]>

For both the emulator and hardware. Note that these are set to match for the learn_communication_channel notebook, and won't necessarily match in other places, but since we render this notebook in the docs it's important that it matches Nengo here.

hunse

Technically, this looks good. I've just made a few comments asking for comments/explanation at certain points in the code. I've also added a few comments myself in additional commits. (Be sure to pop out the "show outdated" comments. I went through commit-by-commit, which is why they've appeared like that, but I think they're all still applicable.)

hunse · 2018-09-05T13:23:34Z

nengo_loihi/loihi_interface.py

+                for inp, cx_ids in core.iterate_inputs():
+                    axon_ids = inp.axon_ids[0]
+                    assert len(axon_ids) % 2 == 0
+                    n_inputs += len(axon_ids) //2


Why divide by two?

Is it because we have both positive and negative channels? We should note that here.

Yes, and good idea -- I've added a comment

hunse · 2018-09-05T13:28:19Z

nengo_loihi/simulator.py

+                axon_ids = inp.axon_ids[0]
+                half = len(axon_ids)//2
+                for i in range(len(axon_ids)//2):
+                    assert axon_ids[i][0] == 0


This is the chip ID, right? Maybe put a "chip_id must be zero" message with the assertion.

Comment added :)

hunse · 2018-09-05T13:30:17Z

nengo_loihi/simulator.py

+                    assert axon_ids[i][0] == 0
+                    assert axon_ids[i][1] == axon_ids[half+i][1]
+                    spike_targets.extend((axon_ids[i][1], axon_ids[i][2],
+                                          axon_ids[half+i][2]))


A comment to explain what each of these values are would be helpful.

Very good idea. That's pretty obscure what it's doing... comment added.

hunse · 2018-09-05T13:30:56Z

nengo_loihi/snips/nengo_io.c.template

+            readChannel(inChannel, spike, 3);
+            s->userData[IDX_SPIKE_TARGETS+(i*3)+0] = spike[0];
+            s->userData[IDX_SPIKE_TARGETS+(i*3)+1] = spike[1];
+            s->userData[IDX_SPIKE_TARGETS+(i*3)+2] = spike[2];


Again, I'd have comments at the end of each line saying what that value is.

hunse · 2018-09-05T13:32:59Z

nengo_loihi/snips/nengo_io.c.template

+    for (int i=0; i < N_INPUTS; i++) {
+        readChannel(inChannel, spike, 1);
+        //printf("stim value %d.%d\n", i, spike[0]);
+        value[i] = spike[0];


Is this just a binary value indicating whether there's a spike in that channel at the current time?

Looking at the code below, I now believe it to be a discretized version of the real value that we're communicating (i.e. the decoded value). A note about this would be helpful.

hunse · 2018-09-05T13:40:30Z

nengo_loihi/splitter.py

-                        encoders=np.vstack([np.eye(dim), -np.eye(dim)]),
-                        max_rates=[max_rate] * dim + [max_rate] * dim,
-                        intercepts=[-1] * dim + [-1] * dim)
+                    # TODO: move the above check elsewhere


I know I wrote this check, but why can't the collective rate be > 1000?

Ooh, this looks weird now since you can't actually see the check (unless you go commit-by-commit). The check was that max_rate <= 1000, where max_rate = inter_rate * inter_n.

Yeah, I just kept that check from before, and it doesn't seem right to me. It also seems like something that should be looked into in #55 . I've added a TODO here and made a note about it in that PR

hunse · 2018-09-05T13:50:53Z

nengo_loihi/snips/nengo_io.c.template

-        readChannel(inChannel, spike, 1);
-        //printf("stim value %d.%d\n", i, spike[0]);
-        value[i] = spike[0];
+    if (s->time % IO_STEPS == 0) {


Would it be better for this to happen when s->time % IO_STEPS == 1? I believe that time starts at 1. That way, we'll read a value right after initialization, and won't waste the first few time steps with no input.

Good point -- I've moved the check. :)

hunse · 2018-09-05T13:56:39Z

nengo_loihi/snips/nengo_io.c.template

 {% endfor %}

-    writeChannel(outChannel, output, N_OUTPUTS);
+        writeChannel(outChannel, output, N_OUTPUTS);


Is N_OUTPUTS sometimes zero? If so, is there any overhead associated with this? Do we want a check so that we only call this if we have outputs?

I can't think of a use case for N_OUTPUTS==0, but it is supported, so I've added an if statement. :)

hunse · 2018-09-05T13:59:58Z

nengo_loihi/simulator.py

+                import nxsdk
+                target = 'loihi'
+            except ImportError:
+                target = 'sim'


Not related to what's being done here, but it might be nice to have some sort of output (print statements? logger statements?) that indicate which target has been chosen. I would probably do it even in the case that the user has explicitly specified target (so after this if block).

Possibly, although I'm not sure what the best sort of message would be. I'd definitely leave it for a separate PR, though.

Pretty sure we already have these logging statements, they're just not raised here.

This is to avoid surprising users who expect the default nengo behaviour of messages passing all the time

tcstewar · 2018-09-10T16:07:39Z

Thank you for the comments, @hunse ! I've added a bunch of comments based on your suggestions.

I also made one change: I changed the default for snip_io_skips to 1, so that it defaults to exactly the same behaviour that we are used to, and users don't have to understand anything. This means that, in order to get any speedup for their snips, they have to manually set this parameter. I think this is less surprising than defaulting to 10 and having people be surprised that their inputs change at 10ms intervals.

hunse · 2018-09-10T16:10:25Z

nengo_loihi/snips/nengo_io.c.template

+    // Note that we do this at the *start* of an IO_STEP period (since
+    //  s->time starts at 1), so that we don't have a period at the beginning
+    //  of the simulation where we are ignoring the input.
+    if (s->time % IO_STEPS == 1 % IO_STEPS) {


Is this the same as (IO_STEPS==1) || (s->time % IO_STEPS == 1)? Currently we do it one way here and another way down below. I'd keep it consistent. Also, as elegant as this way is, I think I prefer the explicit OR.

Ah, I didn't even notice I'd solved that puzzle before.... :) Yes, those are meant to be the same, and I agree that the explicit OR is clearer than the elegant mod-comparison thing. :) I've switched it to the explicit approach.

hunse · 2018-09-10T16:13:15Z

It all looks good, except for the one little inline comment I added.

I also think defaulting the number of steps to 1 is a good idea.

tbekolay · 2018-09-11T02:29:34Z

nengo_loihi/simulator.py

-                target = 'loihi'
-            except ImportError:
-                target = 'sim'
-
        if target == 'simreal':
            logger.info("Using real-valued simulator")


tbekolay

I've looked through briefly, and I have a few issues before I delve in in more detail.

I'm not sure about the overhead of it, but it feels like collecting timing information (tracking self.time_per_step) shouldn't always be enabled, but be something you can turn on and off.
spiking_interneurons_on_host, on the other hand, feels like a weird thing to be able to turn on and off. Are there some situations where this is better and some where it is worse? Or have we just not figured out which is more efficient yet?
This definitely needs a test. Since the default doesn't change current behavior, running the existing test suite isn't sufficient.

tcstewar · 2018-09-12T05:46:53Z

I've added a few tests -- let me know what you think @tbekolay !

Also, while writing these tests, I found issue #63 . This PR's performance is currently still affected by that bug. Bizarrely enough, I had accidentally included a workaround in this PR (in that I had specified the synapse twice, and since issue #63 was ignoring one of the synapses, that meant it ended up with the correct behaviour, but for the wrong reasons). But, for clarity, I have removed that second synapse specification from this PR, so that it has consistent behaviour.

tbekolay

Added some inline comments from a more detailed read through.

The new tests look good for testing the spiking_interneurons_on_host part of the splitter, but there are still no tests that use a different value for the snip_io_steps argument to the simulator, which is the main point of this PR. It is still not clear to me whether the behavior of the network should be unchanged (i.e., setting snip_io_steps > 1 should yield the same results in sim.data, which would be an easy thing to write tests for) or if it will lag behind (in which case, the test should make it clear how it lags behind -- i.e., if you have a node with output t do you get, essentially, t[::snip_io_steps]?)

tbekolay · 2018-09-24T17:49:06Z

nengo_loihi/snips/nengo_io.c.template

    if (inChannel == -1 || outChannel == -1) {
        printf("Got an invalid channel ID\n");
        return;
    }

-    if (s->time % 100 == 0) {
+    if (s->time == 1) {


Feels weird to be doing initialization stuff in the mgmt phase. Can this be done during the initProcess phase instead? This is what is done in the NxSDK snips.py example.

Last time I tried that (which was a few nxsdk releases ago), I ran into some troubles getting that working. I haven't tried it recently, though, but I'd be tempted to leave that for a separate PR.

tbekolay · 2018-09-24T19:06:18Z

nengo_loihi/snips/nengo_io.c.template

        printf("time %d\n", s->time);
    }

-    readChannel(inChannel, count, 1);
+    //readChannel(inChannel, count, 1);


Are these commented out lines still relevant or can we delete them?

Definitely deleteable. I'll do that.

tbekolay · 2018-09-24T19:16:51Z

nengo_loihi/snips/nengo_io.c.template

+        //printf("%d   value:%d  accum:%d\n");
+        pos_accum[i] += (value[i] + (1<<15));
+        if (pos_accum[i] >= (1<<16)) {
+            uint8_t core = s->userData[IDX_SPIKE_TARGETS+(i*3)];


Can you move all variable declarations to the beginning? I think it's a pretty old school C thing to do, but it's what I'm used to.

:) Sounds fair. :)

Yeah, I think that's back from the days before compilers were very good at optimizing things. Personally, I think readability is the most important factor. It can also help with debugging to have your variables only declared in the scope you want to use them in, so you don't accidentally use them out of scope (one of the reasons old-school C programs were not fun to debug).

Yeah I've not done enough C stuff in a professional setting to know what the standards are nowadays, but perhaps we could adopt PEP-7 since we're using PEP-8.

PEP-7 sounds good to me.... that's pretty close to what we've already been doing, too, other than cuddled else statements and the brace that starts a function....

tbekolay · 2018-09-24T19:20:50Z

sandbox/snips/learn-multi.py

@@ -0,0 +1,40 @@
+import nengo


Is this file supposed to be in here or included accidentally?

Oops, sorry -- that was an accident.

tcstewar · 2018-09-25T14:54:44Z

but there are still no tests that use a different value for the snip_io_steps argument to the simulator, which is the main point of this PR.

Ah, good point. I'll see what I can put together.

It is still not clear to me whether the behavior of the network should be unchanged (i.e., setting snip_io_steps > 1 should yield the same results in sim.data, which would be an easy thing to write tests for) or if it will lag behind (in which case, the test should make it clear how it lags behind -- i.e., if you have a node with output t do you get, essentially, t[::snip_io_steps]?)

It won't be the same data -- it'll be more like t[::snip_io_steps]. Which gives me a good idea for a test -- I'll try a Node with lambda t: 1 if timestep%snip_io_steps==0 else 0..... :)

tbekolay · 2018-09-25T15:11:13Z

lambda t: 1 if timestep%snip_io_steps==0 else 0

What is that supposed to output? I'm sure you know, but I can't quite guess by looking at that function. With something like lambda t: t I can at least look at the plot and see the stair-steps (I assume). I think the part that makes it clear is the statement of what the output should be; e.g., for lambda t: t, my understanding is that it should be np.repeat(t[::snip_io_steps], snip_io_steps), so I'm keen to know what it is if it isn't that.

tcstewar · 2018-09-25T15:34:08Z

I think the part that makes it clear is the statement of what the output should be; e.g., for lambda t: t, my understanding is that it should be np.repeat(t[::snip_io_steps], snip_io_steps), so I'm keen to know what it is if it isn't that.

Yes, that's right. So for the lambda I posted, that should give the same result as a nengo.Node(1) (i.e. all the 0 values are being skipped over). It's a pretty pathological case, but should make for a very clear test.

tcstewar · 2018-09-25T15:36:49Z

I can at least look at the plot and see the stair-steps (I assume).

Yes, that'd be stair-steps -- but I just didn't want to write code in the test to check for stair-steps. :)

tbekolay · 2018-09-25T16:00:55Z

So for the lambda I posted, that should give the same result as a nengo.Node(1) (i.e. all the 0 values are being skipped over).

Right yeah, I think that's what's a bit concerning to me, that there are other bugs that I can see resulting in a node output of all ones.

I just didn't want to write code in the test to check for stair-steps. :)

I think it's exactly np.repeat(t[::snip_io_steps], snip_io_steps), possibly t[1::snip_io_steps]

tcstewar · 2018-09-25T16:19:30Z

I think it's exactly np.repeat(t[::snip_io_steps], snip_io_steps)

But that value will be fed into an on-chip spike-generator, then into neurons, then decoded, and it'd be hard to algorithmically look at the output from that decode and see if it's doing a stair-case. Although I suppose we could have the test run twice (once with snip_io_steps=N, and once with snip_io_steps=1 but manually changing the input node to do the repeat thing).... hmmm... I'll try that too....

tcstewar · 2018-09-25T17:19:29Z

Hmm, a complication I'm having with the ramp test is that snip_io_steps also affect the output from the chip. So that means we're subsampling the output, and this is currently happening before the synapse on the output, which makes things pretty noisy. Moving that synapse on-chip is not straightforward, and I'd prefer that to be a separate PR. For posterity, here's the test I've been playing with for this:

@pytest.mark.skipif(pytest.config.getoption("--target") != "loihi",
                    reason="snips only exist on loihi")
@pytest.mark.parametrize("snip_io_steps", [10])
def test_snip_skipping_ramp(Simulator, seed, plt, snip_io_steps):
    dt = 0.001

    filt = nengo.synapses.Lowpass(0.02)
    # run a ramp with snip_io_steps
    with nengo.Network(seed=seed) as model1:
        a = nengo.Ensemble(200, 1)
        stim = nengo.Node(lambda t: t)
        nengo.Connection(stim, a, synapse=None)
        p1 = nengo.Probe(a, synapse=filt)

    with Simulator(model1, dt=dt, precompute=False,
                   snip_io_steps=snip_io_steps) as sim1:
        sim1.run(1.0)
    data1 = sim1.data[p1]

    # run the same inputs with snip_io_steps=1
    with nengo.Network(seed=seed) as model2:
        a = nengo.Ensemble(200, 1)
        def stim_func(t):
            step = int(t/dt)
            prev_step = ((step - 1) // snip_io_steps) * snip_io_steps
            if prev_step < 0:
                prev_step = 0
            return prev_step * dt

        stim = nengo.Node(stim_func)
        nengo.Connection(stim, a, synapse=None)
        p2 = nengo.Probe(a, synapse=None)

    with Simulator(model2, dt=dt, precompute=False,
                   snip_io_steps=1) as sim2:
        sim2.run(1.0)
    data2 = filt.filt(sim2.data[p2])

    # convert the output to what would happen if the output
    # were read every snip_io_steps time steps
    data3 = np.repeat(sim2.data[p2][::snip_io_steps], snip_io_steps, axis=0)
    data3 = filt.filt(data3)

    print(np.hstack([data1, data2, data3]))

tcstewar · 2018-09-25T17:28:16Z

I've added the 0/1 test, which also revealed a bug (on-chip probing was broken)

tbekolay · 2018-09-25T17:30:03Z

on-chip probing was broken

If this is because of the TODOs in the ProbeDict, then I'm currently working on that fix -- if not then carry on!

tcstewar · 2018-09-25T17:33:58Z

If this is because of the TODOs in the ProbeDict, then I'm currently working on that fix -- if not then carry on!

Nope, it was a silly bug on my part forgetting that we need to repeat the data we get from the chip... It's now fixed: 688f8c0

tbekolay · 2018-09-25T19:48:42Z

Is this ready for re-review? Please make sure to re-assign the PR to me when you're ready for a re-review @tcstewar.

tbekolay and others added 10 commits August 19, 2018 16:30

Fix n_steps in shape demo

cfe42bc

The input stream has shape `(n_steps, input_dimensions)`, so before we were always running for `input_dimensions` (390) timesteps.

Remove second confusing VTH_MAX

d14606b

And add a docstring for loihi_api.VthProfile.

Add MNIST pseudo-conv net example

ea5482b

Co-authored-by: Eric Hunsberger <[email protected]>

Match learning rates to Nengo

da7be19

For both the emulator and hardware. Note that these are set to match for the learn_communication_channel notebook, and won't necessarily match in other places, but since we render this notebook in the docs it's important that it matches Nengo here.

Moved spike generator into snips

8c23740

Added snip_io_steps to control communication rate

e77a8dc

Only move spike generators to be on-chip for loihi

03a6d42

PEP8 fixes

a9eff59

Flake8 fixes

ab549fe

This was referenced Aug 24, 2018

Better stopping behaviour for snips #27

Closed

Nengo GUI integration #28

Closed

tbekolay added the ready for review label Sep 4, 2018

hunse self-requested a review September 4, 2018 20:35

hunse self-assigned this Sep 4, 2018

fixup! Added snip_io_steps to control communication rate

d4e0a15

hunse reviewed Sep 5, 2018

View reviewed changes

hunse assigned tcstewar and unassigned hunse Sep 5, 2018

Terry Stewart added 2 commits September 10, 2018 11:57

Added code comments for clarity

d89dfa3

Changed default snip_io_steps to 1 (from 10)

b598de6

This is to avoid surprising users who expect the default nengo behaviour of messages passing all the time

tcstewar removed their assignment Sep 10, 2018

hunse reviewed Sep 10, 2018

View reviewed changes

fixup! Added code comments for clarity

a2cac8a

hunse approved these changes Sep 10, 2018

View reviewed changes

hunse assigned tbekolay Sep 10, 2018

tbekolay reviewed Sep 11, 2018

View reviewed changes

tbekolay requested changes Sep 11, 2018

View reviewed changes

tbekolay assigned tcstewar Sep 11, 2018

Terry Stewart and others added 4 commits September 11, 2018 22:32

fixup! Moved spike generator into snips

68303cc

Added test for interneurons on or off host

5380800

Removed repetition of input synapse

a71e353

fixup! Removed repetition of input synapse

e64de2e

tcstewar mentioned this pull request Sep 12, 2018

Input synapses are currently ignored #63

Closed

tcstewar assigned tbekolay and unassigned tcstewar Sep 12, 2018

tbekolay requested changes Sep 24, 2018

View reviewed changes

tbekolay assigned tcstewar and unassigned tbekolay Sep 25, 2018

fixup! Removed repetition of input synapse

688f8c0

tcstewar mentioned this pull request Sep 26, 2018

Multidimensional learning #89

Merged

1 task

tbekolay force-pushed the master branch from 701f997 to 1fe237c Compare November 11, 2018 14:31

tbekolay removed the ready for review label May 22, 2019

arvoelke mentioned this pull request Oct 17, 2019

Buffer spikes when reading in snip #254

Open

hunse mentioned this pull request Nov 11, 2019

Faster communication with host/chip #256

Draft

3 tasks

tbekolay marked this pull request as draft December 13, 2021 21:22

Speeding up snips by communicating less often #26

Are you sure you want to change the base?

Speeding up snips by communicating less often #26

Conversation

tcstewar commented Aug 24, 2018 • edited Loading

hunse left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tcstewar commented Sep 10, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hunse commented Sep 10, 2018

Choose a reason for hiding this comment

tbekolay left a comment

Choose a reason for hiding this comment

tcstewar commented Sep 12, 2018

tbekolay left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tcstewar commented Sep 25, 2018

tbekolay commented Sep 25, 2018

tcstewar commented Sep 25, 2018

tcstewar commented Sep 25, 2018

tbekolay commented Sep 25, 2018 • edited Loading

tcstewar commented Sep 25, 2018 • edited Loading

tcstewar commented Sep 25, 2018

tcstewar commented Sep 25, 2018

tbekolay commented Sep 25, 2018

tcstewar commented Sep 25, 2018

tbekolay commented Sep 25, 2018

tcstewar commented Aug 24, 2018 •

edited

Loading

hunse left a comment •

edited

Loading

tbekolay left a comment •

edited

Loading

tbekolay commented Sep 25, 2018 •

edited

Loading

tcstewar commented Sep 25, 2018 •

edited

Loading