-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speeding up snips by communicating less often #26
base: main
Are you sure you want to change the base?
Conversation
The overall structure of the documentation is copied from other Nengo projects. Much of the content is adapted from the docs currently in the wiki, with slight changes to make it more generic to any Loihi setup and not necessarily ours. The overview and API docs are new. Some classes that made sense to include in the API docs had sparse docstrings, so those were added where appropriate. Examples were moved to the docs folder for easier inclusion and fewer folders in the root directory.
The input stream has shape `(n_steps, input_dimensions)`, so before we were always running for `input_dimensions` (390) timesteps.
And add a docstring for loihi_api.VthProfile.
Co-authored-by: Eric Hunsberger <[email protected]>
For both the emulator and hardware. Note that these are set to match for the learn_communication_channel notebook, and won't necessarily match in other places, but since we render this notebook in the docs it's important that it matches Nengo here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, this looks good. I've just made a few comments asking for comments/explanation at certain points in the code. I've also added a few comments myself in additional commits. (Be sure to pop out the "show outdated" comments. I went through commit-by-commit, which is why they've appeared like that, but I think they're all still applicable.)
nengo_loihi/loihi_interface.py
Outdated
for inp, cx_ids in core.iterate_inputs(): | ||
axon_ids = inp.axon_ids[0] | ||
assert len(axon_ids) % 2 == 0 | ||
n_inputs += len(axon_ids) //2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why divide by two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it because we have both positive and negative channels? We should note that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and good idea -- I've added a comment
axon_ids = inp.axon_ids[0] | ||
half = len(axon_ids)//2 | ||
for i in range(len(axon_ids)//2): | ||
assert axon_ids[i][0] == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the chip ID, right? Maybe put a "chip_id must be zero" message with the assertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment added :)
nengo_loihi/simulator.py
Outdated
assert axon_ids[i][0] == 0 | ||
assert axon_ids[i][1] == axon_ids[half+i][1] | ||
spike_targets.extend((axon_ids[i][1], axon_ids[i][2], | ||
axon_ids[half+i][2])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment to explain what each of these values are would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good idea. That's pretty obscure what it's doing... comment added.
readChannel(inChannel, spike, 3); | ||
s->userData[IDX_SPIKE_TARGETS+(i*3)+0] = spike[0]; | ||
s->userData[IDX_SPIKE_TARGETS+(i*3)+1] = spike[1]; | ||
s->userData[IDX_SPIKE_TARGETS+(i*3)+2] = spike[2]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I'd have comments at the end of each line saying what that value is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
for (int i=0; i < N_INPUTS; i++) { | ||
readChannel(inChannel, spike, 1); | ||
//printf("stim value %d.%d\n", i, spike[0]); | ||
value[i] = spike[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just a binary value indicating whether there's a spike in that channel at the current time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code below, I now believe it to be a discretized version of the real value that we're communicating (i.e. the decoded value). A note about this would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
nengo_loihi/splitter.py
Outdated
encoders=np.vstack([np.eye(dim), -np.eye(dim)]), | ||
max_rates=[max_rate] * dim + [max_rate] * dim, | ||
intercepts=[-1] * dim + [-1] * dim) | ||
# TODO: move the above check elsewhere |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I wrote this check, but why can't the collective rate be > 1000?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, this looks weird now since you can't actually see the check (unless you go commit-by-commit). The check was that max_rate <= 1000
, where max_rate = inter_rate * inter_n
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I just kept that check from before, and it doesn't seem right to me. It also seems like something that should be looked into in #55 . I've added a TODO here and made a note about it in that PR
readChannel(inChannel, spike, 1); | ||
//printf("stim value %d.%d\n", i, spike[0]); | ||
value[i] = spike[0]; | ||
if (s->time % IO_STEPS == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better for this to happen when s->time % IO_STEPS == 1
? I believe that time starts at 1. That way, we'll read a value right after initialization, and won't waste the first few time steps with no input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point -- I've moved the check. :)
{% endfor %} | ||
|
||
writeChannel(outChannel, output, N_OUTPUTS); | ||
writeChannel(outChannel, output, N_OUTPUTS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is N_OUTPUTS
sometimes zero? If so, is there any overhead associated with this? Do we want a check so that we only call this if we have outputs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't think of a use case for N_OUTPUTS==0, but it is supported, so I've added an if
statement. :)
import nxsdk | ||
target = 'loihi' | ||
except ImportError: | ||
target = 'sim' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to what's being done here, but it might be nice to have some sort of output (print statements? logger statements?) that indicate which target has been chosen. I would probably do it even in the case that the user has explicitly specified target
(so after this if
block).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly, although I'm not sure what the best sort of message would be. I'd definitely leave it for a separate PR, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure we already have these logging statements, they're just not raised here.
This is to avoid surprising users who expect the default nengo behaviour of messages passing all the time
Thank you for the comments, @hunse ! I've added a bunch of comments based on your suggestions. I also made one change: I changed the default for |
// Note that we do this at the *start* of an IO_STEP period (since | ||
// s->time starts at 1), so that we don't have a period at the beginning | ||
// of the simulation where we are ignoring the input. | ||
if (s->time % IO_STEPS == 1 % IO_STEPS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the same as (IO_STEPS==1) || (s->time % IO_STEPS == 1)
? Currently we do it one way here and another way down below. I'd keep it consistent. Also, as elegant as this way is, I think I prefer the explicit OR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I didn't even notice I'd solved that puzzle before.... :) Yes, those are meant to be the same, and I agree that the explicit OR is clearer than the elegant mod-comparison thing. :) I've switched it to the explicit approach.
It all looks good, except for the one little inline comment I added. I also think defaulting the number of steps to 1 is a good idea. |
target = 'loihi' | ||
except ImportError: | ||
target = 'sim' | ||
|
||
if target == 'simreal': | ||
logger.info("Using real-valued simulator") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g. here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've looked through briefly, and I have a few issues before I delve in in more detail.
- I'm not sure about the overhead of it, but it feels like collecting timing information (tracking
self.time_per_step
) shouldn't always be enabled, but be something you can turn on and off. spiking_interneurons_on_host
, on the other hand, feels like a weird thing to be able to turn on and off. Are there some situations where this is better and some where it is worse? Or have we just not figured out which is more efficient yet?- This definitely needs a test. Since the default doesn't change current behavior, running the existing test suite isn't sufficient.
I've added a few tests -- let me know what you think @tbekolay ! Also, while writing these tests, I found issue #63 . This PR's performance is currently still affected by that bug. Bizarrely enough, I had accidentally included a workaround in this PR (in that I had specified the synapse twice, and since issue #63 was ignoring one of the synapses, that meant it ended up with the correct behaviour, but for the wrong reasons). But, for clarity, I have removed that second synapse specification from this PR, so that it has consistent behaviour. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some inline comments from a more detailed read through.
The new tests look good for testing the spiking_interneurons_on_host
part of the splitter, but there are still no tests that use a different value for the snip_io_steps
argument to the simulator, which is the main point of this PR. It is still not clear to me whether the behavior of the network should be unchanged (i.e., setting snip_io_steps
> 1 should yield the same results in sim.data
, which would be an easy thing to write tests for) or if it will lag behind (in which case, the test should make it clear how it lags behind -- i.e., if you have a node with output t
do you get, essentially, t[::snip_io_steps]
?)
if (inChannel == -1 || outChannel == -1) { | ||
printf("Got an invalid channel ID\n"); | ||
return; | ||
} | ||
|
||
if (s->time % 100 == 0) { | ||
if (s->time == 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels weird to be doing initialization stuff in the mgmt
phase. Can this be done during the initProcess
phase instead? This is what is done in the NxSDK snips.py
example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last time I tried that (which was a few nxsdk releases ago), I ran into some troubles getting that working. I haven't tried it recently, though, but I'd be tempted to leave that for a separate PR.
printf("time %d\n", s->time); | ||
} | ||
|
||
readChannel(inChannel, count, 1); | ||
//readChannel(inChannel, count, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these commented out lines still relevant or can we delete them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely deleteable. I'll do that.
//printf("%d value:%d accum:%d\n"); | ||
pos_accum[i] += (value[i] + (1<<15)); | ||
if (pos_accum[i] >= (1<<16)) { | ||
uint8_t core = s->userData[IDX_SPIKE_TARGETS+(i*3)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move all variable declarations to the beginning? I think it's a pretty old school C thing to do, but it's what I'm used to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:) Sounds fair. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that's back from the days before compilers were very good at optimizing things. Personally, I think readability is the most important factor. It can also help with debugging to have your variables only declared in the scope you want to use them in, so you don't accidentally use them out of scope (one of the reasons old-school C programs were not fun to debug).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I've not done enough C stuff in a professional setting to know what the standards are nowadays, but perhaps we could adopt PEP-7 since we're using PEP-8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP-7 sounds good to me.... that's pretty close to what we've already been doing, too, other than cuddled else statements and the brace that starts a function....
@@ -0,0 +1,40 @@ | |||
import nengo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this file supposed to be in here or included accidentally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, sorry -- that was an accident.
Ah, good point. I'll see what I can put together.
It won't be the same data -- it'll be more like |
What is that supposed to output? I'm sure you know, but I can't quite guess by looking at that function. With something like |
Yes, that's right. So for the lambda I posted, that should give the same result as a |
Yes, that'd be stair-steps -- but I just didn't want to write code in the test to check for stair-steps. :) |
Right yeah, I think that's what's a bit concerning to me, that there are other bugs that I can see resulting in a node output of all ones.
I think it's exactly |
But that value will be fed into an on-chip spike-generator, then into neurons, then decoded, and it'd be hard to algorithmically look at the output from that decode and see if it's doing a stair-case. Although I suppose we could have the test run twice (once with snip_io_steps=N, and once with snip_io_steps=1 but manually changing the input node to do the repeat thing).... hmmm... I'll try that too.... |
Hmm, a complication I'm having with the ramp test is that @pytest.mark.skipif(pytest.config.getoption("--target") != "loihi",
reason="snips only exist on loihi")
@pytest.mark.parametrize("snip_io_steps", [10])
def test_snip_skipping_ramp(Simulator, seed, plt, snip_io_steps):
dt = 0.001
filt = nengo.synapses.Lowpass(0.02)
# run a ramp with snip_io_steps
with nengo.Network(seed=seed) as model1:
a = nengo.Ensemble(200, 1)
stim = nengo.Node(lambda t: t)
nengo.Connection(stim, a, synapse=None)
p1 = nengo.Probe(a, synapse=filt)
with Simulator(model1, dt=dt, precompute=False,
snip_io_steps=snip_io_steps) as sim1:
sim1.run(1.0)
data1 = sim1.data[p1]
# run the same inputs with snip_io_steps=1
with nengo.Network(seed=seed) as model2:
a = nengo.Ensemble(200, 1)
def stim_func(t):
step = int(t/dt)
prev_step = ((step - 1) // snip_io_steps) * snip_io_steps
if prev_step < 0:
prev_step = 0
return prev_step * dt
stim = nengo.Node(stim_func)
nengo.Connection(stim, a, synapse=None)
p2 = nengo.Probe(a, synapse=None)
with Simulator(model2, dt=dt, precompute=False,
snip_io_steps=1) as sim2:
sim2.run(1.0)
data2 = filt.filt(sim2.data[p2])
# convert the output to what would happen if the output
# were read every snip_io_steps time steps
data3 = np.repeat(sim2.data[p2][::snip_io_steps], snip_io_steps, axis=0)
data3 = filt.filt(data3)
print(np.hstack([data1, data2, data3])) |
I've added the 0/1 test, which also revealed a bug (on-chip probing was broken) |
If this is because of the TODOs in the |
Nope, it was a silly bug on my part forgetting that we need to repeat the data we get from the chip... It's now fixed: 688f8c0 |
Is this ready for re-review? Please make sure to re-assign the PR to me when you're ready for a re-review @tcstewar. |
This changes the snip I/O system so that we don't communicate every timestep. This adds a
snip_io_steps parameter
to Simulator to let you control how often (in units of steps) we should communicate. The reason for this is to drastically speed up the simulation! If this is set to 1 (which is the same as what it used to be), then we're at around 8ms per time step. If it's set to 10 (which is the new default I just created), then we get ~1.5ms per time step! Yay faster loihi!One big change that was used to implement this is that we are no longer sending spikes from host to chip. Instead, we send decoded values, and those values are then turned into spikes on-chip. Right now, this is done in the snip itself on the x86 processor, just by implementing a bunch of little accumulators. However, in a future PR we could also reorganize this so that we actually use a Loihi core to do this, since simulating spiking neurons is what it's for. In that case, whenever we receive a value from the host, we'd set the bias of a Loihi Cx. This would mean that we wouldn't be doing anything in the snip itself on time steps where we're not communicating.
TODO