-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: High-level network specification #2050
Conversation
I think the idea of using a high level algebra to define networks is really really nice. However, I do not fully see how the PR in its current form would help with creating 'realistic' neural networks as none of the functions use things like soma-soma distances or placed synapse positions. Eg, if I generate 2000 random morphologically realistic cells with synapses randomly distributed over the axons and dendrites, I would like to connect up those within a 50um range with 10% probability. Not any random cell to any other random cell.. Again, I really like the approach, but I don't see yet who the target audience is. One of the use cases I see is generating large benchmarks, which is useful for Arbor developers looking at the scaling capacity. Or I guess generating small-world etc.. networks would also fit in the based method, making it useful for these network-science-on-neurons papers that look at the effect of topology on dynamics. Are you planning on adding cell-location dependent network selection methods or is that intrinsically impossible in arbor? |
The idea is to provide the building blocks for such cases, without having to extent arbor to use otherwise unnecessary information like a global cell position. So assuming the user has a way of generating or reading the cell position, a custom selection with the help of a uniform random network value can be created, for example:
This way, arbor takes care of necessary logic of generating repeatable random numbers for pairs of source and destination, as well as only sampling locally required connections. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
A few comments, I think the documentation can use a bit more love.
In terms of selection operators: maybe one or more should be built in, for instance one that takes into account distance (to soma, other synapse sites, synapse distance?). Not sure what the full list would be here, but e.g. being able to specify a maximum distance or make long distance connections less likely would be clearly useful.
Hi @AdhocMan, sorry for adding to an already long list of comments, but this topic has been near and dear to my heart for a long time. Before I start delving into the details of the code, though, some high-level comments:
NB. 3 doesn't mean your proposal is infeasible or needs more work in this direction. Just that we should be aware of it |
Thanks for the feedback. In general, this new way of specifying connections is not meant as a replacement of the current way. For simple connection setups, the current style is likely a better fit. So when comparing performance, a more fair comparison would include the work required to know which connections to return. In particular, generating random connections between all cells with a non-uniform probability is inherently a To address the case of spatial queries, I'm working on an extension of the implementation, that includes support for cell locations. It will not take cell structure into account, to avoid the complexities like cell orientation, local label resolution and locsets containing multiple points. Assuming that limiting connections to a maximum distance is a common use case, one can limit the number of cells that need to be sampled by using a spatial data structure (an octree for example). Regarding the performance of call backs to Python: Once spatial extension is implemented, I'll have another go at the documentation. |
Hi @AdhocMan, I'd like to propose a way that solves the spatial query requirement and buys us much flexibility in the longer So, here's the idea: We add a new callback on std::map<std::string, std::any> metadata_on(cell_gid_type gid); and returns freely configurable metadata. A default set can be implemented and merged with the user-supplied Now filters and operations can make use of these fields by selecting them as (metadata "location" cell) It's also extensible by the user who can the add their own callbacks and so on. Others that come to mind What do you think? |
Here's another design-level bit of feedback: I'd like to isolate the user from the actual Rationale for that is that this type of information makes recipes non-composable. Consider two populations -- possibly That these elaborate to actual sets of numbers at some point is a given, sure, but the user shouldn't handle those. |
@AdhocMan How's the feature going? If you're stuck, please let me know, so we can get you going again :) |
I don't have as much time as I was hoping for to work on this, but the spatial support is almost done. Mainly some more tests and adjustment of the documentation / examples left to do. @thorstenhater Sorry, for the late reply to your suggestions. Regarding the use of In principle, I like the idea of a meta data map, but I'm concerned about how efficient this would be in terms of performance and memory usage, since it would mean creating several strings and |
The advantage of the user-defined metadata approach is that it is infinitely extensible. Overheads are in the range
We use similar type-erased tags for probes, cell-information, and more. My strong advice is to think about at least Note that simple position is not enough, as one might have to handle individual locations on the cell (
I think this might be a noble goal that raises more issues than it solves. Currently the network is encoded in the |
Acknowledging the risk that I will the reviewer.... |
This PR has dropped off my radar unfortunately. |
@w-klijn could you give it a read? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A massive undertaking, can't wait to have it in. I went over mostly code and internals, so @w-klijn can give it a look, too.
static network_selection destination_cell(gid_range range); | ||
|
||
// Select connections that form a chain, such that source cell "i" is connected to the destination cell "i+1" | ||
static network_selection chain(std::vector<cell_gid_type> gids); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For one, shifted_by(-1)
subsumes chain_reverse
. In general, it adds more power for the user at essentially zero cost to us. No harm in keeping chain
and chain_reverse
, while adding cyclic_shift
and using it to implement the other two.
python/network.cpp
Outdated
return generate_network_connections(rec_shim, ctx->context, decomp.value()); | ||
}, | ||
"recipe"_a, | ||
pybind11::arg_v("context", pybind11::none(), "Execution context"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't misuse arg_v
for doc strings. The last argument is a string representing the default value if it is not expressible using pybind. See https://pybind11.readthedocs.io/en/stable/advanced/functions.html#default-arguments-revisited
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be fixed now
ext/pybind11
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to bump this? We usually stick to the rule of upgrading dependencies in
- separated PRs
- if you absolutely must, please bump all references to it: docs, pyproject, setup, ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how this was changed. It's reverted back now.
doc/concepts/interconnectivity.rst
Outdated
High Level Network Description | ||
------------------------------ | ||
|
||
As an alternative to providing a list of connections for each cell in the :ref:`recipe <modelrecipe>`, arbor supports high-level description of a cell network. It is based around a ``network_selection`` type, that represents a selection from the set of all possible connections between cells. A selection can be created based on different criteria, such as source or target label, cell indices and also distance between source and target. Selections can then be combined with other selections through set algebra like expressions. For distance calculations, the location of each connection point on the cell is resolved through the morphology combined with a cell isometry, which describes translation and rotation of the cell. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing about alternative
here, from the code I read that it is additive instead. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to "additional option"
arborio/networkio.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more general note about the DSL design here. Currently, we are duplicating operators from other places, e.g.:
(radius-lt ...)
(morph) and(distance-lt)
(network), which could be(< (radius ...))
and(< (distance ...))
. I know that there are reasons for specialisation here, but it seems awkward and possibly harder to maintain. If specialisation is needed we could contract these particular forms to their optimised variant.(exp ...)
and(log ...)
already appear iniexpr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding these operators would be too much for me right now unfortunately.
The parsing of the network DSL is completely separate from the iexp parsing, so there should be no conflict for duplicate expressions.
@@ -47,17 +48,18 @@ class ARB_ARBOR_API domain_decomposition { | |||
int gid_domain(cell_gid_type gid) const; | |||
int num_domains() const; | |||
int domain_id() const; | |||
cell_size_type index_on_domain(cell_gid_type gid) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly expose std::pair<int, cell_size_type> gid_domain(cell_gid_type)
. This might allow skipping some duplicated calls to the inner function. Second, how about adding a proper type for the returned value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be a breaking API change. Internally, we never use them together, so there would be no direct benefit.
But if you prefer, I can merge these functions.
arbor/util/spatial_tree.hpp
Outdated
if (leaf_d.empty()) return; | ||
|
||
min_.fill(std::numeric_limits<double>::max()); | ||
max_.fill(-std::numeric_limits<double>::max()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max_.fill(-std::numeric_limits<double>::max()); | |
max_.fill(std::numeric_limits<double>::min()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The min()
value is actually the smallest value close to 0. I made the same mistake initially ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I never knew. That would have tripped me up bad. Yet -max
might not be representable.
For floating-point types with denormalization, min() returns the minimum positive normalized value. Note that this behavior may be unexpected, especially when compared to the behavior of min() for integral types. To find the value that has no values less than it, use lowest().(since C++11)
lowest
seems to be the right pick here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, lowest
seems to be the best fit. I've changed it now.
arbor/util/spatial_tree.hpp
Outdated
|
||
spatial_tree &operator=(const spatial_tree &) = default; | ||
|
||
spatial_tree &operator=(spatial_tree &&t) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether std::swap
ping with t
might be better. At least it avoids issues with exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using std::swap
now for the container type now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After another look at it, I think using std::swap doesn't really at a benefit here, since we end up using move assignment anyway. It should be noexcept however.
I just came across this PR in my inbox, looks great! I tried to port the brunel network to it but ran in some small/minor issues, maybe this is useful:
Error reporting is a bit sparse in general (what is
for what I thought would be a fine expression:
So the condition gets reported as
which was a bit counterintuitve. Maybe an if-else for network selections would be nice |
Thanks @thorstenhater and @llandsmeer for the review.
There is no answer field for this to respond somehow. The @llandsmeer I've fixed the issues you mentioned and improved the error message. It should now include the correct type names. The addition of return types in the documentation would be inconsistent to how we have documented these expressions so far. Since they are grouped by return type though, I think it's not quite necessary. The example you showed is quite specific, since the distinction between the two selections is just a parameter. The DSL for selections is designed more generally. So one would usually use operations like
|
@@ -32,6 +32,19 @@ T constexpr area_circle(T r) { | |||
return pi<T> * square(r); | |||
} | |||
|
|||
template <typename T, typename U, typename = std::enable_if_t<std::is_unsigned_v<U>>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion here:
/// Exponentiation for integral exponents. Uses exponentiation by squaring
template <typename T, typename I, typename = std::enable_if_t<std::is_integral_v<I>>>
T constexpr pow(T base, I exp) {
// 0^0 and 0^-n are undefined
if (base == 0 && exp <= 0) return NAN;
// trivial cases
if (base == 0 || base == 1) return base;
// negative exp
if (exp < 0) {
exp = -exp;
base = 1/base;
}
// log2(exp)
T acc = 1;
for (exp > 0; exp >> 1) {
if (exp & 1) {
// NOTE: Could get cute here and eliminate the branch.
acc *= base;
exp -= 1;
}
base *= base;
}
return acc;
}
- it's iterative instead of recursive
- allows more exponents
- Simpler test for odd numbers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this version in general. But what if T is an integer type? Returning NAN would be a problem.
#include <arbor/network.hpp> | ||
|
||
#include <Random123/threefry.h> | ||
#include <Random123/boxmuller.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our use of R123 has grown and I have an open issue to use it in schedule
, too. (#2243)
Thus, I'll use that opportunity to consolidate and make some thin abstractions.
Based from the discussion I know how complex the challenge is you have solved here. When reading the example code it is extremely impressive that you managed to get this in such a terse and succinct description. The change introduces a complete new language with a complex grammar. The provided example with API does not do the work justice. I would strongly suggest to add a howto and more detail explanation how to use it. But, after merging as this functionality as it is is valuable and should not be lost. Non blocking remarks:
a. Would it be possible to add a smallest trivial example? Connect two neurons together with this language? b. I would suggest to explain what is expected in the return values. (return A.network_description(s, w, d, {})) Observation |
@w-klijn Thanks for your feedback! One important inclusion at the moment are intra-cell connections, so connections with the source and target on the same cell. To avoid generating such connections, one has to always include the |
Something that I couldn't find in the documentation/api which would be really useful, is obtaining the generated connectivity. How does that happen (if possible)? |
You can generate network connections and inspect them by using the function Any opinions on keeping the generation of intra-cell connections or excluding them to avoid having them accidentally generated? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @AdhocMan,
thanks for undertaking massive addition to Arbor. There's some smaller kinks
we'll work out over time, starting with a tutorial, but before we get into scope
creep territory let's merge this.
Thanks once again 🎉
🎉 |
This PR implements a high-level network specification as proposed in #418. It does not include support for gap junctions to allow the use of domain decomposition for some distributed network generation.
The general idea is a DSL based on set algebra, which operates on the set of all possible connections, by selecting based on different criteria, such as the distance between cells or lists of labels. By operating on all possible connections, a separate definition of cell populations becomes unnecessary. An example for selecting all inter-cell connections with a certain source and destination label is:
(intersect (inter-cell) (source-label \"detector\") (destination-label \"syn\"))
For parameters such as weight and delay, a value can be defined in the DSL in a similar way with the usual mathematical operations available. An example would be:
(max 0.1 (exp (mul -0.5 (distance))))
The position of each connection site is calculated by resolving the local position on the cell and applying an isometry, which is provided by a new optional function of the recipe. In contrast to the usage of policies to select a member within a locset, each site is treated individually and can be distinguished by its position.
Internally, some steps have been implemented in an attempt to reduce the overhead of generating connections:
Custom selection and value functions can still be provided by storing the wrapped function in a dictionary with an associated label, which can then be used in the DSL.
Some challenges remain. In particular, how to handle combined explicit connections returned by
connections_on
and the new way to describe a network. Also, the use of non-blocking MPI is not easily integrated into the current context types, and the dry-run context is not supported so far.Example
A (trimmed) example in Python, where a ring connection combined with random connections based on the distance:
TODO