Streamline sampling. #1929

brenthuisman · 2022-08-01T11:41:25Z

Make the shape of record outputs (as returned by simulation.samples()) consistent between probesets and vector (or _cell kinded) probes. See probe id -> probeset id, and clarification in docs #1898 (comment)
Have option to set probe recording policy upon declaration of the probe.
a. The moment sim.sample() is called could be seen as an implicit setting of recording policy, because why else would one set a schedule and store a handle?
Have way to label the probe, such that a user can discover the location of the recorder (instead or in addition to (gid,lid)) or retrievable by a custom label.
a. And/Or have way to translate cell_member and probe_id/probe_address and handle (could be the label in step 2. not the tag that's already there.)
Have a tutorial showing this off.

The text was updated successfully, but these errors were encountered:

thorstenhater · 2022-08-04T13:12:11Z

Is Python only, right? But I like this! Less complication to deal with.
Seems orthogonal, spikes and samples have been separate so far. Better split this into a separate issue.
Ditto, and the location is set by the user? Also we already label them. Or am I misunderstanding something?
Not sure I understand/

brenthuisman · 2022-08-04T14:02:29Z

Point 1 and 2 were phrased incorrectly, I think spike detector was inserted by someone else using my hands. The recording policy of any probe should be settable upon defining the probe(set).

Point 2 and 3 are different solution to the same problem, so I made that clearer. The problem is:

class my_rcep(arbor.recipe):
    def probes(self, gid):
        return [
            arbor.cable_probe_membrane_voltage('"midpoints"'),
            arbor.cable_probe_membrane_voltage('(root)'),
            arbor.cable_probe_membrane_voltage_cell(),
            etc...
        ]

handle0 = sim.sample((0, ?), arbor.some_schedule(...))
handle1 = sim.sample((0, ??), arbor.some_schedule(...))
handle2 = sim.sample((0, ???), arbor.some_schedule(...))
handle3 = sim.sample((0, thisnumberdoesntevenhavebeanexistingindex=9999), arbor.some_schedule(...))

Better is:

class my_rcep(arbor.recipe):
    def probes(self, gid):
        return [
            arbor.cable_probe_membrane_voltage("recorder_1", '"midpoint"'),
        ]

handle0 = sim.sample("recorder_1", arbor.some_schedule(...))

or

my_rcord = arbor.cable_probe_membrane_voltage('"midpoint"')

class my_rcep(arbor.recipe):
    def probes(self, gid):
        return [
            myrcord,
        ]

sim.samples(my_rcord)

thorstenhater · 2022-08-04T14:12:49Z

I like the idea of having vector probes to be equivalent to multi-location scalar probes. No idea though if this is realisable in C++ efficiently.

Definitely pro labels as well, it ties in nicely with our remaining place-like methods. Should be something like [(probe, label)] though.

brenthuisman · 2022-08-04T14:17:01Z

It occurs to me now that sim.sample() is maybe the most logical place to set a default sampling policy, perhaps even by calling that the policy should be implicitly set; why else create a handle and set a schedule?

brenthuisman · 2023-05-11T08:31:42Z

What's the reason for separating probes from samplers in the context of being able to set samplers to not record? Is there still an impact on memory/performance if all probes would have a sampler attached by default? If not, then we could reduce cognitive overhead by not having to have to label (or index) them separately.

We could add one extra arg (a schedule) to probe constructors such that their sampler is defined.
Reading them out after the simulation could be through a new probe method (e.g. ProbeClass.samples()). To extract data, users could just loop over sim.probes() and call the method, and filter on the metadata it returns if they need to.
Switching samplers off could be done through policies or a new user callable probe method.

What do you think, @boeschf?

thorstenhater · 2023-05-11T09:12:42Z

I was not part of that particular design, but it has some merits:

you can treat the recipe as opaque and select subsets of the probes to actually sample. That might help during model construction or when doing detailed analyses, eg peaking into STATE
we can separate the what+where from the timepoints we evaluate the values on.
also, different processes (=simulation objects) in the distributed case might do different things

I agree though that it was built with the poweruser in mind and solves for the 10% case.
But keep in mind that in the C++ API you never manifest these arrays as in Python, you supply a callback
that's being executed on the values. The Python API uses this to build up arrays.
How about we layer some convenience on top? Example adapted from single_cell_recipe.py
Before

# ...
    def probes(self, gid):
        # just to have two ...
        return [arbor.cable_probe_membrane_voltage('"midpoint"'), arbor.cable_probe_membrane_voltage('"midpoint"')]
# ...

handles = [sim.sample((0, 0), arbor.regular_schedule(0.1)), sim.sample((0, 1), arbor.regular_schedule(0.1))]
sim.run(tfinal=30)

data, meta = sim.samples(handles[0])[0]
# ...
data, meta = sim.samples(handles[1])[0]

After

# ...
    def probes(self, gid):
        return [arbor.cable_probe_membrane_voltage('"midpoint"'),  arbor.cable_probe_membrane_voltage('"midpoint"')]
# ...


handles = sim.sample_all(arbor.regular_schedule(0.1)) 
sim.run(tfinal=30)

for data, meta in sim.samples(handles):
  # ...

From an API standpoint this is only marginally different from setting a default schedule. I prefer this slightly for being consistent with the existing API and worded actively, ie 'do that' vs 'here, have an object'.

One more layer of abstraction could be futures, where the handle is the data as soon as it has been filled by Arbor
and empty before.

brenthuisman · 2023-05-11T09:49:40Z

This helper can be used in the meantime, to reduce the problem to one index to keep track of:

def simpler_sample(self,rec,sched):
    if getattr(rec, "probes"):
        smplrs=[]
        for gid in range(rec.num_cells()):
            smplrs_on_gid=[]
            for pi in range(len(rec.probes(gid))):
                smplrs_on_gid.append(self.sample((gid,pi),sched))
            smplrs.append(smplrs_on_gid)
        return smplrs

thorstenhater · 2023-05-12T07:10:22Z

But this works, no? You hand it the recipe explicitly.

brenthuisman · 2023-05-12T07:33:25Z

This works, yes, but it would be better if I could probe sim for the rec, which is not possibly atm. But, @boeschf wrote he's working on sth already, so I'll just keep this snippet around in case anyone asks. You still need to bookkeep an index after all.

thorstenhater · 2023-05-12T07:54:44Z

Ok, I was just a bit confused. So, yes, this is cumbersome and slow as in $\mathcal{O}(N_\mathrm{cell})$, too.

thorstenhater · 2024-03-05T10:20:20Z

My current stance is: We can simplify the sampling interface (for now, cable cell only) by

Data

probe data will either be
- double :: scalar probe
- [] std::pair<const double*, const double*> :: vector probe (an iterator/range)
these can be unified into the second case

Meta

mlocation :: = [(branch id, pos)] for scalar probes
mcable_list :: = [mcable] = [(branch id, prox, dist)] for vector probes
cable_probe_point_info :: for point mechanism state
[cable_probe_point_info] :: for cell-wide point mechanism state queries.
and

struct cable_probe_point_info {
    // Target number of point process instance on cell.
    cell_lid_type target;

    // Number of combined instances at this site.
    unsigned multiplicity;

    // Point on cell morphology where instance is placed.
    mlocation loc;
};

Unification:

we can merge cases 1 and 2 by replacing mlocation with mcable(id, pos, pos)
as well as 3 and 4 by always using a vector

we also can merge 1+2 and 3+4 by using this

struct cable_probe_point_info {
  // Target number of point process instance on cell.
  cell_lid_type target;
  // Number of combined instances at this site.
  unsigned multiplicity;
};

struct cable_probe_metadata {
  // Point on cell morphology where instance is placed.
  mlocation loc;
  // If a point query, then this is set
  std::optional<cable_probe_point_info> point_info;
};

this also removes a series of cases from the any and maybe we can 'just' use a variant

thorstenhater · 2024-03-05T10:22:56Z

In Python we can then return a NumPy array of the proper shape (#samples, #points) and a list of metadata [info]

halfflat · 2024-03-05T12:28:49Z

I'm not really convinced that we need to modify the C++ API. It's flexible enough to support any number of variations in the Python API, and was designed to support future probeable things that can have arbitrary return types. This, in fact, has already demonstrated its utility in the extended probe return-types that were added to the cable cell API.

It's still the case that we want to be able to support arbitrary novel cell types in the future? If so, let's not specialize the interface for cable cells.

halfflat · 2024-03-05T12:41:35Z

Just to expand a bit, the idea with probe return values is that they constitute a contract between the cell type and the sampler: the sampler expects to get the type of data that is provided by cell groups that handle the cell type for that sort of probe address, and the sampler can confirm that it has the contract correct by casting the any_ptr in the sample record and the any_ptr in the returned probe metadata.

This allows us to support arbitrary information returned by probes on cells in the future that we haven't yet considered. It also allows us to handle specific cell types and probe addresses in the Python API in a safe manner.

thorstenhater · 2024-03-05T14:34:20Z

We 'just' added LIF probes in #2021, which has a new, disjoint set of probe ctors and data/metadata. Much less interesting,
which is why this discussion here deals in cable cells only. I am fine with switching implementations across cell types.

My gripe is with these fellows:

void vector_sampler(arb::probe_metadata pm, std::size_t n, const arb::sample_record* samples) {
    auto* cables_ptr = any_cast<const arb::mcable_list*>(pm.meta);
    auto* point_info_ptr = any_cast<const std::vector<arb::cable_probe_point_info>*>(pm.meta);

    assert((cables_ptr != nullptr) || (point_info_ptr != nullptr));

    unsigned n_entities = cables_ptr ? cables_ptr->size() : point_info_ptr->size();

    std::cout << std::fixed << std::setprecision(4);
    for (std::size_t i = 0; i<n; ++i) {
        auto* value_range = any_cast<const arb::cable_sample_range*>(samples[i].data);
        assert(value_range);
        const auto& [lo, hi] = *value_range;
        assert(n_entities==hi-lo);

        for (unsigned j = 0; j<n_entities; ++j) {
            std::cout << samples[i].time << ", ";
            if (cables_ptr) {
                arb::mcable where = (*cables_ptr)[j];
                std::cout << where.prox_pos << ", " << where.dist_pos << ", ";
            } else {
                arb::mlocation loc = (*point_info_ptr)[j].loc;
                std::cout << loc.pos << ", ";
            }
            std::cout << lo[j] << '\n';
        }
    }
}

We know --- hence the asserts --- a few things about the data shape beforehand. Using the model I suggested above,
we could just leverage that directly. Furthermore this function is actually two, depending on whether cables_ptr or
point_info_ptr, but we cannot know that without running the function. Worse we write another two functions for
the scalar case. Imagine regularly switching probe types. Thus

Treating vector and scalar probes always as vector removes half the burden on the implementer
Making meta = (mcable, optional<point_info>) allows one sampler for all cases

I agree, though, on the extendability argument; adding a probe type within a cell type without conforming to the suggested interface would be painful. Also eyeing the idea of cell groups as a plugin. Nesting anys could help...

thorstenhater · 2024-03-13T15:43:24Z

These problems even bubble up to the Python interface:

for (gid, tag), handle in handles.items():
    for data, meta in sim.samples(handle):
        columns = []
        if isinstance(meta, list):
            columns += list(map(str, meta))
        else:
            columns.append(str(meta))
        df = pd.DataFrame(data=data[:, 1:], columns=columns, index=data[:, 0]))
        # do something w/ df

at least, this could be:

for (gid, tag), handle in handles.items():
    for data, meta in sim.samples(handle):
        columns = list(map(str, meta))
        df = pd.DataFrame(data=data[:, 1:], columns=columns, index=data[:, 0]))
        # ...

and I'd even prefer:

for (gid, tag), handle in handles.items():
    data, meta = sim.samples(handle)
    columns = list(map(str, meta))
    df = pd.DataFrame(data=data[:, 1:], columns=columns, index=data[:, 0]))
    # ...

brenthuisman added enhancement help wanted Feature Request neuroscience labels Aug 1, 2022

brenthuisman self-assigned this Aug 1, 2022

brenthuisman mentioned this issue Aug 1, 2022

probe id -> probeset id, and clarification in docs #1898

Merged

brenthuisman added this to the v0.8 milestone Aug 2, 2022

thorstenhater mentioned this issue Aug 5, 2022

add spike source docs. #1939

Merged

thorstenhater changed the title ~~Spike detector label and/or recording default.~~ Streamline sampling. Aug 5, 2022

thorstenhater mentioned this issue Aug 5, 2022

Add metadata to spike recording #1941

Closed

thorstenhater pinned this issue Nov 1, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Streamline sampling. #1929

Streamline sampling. #1929

brenthuisman commented Aug 1, 2022 •

edited

Loading

thorstenhater commented Aug 4, 2022 •

edited

Loading

brenthuisman commented Aug 4, 2022

thorstenhater commented Aug 4, 2022

brenthuisman commented Aug 4, 2022

brenthuisman commented May 11, 2023

thorstenhater commented May 11, 2023

brenthuisman commented May 11, 2023 •

edited

Loading

thorstenhater commented May 12, 2023

brenthuisman commented May 12, 2023

thorstenhater commented May 12, 2023

thorstenhater commented Mar 5, 2024

thorstenhater commented Mar 5, 2024

halfflat commented Mar 5, 2024 •

edited

Loading

halfflat commented Mar 5, 2024

thorstenhater commented Mar 5, 2024

thorstenhater commented Mar 13, 2024

Streamline sampling. #1929

Streamline sampling. #1929

Comments

brenthuisman commented Aug 1, 2022 • edited Loading

thorstenhater commented Aug 4, 2022 • edited Loading

brenthuisman commented Aug 4, 2022

thorstenhater commented Aug 4, 2022

brenthuisman commented Aug 4, 2022

brenthuisman commented May 11, 2023

thorstenhater commented May 11, 2023

brenthuisman commented May 11, 2023 • edited Loading

thorstenhater commented May 12, 2023

brenthuisman commented May 12, 2023

thorstenhater commented May 12, 2023

thorstenhater commented Mar 5, 2024

Data

Meta

thorstenhater commented Mar 5, 2024

halfflat commented Mar 5, 2024 • edited Loading

halfflat commented Mar 5, 2024

thorstenhater commented Mar 5, 2024

thorstenhater commented Mar 13, 2024

brenthuisman commented Aug 1, 2022 •

edited

Loading

thorstenhater commented Aug 4, 2022 •

edited

Loading

brenthuisman commented May 11, 2023 •

edited

Loading

halfflat commented Mar 5, 2024 •

edited

Loading