
Alter nn_isolation to use existing principal components #3342

Open
wants to merge 4 commits into main
Conversation

chrishalcrow
Collaborator

Updated the nn_isolation metric to use the already-calculated PCs.

The previous implementation of the nn_isolation metric does the following:

  1. Take two units, A and B, and their spikes and waveforms
  2. Using the waveforms of just the spikes from units A and B, run a principal component analysis
  3. Compute the isolation score based on the PCs of this analysis

This is prohibitively slow, which is why the metric is currently excluded from the default quality_metric list.
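For illustration, here is a rough sketch of that per-pair scheme (hypothetical names, not the actual spikeinterface code), which shows why the cost grows quickly: a fresh PCA is fitted for every pair of units.

# Hypothetical sketch of the old per-pair scheme, for contrast only.
import numpy as np
from sklearn.decomposition import PCA

def pair_projection(waveforms_a, waveforms_b, n_components=10):
    """Fit a PCA on the pooled, flattened waveforms of two units and
    return the projected features for each unit."""
    # waveforms_*: (n_spikes, n_samples * n_channels)
    X = np.concatenate([waveforms_a, waveforms_b])
    pcs = PCA(n_components=n_components).fit_transform(X)  # refitted for every pair -> slow
    return pcs[: len(waveforms_a)], pcs[len(waveforms_a):]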

Instead, this PR implements the following:

  1. Take the PCs already calculated by compute_principal_components, which include all spikes
  2. Compute the isolation score based on the PCs of this analysis
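For reference, here is a minimal sketch of the isolation computation itself, starting from per-spike PC features. The function and variable names are illustrative, not the actual spikeinterface internals: the score is the mean fraction of each spike's nearest neighbours that carry the same unit label.

# Illustrative sketch, assuming pcs_a and pcs_b are (n_spikes, n_features)
# arrays of PC features for two units.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def pairwise_isolation(pcs_a, pcs_b, n_neighbors=4):
    """Isolation between two units: the mean fraction of each spike's
    nearest neighbours that belong to the same unit as that spike."""
    X = np.concatenate([pcs_a, pcs_b])
    labels = np.concatenate([np.zeros(len(pcs_a)), np.ones(len(pcs_b))])

    # n_neighbors + 1 because each point is returned as its own nearest neighbour
    nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X)
    _, indices = nn.kneighbors(X)

    same_label = labels[indices[:, 1:]] == labels[:, None]
    return same_label.mean()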

I think the new implementation is consistent with the references describing the quality metric (https://pubmed.ncbi.nlm.nih.gov/28910621 and https://pubmed.ncbi.nlm.nih.gov/33473216/). Please correct me if you disagree!

It’s also:

  • Much faster (~150x on a single core), since we don't need to redo the PCA
  • Fits into the parallelisation scheme used by the other PC metrics, so the overall speed-up is more than 150x
  • Uses the sparsity scheme already used by the other PC metrics, rather than a custom one, so it's easier to maintain

The isolation scores are generally worse in the new implementation, because the PCA is applied to all spikes, not just those in the two clusters being compared.

Also updated docstrings and docs to (hopefully) clarify what the metric is calculating.

Benchmark code. Note: num_units is the most important parameter, since there is a max_spikes limit (so a longer duration doesn't affect the timing) and the method uses sparsity (so num_channels doesn't affect it either).

import spikeinterface.full as si
import numpy as np
from time import perf_counter

all_times = {}

for num_units in [10, 20, 30, 40]:

    # Simulated recording and sorting with a known number of units
    recording, sorting = si.generate_ground_truth_recording(durations=[10], num_channels=10, num_units=num_units)
    sorting_analyzer = si.create_sorting_analyzer(recording=recording, sorting=sorting)

    # Extensions required before quality metrics can be computed
    sorting_analyzer.compute(["random_spikes", "noise_levels", "waveforms", "templates", "principal_components", "spike_locations", "spike_amplitudes"])

    # Time nn_isolation three times and keep the median
    times = []
    for _ in range(3):
        t_start = perf_counter()
        sorting_analyzer.compute({"quality_metrics": {"metric_names": ["nn_isolation"]}})
        t_end = perf_counter()
        times.append(t_end - t_start)

    all_times[num_units] = np.median(times)

Old times:

all_times
>>> {10: 33.851581208989955, 20: 141.90579045796767, 30: 380.2453615410486, 40: 620.6475296249846}

New times:

all_times
>>> {10: 0.2444504169980064, 20: 0.968019749969244, 30: 2.3160873339511454, 40: 4.065173499984667}
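For context, the speed-up implied by the two runs above (values rounded from the outputs) is roughly the ~150x quoted earlier:

# Ratios computed from the timings reported above (rounded)
old = {10: 33.85, 20: 141.91, 30: 380.25, 40: 620.65}
new = {10: 0.244, 20: 0.968, 30: 2.316, 40: 4.065}
print({n: round(old[n] / new[n]) for n in old})
# {10: 139, 20: 147, 30: 164, 40: 153}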

@alejoe91 added the qualitymetrics label Aug 28, 2024
@zm711 (Collaborator) commented Aug 29, 2024

I think the only real issue is that the values are now different. So from my perspective it would make the most sense to wait until your other PR is merged, which allows individual metrics to be computed without redoing all the quality metrics, so that a user who needs to go back and recompute this one can do so easily.

Would we want a warning for at least a version or two saying that the implementation has changed, so nn_isolation is not directly comparable with versions < 0.101.x?

@chrishalcrow (Collaborator, Author)
> I think the only real issue is that the values are now different. So from my perspective it would make the most sense to wait until your other PR is merged, which allows individual metrics to be computed without redoing all the quality metrics, so that a user who needs to go back and recompute this one can do so easily.

Yeah, that makes sense. Note: a bug (fixed in #3138) meant nn_isolation was impossible to use in 0.101.0. It has also been excluded from the default list of quality metrics for a long time because it took so long to run. So I'd expect there to be very few cases where someone had actually calculated it and is using it. Would love to hear if people have calculated it though!

> Would we want a warning for at least a version or two saying that the implementation has changed, so nn_isolation is not directly comparable with versions < 0.101.x?

Could be a good idea. It would also be good to make a quality-metrics policy in the future: do we include lots of implementations, or just "the best" one? A good example is isi_violations: https://spikeinterface.readthedocs.io/en/latest/modules/qualitymetrics/isi_violations.html. I think the Llobet approximation has been shown to be more accurate and is the same speed as the UMS2000 one. Should we keep both, or make a policy to decide which one to keep? And if we find errors in some quality metrics and fix them, do we warn about the change for a version?

We could be generous and allow for everything. Then I could call this metric nn_isolation_from_pcs and keep both implementations. We do something like this with the silhouette score: we maintain a precise slow version and a fast approximation of it https://spikeinterface.readthedocs.io/en/latest/modules/qualitymetrics/silhouette_score.html.

@zm711 (Collaborator) commented Aug 29, 2024

I wrote the silhouette score code and added in the approximation (from Hruschka) because my computer didn't have enough RAM to do the original implementation. Haha.

I think my issue is that if someone was using some cutoff (maybe not for nn but for other metrics) and we change the implementation so that their values are all lower, then their cutoff for analysing their data no longer works and they have to redo the process of figuring out what a new cutoff is. But yeah, I'm not sure of the best solution, whether it be a warning or something else. So I think this is a good topic of discussion for the meeting!
