
Alter nn_isolation to use existing principal components #3342

Open
wants to merge 4 commits into main
Conversation

chrishalcrow
Collaborator

Updated the nn_isolation metric to use the already-calculated PCs.

The previous implementation of the nn_isolation metric does the following:

  1. Take two units, A and B, and their spikes and waveforms
  2. Using the waveforms of just the spikes from units A and B, run a principal component analysis
  3. Compute the isolation score based on the PCs of this analysis

This is prohibitively slow, which is why the metric is currently excluded from the default quality_metric list.
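For illustration, here is a rough sketch of that per-pair scheme (hypothetical names, not the actual spikeinterface code), which shows why the cost grows quickly: a fresh PCA is fitted for every pair of units.

# Hypothetical sketch of the old per-pair scheme, for contrast only.
import numpy as np
from sklearn.decomposition import PCA

def pair_projection(waveforms_a, waveforms_b, n_components=10):
    """Fit a PCA on the pooled, flattened waveforms of two units and
    return the projected features for each unit."""
    # waveforms_*: (n_spikes, n_samples * n_channels)
    X = np.concatenate([waveforms_a, waveforms_b])
    pcs = PCA(n_components=n_components).fit_transform(X)  # refitted for every pair -> slow
    return pcs[: len(waveforms_a)], pcs[len(waveforms_a):]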

Instead, this PR implements the following:

  1. Take the PCs already calculated by compute_principal_components, which include all spikes
  2. Compute the isolation score based on the PCs of this analysis
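For reference, here is a minimal sketch of the isolation computation itself, starting from per-spike PC features. The function and variable names are illustrative, not the actual spikeinterface internals: the score is the mean fraction of each spike's nearest neighbours that carry the same unit label.

# Illustrative sketch, assuming pcs_a and pcs_b are (n_spikes, n_features)
# arrays of PC features for two units.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def pairwise_isolation(pcs_a, pcs_b, n_neighbors=4):
    """Isolation between two units: the mean fraction of each spike's
    nearest neighbours that belong to the same unit as that spike."""
    X = np.concatenate([pcs_a, pcs_b])
    labels = np.concatenate([np.zeros(len(pcs_a)), np.ones(len(pcs_b))])

    # n_neighbors + 1 because each point is returned as its own nearest neighbour
    nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X)
    _, indices = nn.kneighbors(X)

    same_label = labels[indices[:, 1:]] == labels[:, None]
    return same_label.mean()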

I think the new implementation is consistent with the references describing the quality metric (https://pubmed.ncbi.nlm.nih.gov/28910621 and https://pubmed.ncbi.nlm.nih.gov/33473216/). Please correct me if you disagree!

It’s also:

  • Much faster (~150x on a single core), since we don't need to redo the PCA
  • Fits into the parallelisation scheme used by the other PC metrics, so the overall speed-up is more than 150x
  • Uses the sparsity scheme already used by the other PC metrics, rather than a custom one, so it's easier to maintain

The isolation scores are generally worse in the new implementation, because the PCA is applied to all spikes, not just those in the two clusters being compared.

Also updated docstrings and docs to (hopefully) clarify what the metric is calculating.

Benchmark code. Note: num_units is the most important parameter, since there is a max_spikes limit (so a longer duration doesn't affect the timing) and the method uses sparsity (so num_channels doesn't affect it either).

import spikeinterface.full as si
import numpy as np
from time import perf_counter

all_times = {}

for num_units in [10, 20, 30, 40]:

    # Simulated recording and sorting with a known number of units
    recording, sorting = si.generate_ground_truth_recording(durations=[10], num_channels=10, num_units=num_units)
    sorting_analyzer = si.create_sorting_analyzer(recording=recording, sorting=sorting)

    # Extensions required before quality metrics can be computed
    sorting_analyzer.compute(["random_spikes", "noise_levels", "waveforms", "templates", "principal_components", "spike_locations", "spike_amplitudes"])

    # Time nn_isolation three times and keep the median
    times = []
    for _ in range(3):
        t_start = perf_counter()
        sorting_analyzer.compute({"quality_metrics": {"metric_names": ["nn_isolation"]}})
        t_end = perf_counter()
        times.append(t_end - t_start)

    all_times[num_units] = np.median(times)

Old times:

all_times
>>> {10: 33.851581208989955, 20: 141.90579045796767, 30: 380.2453615410486, 40: 620.6475296249846}

New times:

all_times
>>> {10: 0.2444504169980064, 20: 0.968019749969244, 30: 2.3160873339511454, 40: 4.065173499984667}
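For context, the speed-up implied by the two runs above (values rounded from the outputs) is roughly the ~150x quoted earlier:

# Ratios computed from the timings reported above (rounded)
old = {10: 33.85, 20: 141.91, 30: 380.25, 40: 620.65}
new = {10: 0.244, 20: 0.968, 30: 2.316, 40: 4.065}
print({n: round(old[n] / new[n]) for n in old})
# {10: 139, 20: 147, 30: 164, 40: 153}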

@alejoe91 added the qualitymetrics label Aug 28, 2024
@zm711 (Collaborator) commented Aug 29, 2024

I think the only real issue is that the values are now different. So from my perspective it would make the most sense to wait until your other PR is merged, which allows individual metrics to be computed without redoing all the quality metrics, so that a user who needs to go back and recompute this one can do so easily.

Would we want a warning for at least a version or two saying that the implementation has changed, so nn_isolation is not directly comparable with versions < 0.101.x?

@chrishalcrow (Collaborator, Author)
> I think the only real issue is that the values are now different. So from my perspective it would make the most sense to wait until your other PR is merged, which allows individual metrics to be computed without redoing all the quality metrics, so that a user who needs to go back and recompute this one can do so easily.

Yeah, that makes sense. Note: a bug (fixed in #3138) meant nn_isolation was impossible to use in 0.101.0. It has also been excluded from the default list of quality metrics for a long time because it took so long to run. So I'd expect there to be very few cases where someone had actually calculated it and is using it. Would love to hear if people have calculated it though!

> Would we want a warning for at least a version or two saying that the implementation has changed, so nn_isolation is not directly comparable with versions < 0.101.x?

Could be a good idea. It would also be good to make a quality-metrics policy in the future: do we include lots of implementations, or just "the best" one? A good example is isi_violations: https://spikeinterface.readthedocs.io/en/latest/modules/qualitymetrics/isi_violations.html. I think the Llobet approximation has been shown to be more accurate and is the same speed as the UMS2000 one. Should we keep both, or make a policy to decide which one to keep? And if we find errors in some quality metrics and fix them, do we warn about the change for a version?

We could be generous and allow for everything. Then I could call this metric nn_isolation_from_pcs and keep both implementations. We do something like this with the silhouette score: we maintain a precise slow version and a fast approximation of it https://spikeinterface.readthedocs.io/en/latest/modules/qualitymetrics/silhouette_score.html.

@zm711 (Collaborator) commented Aug 29, 2024

I wrote the silhouette score code and added in the approximation (from Hruschka) because my computer didn't have enough RAM to do the original implementation. Haha.

I think my issue is that if someone was using some cutoff (maybe not for nn but for other metrics) and we change the implementation so that their values are all lower, then their cutoff for analysing their data no longer works and they have to redo the process of figuring out what a new cutoff is. But yeah, I'm not sure of the best solution, whether it be a warning or something else. So I think this is a good topic of discussion for the meeting!
