Clusters can be split in phase 2 (after reassignment) #12

Open
tjd2002 opened this issue Feb 24, 2019 · 1 comment

Comments

@tjd2002
Contributor

tjd2002 commented Feb 24, 2019

The reassignment in phase 1 uses the results of a first round of clustering to try to assign each event to the channel where its 'true' cluster's centroid would have its peak, if the event had been clustered correctly. Inevitably, this reassignment is imperfect, and we think it can produce a narrow class of 'split clusters'.

Consider the case of two partially overlapping clusters, K1 and K2, whose templates (average waveforms, a.k.a. centroids) have their peak amplitudes on different channels, C1 and C2, respectively. In phase 1, Isosplit will separate these clusters along some hyperplane, and each event is then assigned to the peak channel of its cluster's template (irrespective of the peak amplitude of the individual event). Inevitably, there will be some (hopefully small) error in this reassignment. For example, some events that truly belong to K2 will end up on channel C1, either through the initial assignment or through reassignment. So far, so good.
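A minimal sketch of the reassignment rule described above, for illustration only. It assumes events are stored as a NumPy array of shape `(n_events, n_channels, n_timepoints)` with one cluster label per event, and it takes a template's peak channel to be the channel with the largest absolute amplitude. The function names (`cluster_templates`, `peak_channel`, `reassign_events`) are hypothetical and are not taken from the MS4 code.

```python
import numpy as np


def cluster_templates(waveforms, labels):
    """Return {label: template}, where a template is the mean waveform.

    waveforms: array of shape (n_events, n_channels, n_timepoints)
    labels:    array of shape (n_events,), one cluster label per event
    """
    return {k: waveforms[labels == k].mean(axis=0) for k in np.unique(labels)}


def peak_channel(template):
    """Channel on which the template has its largest absolute amplitude."""
    return int(np.argmax(np.abs(template).max(axis=1)))


def reassign_events(waveforms, labels):
    """Assign every event to the peak channel of its cluster's template.

    The event's own peak amplitude is ignored: all events that share a
    cluster label are sent to the same channel.
    """
    templates = cluster_templates(waveforms, labels)
    home = {k: peak_channel(t) for k, t in templates.items()}
    return np.array([home[k] for k in labels])


# Toy data: 2 channels, 3 time points. Cluster 1 peaks on channel 0,
# cluster 2 peaks on channel 1.
waveforms = np.array([
    [[-5.0, -9.0, -5.0], [-1.0, -2.0, -1.0]],
    [[-4.0, -8.0, -4.0], [-1.0, -3.0, -1.0]],
    [[-1.0, -2.0, -1.0], [-6.0, -10.0, -6.0]],
])
labels = np.array([1, 1, 2])
print(reassign_events(waveforms, labels))  # [0 0 1]
```

Because the rule looks only at the cluster label, a K2 event that was clustered with K1 in the first round is sent to K1's peak channel (C1 in the scenario above), regardless of where its own amplitude peaks.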

If we were to repeat the clustering on the same events, we should make the same error, and the situation would be stable (i.e. the erroneous K2 events would again be clustered with K1).

However, the second round of Isosplit in phase 2 operates only on the events assigned (or reassigned) to the central channel. Since this is done with a new set of input events in a new PC space, it is plausible that some of the erroneously assigned events will be separated out into a new cluster. In the case of contamination of K1 by K2, we could end up with two very similar clusters, each containing some of the spikes of the true K2: the 'main' cluster on C2, and a cluster of (probably very few) 'orphan' spikes clustered on C1.

During curation, this would look as if K2 had been split, with one cluster containing the great majority of its spikes. Our users report seeing this pattern under MS4.

This is currently just a hypothesis for the cause of the splitting. I plan to address it by adding a check at the end of phase 2 to see whether any of the resulting clusters have their template peak on a channel other than the central channel of the neighborhood. We could also diagnose the operation of MS4 better if we saved the 'home' neighborhood of each cluster when combining clusters after phase 2 (this is requested in a separate issue, #11).
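A rough sketch of the proposed end-of-phase-2 check, under stated assumptions: the events clustered within one neighborhood are a NumPy array of shape `(n_events, n_channels, n_timepoints)` with channels indexed within that neighborhood, `central_channel` is the index of the neighborhood's central channel, and "template peak" means the channel with the largest absolute mean amplitude. `off_center_clusters` is a hypothetical name, and what to do with a flagged cluster (discard, merge, or just report it) is left open here.

```python
import numpy as np


def peak_channel(template):
    """Index of the channel with the largest absolute template amplitude."""
    return int(np.argmax(np.abs(template).max(axis=1)))


def off_center_clusters(waveforms, labels, central_channel):
    """Labels of phase-2 clusters whose template peaks off the central channel.

    waveforms:       (n_events, n_channels, n_timepoints) events clustered in
                     one neighborhood, channels indexed within the neighborhood
    labels:          (n_events,) cluster labels from the phase-2 clustering
    central_channel: index of the neighborhood's central channel
    """
    flagged = []
    for k in np.unique(labels):
        template = waveforms[labels == k].mean(axis=0)
        if peak_channel(template) != central_channel:
            flagged.append(int(k))
    return flagged


# Toy 'orphan' scenario: neighborhood centered on channel 0 (C1).
# Cluster 1 holds K1-like events; cluster 2 holds a few K2-like orphans
# whose template peaks on channel 1 (C2).
waveforms = np.array([
    [[-9.0, -12.0, -9.0], [-2.0, -3.0, -2.0]],   # K1-like event
    [[-8.0, -11.0, -8.0], [-2.0, -4.0, -2.0]],   # K1-like event
    [[-2.0, -3.0, -2.0], [-7.0, -10.0, -7.0]],   # orphan K2-like event
])
labels = np.array([1, 1, 2])
print(off_center_clusters(waveforms, labels, central_channel=0))  # [2]
```

If the hypothesis holds, orphan clusters like cluster 2 in the toy data are exactly the ones this check would flag, since their spikes came from a neuron whose template peaks elsewhere.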

cc @hrjoo

@tjd2002
Contributor Author

tjd2002 commented Feb 24, 2019

[NB: this is somewhat of a placeholder while we work to test this hypothesis with real data and additional diagnostics. @magland, feel free to assign this to me for now.]
