While using "RandomRBFGeneratorEvents" to cluster data, I realized that when the stream contains noise, the calculation of Purity (for example) is wrong. This happens because in "MembershipMatrix" the "classmap" doesn't contain the key "-1" that would map the noise label to the last "workcluster" index. Instead, the noise label's key is the number of clusters, so it can end up mapped to any "workcluster".
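To illustrate the mapping described above, here is a minimal sketch of a class map in which the noise label "-1" always receives the last index, after all real classes. It is not MOA's code: the class "NoiseAwareClassMap", its method "build", and the choice to add the noise entry only when noise is present are hypothetical and only meant to show the intended behavior.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper, NOT MOA's API: maps each class label to a working
// index, reserving the last index for the noise label (-1).
final class NoiseAwareClassMap {
    static final int NOISE_LABEL = -1;

    static Map<Integer, Integer> build(List<Integer> labels) {
        // Collect the distinct real class labels in ascending order.
        TreeSet<Integer> realLabels = new TreeSet<>();
        boolean sawNoise = false;
        for (int label : labels) {
            if (label == NOISE_LABEL) {
                sawNoise = true;
            } else {
                realLabels.add(label);
            }
        }

        Map<Integer, Integer> classMap = new HashMap<>();
        int next = 0;
        for (int label : realLabels) {
            classMap.put(label, next++);
        }

        // Noise is keyed by -1 (not by the number of clusters) and gets
        // the index right after the last real class.
        if (sawNoise) {
            classMap.put(NOISE_LABEL, next);
        }
        return classMap;
    }
}
```

With labels such as 3, -1, 0, 3, -1, this yields 0 → 0, 3 → 1 and -1 → 2, so the noise column is always the last one and can be skipped reliably.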
Line 52 of the F1 measure has no effect, because "mm.hasNoiseClass()" always returns false, so the number of classes is never adjusted.
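As a rough sketch of what that check could look like once noise is keyed by "-1", the snippet below detects noise from the class map and derives the number of real classes an F1 loop would iterate over. The class "NoiseClassCheck" and its methods are hypothetical stand-ins, not the actual "MembershipMatrix.hasNoiseClass()" implementation.

```java
import java.util.Map;

// Hypothetical sketch, NOT MOA's code: noise detection that works once the
// class map stores the noise label under the key -1.
final class NoiseClassCheck {
    static boolean hasNoiseClass(Map<Integer, Integer> classMap) {
        return classMap.containsKey(-1);
    }

    // Number of real (non-noise) classes, i.e. what an F1 loop should
    // iterate over when the noise column must be excluded.
    static int realClassCount(Map<Integer, Integer> classMap) {
        int n = classMap.size();
        return hasNoiseClass(classMap) ? n - 1 : n;
    }
}
```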
For example, suppose a cluster has 2 instances of a real class and 5 noise instances.
The current implementation would compute the purity of this cluster as 5/7, because the noise index is not skipped when "mm.getClusterClassWeight()" is read inside the "for" loop, so noise is counted as the majority class. Furthermore, the same happens when the cluster contains only noise instances, which is completely wrong.
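A minimal sketch of a per-cluster purity that skips the noise column when choosing the majority class, applied to the example above. It assumes a weight matrix whose last column holds noise (as in a class map that puts "-1" last); the class "NoiseAwarePurity" is hypothetical. It also assumes the denominator stays the full cluster weight, so noise lowers purity instead of inflating it (2/7 rather than 5/7); whether noise should be excluded from the denominator as well is a design choice this issue does not settle. A noise-only cluster gets NaN (undefined) and is left out of the mean, rather than scoring as pure.

```java
// Hypothetical sketch, NOT MOA's implementation. weights[c][k] is the weight
// of class index k in cluster c; the LAST column is assumed to hold noise.
final class NoiseAwarePurity {
    // Purity of one cluster, or NaN if it has no real-class instances.
    static double clusterPurity(double[] row) {
        int noiseIndex = row.length - 1;
        double max = 0.0;
        double total = 0.0;
        for (int k = 0; k < row.length; k++) {
            total += row[k];
            // Skip the noise column when looking for the majority class.
            if (k != noiseIndex && row[k] > max) {
                max = row[k];
            }
        }
        // Noise-only (or empty) cluster: purity is undefined, not 1.0.
        if (max == 0.0) {
            return Double.NaN;
        }
        return max / total;
    }

    // Mean purity over clusters that contain at least one real-class instance.
    static double meanPurity(double[][] weights) {
        double sum = 0.0;
        int counted = 0;
        for (double[] row : weights) {
            double p = clusterPurity(row);
            if (!Double.isNaN(p)) {
                sum += p;
                counted++;
            }
        }
        return counted == 0 ? Double.NaN : sum / counted;
    }
}
```

For the cluster above, a row of {2, 0, 5} (two real classes plus noise) gives 2/7, while {0, 0, 4} gives NaN instead of a perfect score.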
gusnunes changed the title from "Data with noise label" to "Data with noise class" on Sep 20, 2022.