Checking the performance of classifiers in high dimensional noise setting. #1

sahanasrihari · 2019-11-15T04:21:28Z

Description

The scikit-learn example that compares classifier accuracies does not include multiple settings for testing different scenarios.
There is no concrete example showing when each of these algorithms wins and when it loses.
One scenario to consider: given a dataset with a relatively small number of dimensions, how does classifier accuracy change as noise dimensions are added?

Noise dimensions are features appended to the dataset that bear no relevance to the original signal dimensions.
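To make the definition concrete, here is a minimal sketch of appending Gaussian noise dimensions to a feature matrix. The helper name `add_noise_dimensions` and its parameters are hypothetical and chosen for illustration; they are not taken from the linked notebook.

```python
import numpy as np


def add_noise_dimensions(X, n_noise, variance=1.0, seed=None):
    """Append n_noise zero-mean Gaussian columns to X (hypothetical helper).

    The added columns are drawn independently of the labels, so they
    carry no information about the original signal dimensions.
    """
    rng = np.random.default_rng(seed)
    # NumPy's normal() takes a standard deviation, so convert the variance.
    noise = rng.normal(loc=0.0, scale=np.sqrt(variance), size=(X.shape[0], n_noise))
    # Keep the signal columns first and put the noise columns after them.
    return np.hstack([X, noise])
```

For example, calling `add_noise_dimensions(X, 10, variance=0.5)` on a 2-feature dataset would return a 12-feature dataset whose first two columns are unchanged.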

Goal
To compare the performance of three classifiers (Random Forest, Support Vector Machine, and K-Nearest Neighbours) as Gaussian noise dimensions are added, at three different noise variance values.

Proposed changes in the form of PR
I am proposing a new tutorial in the form of a Jupyter notebook containing all the code, from data generation to the computation of accuracies across noise dimensions.
The final figure will contain a plot of the original datasets adapted from https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
and 9 plots of "Accuracy vs. Number of Noise Dimensions", one for each combination of the 3 datasets and 3 variances of Gaussian noise. The plots will show the testing accuracies across 50 trials of the experiment.
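As a rough sketch of the kind of experiment described above (not the notebook's actual code), the loop below averages test accuracy over several trials for one dataset and one noise setting. The function name `mean_accuracy`, the use of `make_moons` only, the sample size, the train/test split, and the classifier hyperparameters are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def mean_accuracy(n_noise, variance, n_trials=5, seed=0):
    """Mean test accuracy per classifier with n_noise Gaussian noise columns.

    Hypothetical helper: dataset, sizes, and hyperparameters are assumed.
    """
    rng = np.random.default_rng(seed)
    # Factories so that every trial starts from a fresh, unfitted model.
    classifiers = {
        "Random Forest": lambda: RandomForestClassifier(n_estimators=50, random_state=0),
        "SVM": lambda: SVC(kernel="rbf", gamma="scale"),
        "KNN": lambda: KNeighborsClassifier(n_neighbors=3),
    }
    scores = {name: [] for name in classifiers}
    for trial in range(n_trials):
        # Two informative dimensions (the "signal").
        X, y = make_moons(n_samples=200, noise=0.3, random_state=trial)
        # Append label-independent Gaussian noise dimensions.
        noise = rng.normal(0.0, np.sqrt(variance), size=(X.shape[0], n_noise))
        X_noisy = np.hstack([X, noise])
        X_train, X_test, y_train, y_test = train_test_split(
            X_noisy, y, test_size=0.4, random_state=trial
        )
        for name, make_clf in classifiers.items():
            clf = make_clf().fit(X_train, y_train)
            scores[name].append(clf.score(X_test, y_test))
    return {name: float(np.mean(vals)) for name, vals in scores.items()}
```

Calling `mean_accuracy(n, variance)` for a range of `n` values and plotting the results per classifier would give one "Accuracy vs. Number of Noise Dimensions" panel; repeating this for each dataset and variance would give the full grid.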

Here is a link to the code:
https://github.com/NeuroDataDesign/team-forbidden-forest/blob/master/Sahana/FINAL_PR_classifiers.ipynb

bdpedigo · 2019-11-21T15:30:02Z

This issue is unclear about what you are actually proposing to PR into sklearn. You can be a lot more detailed about the fact that you are proposing a new tutorial, what the figures will be, and what the data is.

sahanasrihari · 2019-12-07T04:30:31Z

Description

The scikit-learn example that compares classifier accuracies does not include multiple settings for testing different scenarios.
There is no concrete example showing when each of these algorithms wins and when it loses.
One scenario to consider: given a dataset with a relatively small number of dimensions, how does classifier accuracy change as noise dimensions are added?

Noise dimensions are features appended to the dataset that bear no relevance to the original signal dimensions.

Goal
To compare the performance of three classifiers (Random Forest, Support Vector Machine, and K-Nearest Neighbours) as Gaussian noise dimensions are added, at three different noise variance values.

Proposed changes in the form of PR
I am proposing a new tutorial in the form of a Jupyter notebook containing all the code, from data generation to the computation of accuracies across noise dimensions.
The final figure will contain a plot of the original datasets adapted from https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
and 9 plots of "Accuracy vs. Number of Noise Dimensions", one for each combination of the 3 datasets and 3 variances of Gaussian noise. The plots will show the testing accuracies across 50 trials of the experiment.

Code: https://github.com/sahanasrihari/scikit-learn/blob/master/examples/classification/CLASSIFIER_COMPARISON_PR.ipynb

> This issue is unclear about what you are actually proposing to PR into sklearn. You can be a lot more detailed about the fact that you are proposing a new tutorial, what the figures will be, what the data is

@bdpedigo I have updated the issue; please let me know if more detail is needed.

sahanasrihari mentioned this issue Dec 11, 2019

Classifier comparison with noise dimensions NeuroDataDesign/scikit-learn#17

Merged
